Matthieu Labeau - ACL Anthology

Matthieu Labeau

2025

EmoDynamiX : Prédiction de stratégies de dialogue pour le support émotionnel via la modélisation de mélange d’émotions et de la dynamique du discours
Chenwei Wan | Matthieu Labeau | Chloé Clavel
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : traductions d'articles publiés

Designing conversational systems endowed with emotional intelligence to bring comfort and advice to people in distress is a particularly promising area of research. Recently, thanks to advances in large language models (LLMs), conversational agents trained end to end without explicit dialogue strategy prediction steps have become more common. However, implicit strategy planning lacks transparency, and recent studies show that LLMs’ inherent preference for certain socio-emotional strategies harms the quality of the emotional support provided. To address this challenge, we propose decoupling strategy prediction from language generation and introduce a new conversational strategy prediction framework, EmoDynamiX, which models the discourse dynamics between the user’s fine-grained emotions and the system’s strategies by means of a heterogeneous graph, in order to improve both performance and transparency. Experimental results on two emotional support conversation (ESC) datasets show that EmoDynamiX significantly outperforms previous state-of-the-art methods (with better proficiency and lower preference bias). Our approach also offers better transparency by making it possible to trace the decision-making process.

EmoDynamiX: Emotional Support Dialogue Strategy Prediction by Modelling MiXed Emotions and Discourse Dynamics
Chenwei Wan | Matthieu Labeau | Chloé Clavel
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Designing emotionally intelligent conversational systems to provide comfort and advice to people experiencing distress is a compelling area of research. Recently, with advancements in large language models (LLMs), end-to-end dialogue agents without explicit strategy prediction steps have become prevalent. However, implicit strategy planning lacks transparency, and recent studies show that LLMs’ inherent preference bias towards certain socio-emotional strategies hinders the delivery of high-quality emotional support. To address this challenge, we propose decoupling strategy prediction from language generation, and introduce a novel dialogue strategy prediction framework, EmoDynamiX, which models the discourse dynamics between user fine-grained emotions and system strategies using a heterogeneous graph for better performance and transparency. Experimental results on two ESC datasets show EmoDynamiX outperforms previous state-of-the-art methods with a significant margin (better proficiency and lower preference bias). Our approach also exhibits better transparency by allowing backtracing of decision making.

Décoder le pouvoir de persuasion dans les concours d’éloquence : une étude sur la capacité des modèles de langues à évaluer la prise de parole en public
Alisa Barkar | Mathieu Chollet | Matthieu Labeau | Beatrice Biancardi | Chloé Clavel
Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM)

The importance of public speaking skills is driving the development of automated assessment systems, but the integration of large language models (LLMs) remains little explored. We propose a framework in which LLMs evaluate criteria drawn from the literature and from trainers’ feedback. We test three approaches: direct zero-shot LLM predictions (RMSE 0.8), compared with persuasiveness predictions based on hand-crafted lexical features (RMSE 0.51) or on LLM-evaluated criteria (0.6) fed as input to ElasticNet. The analysis of the links between criteria and lexical features shows that only the LLM-evaluated language-level criterion is predictable (F1 score of 0.56), highlighting the current limitations of LLMs for public speaking analysis. Source code and data are available on GitHub.

Toward the Automatic Detection of Word Meaning Negotiation Indicators in Conversation
Aina Garí Soler | Matthieu Labeau | Chloé Clavel
Findings of the Association for Computational Linguistics: EMNLP 2025

Word Meaning Negotiations (WMN) are sequences in conversation where speakers collectively discuss and shape word meaning. These exchanges can provide insight into conversational dynamics and word-related misunderstandings, but they are hard to find in corpora. In order to facilitate data collection and speed up the WMN annotation process, we introduce the task of detecting WMN indicators – utterances where a speaker signals the need to clarify or challenge word meaning. We train a wide range of models and reveal the difficulty of the task. Our models have better precision than previous regular-expression based approaches and show some generalization abilities, but have moderate recall. However, this constitutes a promising first step toward an iterative process for obtaining more data.

Potentially Problematic Word Usages and How to Detect Them: A Survey
Aina Garí Soler | Matthieu Labeau | Chloé Clavel
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

We introduce and explore the concept of potentially problematic word usages (PPWUs): word occurrences that are likely to cause communication breakdowns of a semantic nature. While much research has been devoted to lexical complexity, ambiguity, vagueness and related issues, no work has attempted to fully capture the intricate nature of PPWUs. We review linguistic factors, datasets and metrics that can be helpful for PPWU detection. We also discuss challenges to their study, such as their complexity and subjectivity, and highlight the need for future work on this phenomenon.

2024

Using Locally Learnt Word Representations for better Textual Anomaly Detection
Alicia Breidenstein | Matthieu Labeau
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP

The literature on general purpose textual Anomaly Detection is quite sparse, as most textual anomaly detection methods are implemented as out of domain detection in the context of pre-established classification tasks. Notably, in a field where pre-trained representations and models are of common use, the impact of the pre-training data on a task that lacks supervision has not been studied. In this paper, we use the simple setting of k-classes out anomaly detection and search for the best pairing of representation and classifier. We show that well-chosen embeddings allow a simple anomaly detection baseline such as OC-SVM to achieve similar results and even outperform deep state-of-the-art models.

The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
Aina Garí Soler | Matthieu Labeau | Chloé Clavel
Transactions of the Association for Computational Linguistics, Volume 12

When deriving contextualized word representations from language models, a decision needs to be made on how to obtain one for out-of-vocabulary (OOV) words that are segmented into subwords. What is the best way to represent these words with a single vector, and are these representations of worse quality than those of in-vocabulary words? We carry out an intrinsic evaluation of embeddings from different models on semantic similarity tasks involving OOV words. Our analysis reveals, among other interesting findings, that the quality of representations of words that are split is often, but not always, worse than that of the embeddings of known words. Their similarity values, however, must be interpreted with caution.

Revisiting Hierarchical Text Classification: Inference and Metrics
Roman Plaud | Matthieu Labeau | Antoine Saillenfest | Thomas Bonald
Proceedings of the 28th Conference on Computational Natural Language Learning

Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem, therefore evaluating it as such. We instead propose to evaluate models based on specifically designed hierarchical metrics and we demonstrate the intricacy of metric choice and prediction inference method. We introduce a new challenging dataset and fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC.

2023

Measuring Lexico-Semantic Alignment in Debates with Contextualized Word Representations
Aina Garí Soler | Matthieu Labeau | Chloé Clavel
Proceedings of the First Workshop on Social Influence in Conversations (SICon 2023)

Dialog participants sometimes align their linguistic styles, e.g., they use the same words and syntactic constructions as their interlocutors. We propose to investigate the notion of lexico-semantic alignment: to what extent do speakers convey the same meaning when they use the same words? We design measures of lexico-semantic alignment relying on contextualized word representations. We show that they reflect interesting semantic differences between the two sides of a debate and that they can assist in the task of debate’s winner prediction.

Participation de l’équipe TTGV à DEFT 2023 : Réponse automatique à des QCM issus d’examens en pharmacie
Andréa Blivet | Solène Degrutère | Barbara Gendron | Aurélien Renault | Cyrille Siouffi | Vanessa Gaudray Bouju | Christophe Cerisara | Hélène Flamein | Gaël Guibon | Matthieu Labeau | Tom Rousseau
Actes de CORIA-TALN 2023. Actes du Défi Fouille de Textes@TALN2023

This paper presents the approach of the TTGV team in its participation in the two tasks proposed at DEFT 2023: identifying the number of supposedly correct answers to a multiple-choice question, and predicting the set of correct answers among the five proposed for a given question. It presents the different methodologies implemented, exploring a wide range of approaches and techniques, first to distinguish questions calling for a single answer from those calling for several, and then to identify the correct answers. We detail the methods used, highlighting their respective advantages and limitations. We then present the results obtained for each approach. Finally, we discuss the limitations intrinsic to the tasks themselves as well as to the approaches considered in this contribution.

Un mot, deux facettes : traces des opinions dans les représentations contextualisées des mots
Aina Garí Soler | Matthieu Labeau | Chloe Clavel
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 4 : articles déjà soumis ou acceptés en conférence internationale

The way we use words is influenced by our opinion. We investigate whether this is reflected in contextualized word embeddings. For example, is the representation of “animal” different for people who would like to abolish zoos and those who would not? We explore this question from the standpoint of lexical semantic change. Our experiments with representations derived from datasets annotated with stances reveal small but significant differences between opposing stances.

2022

Participation de l’équipe TGV à DEFT 2022 : Prédiction automatique de notes d’étudiants à des questionnaires en fonction du type de question (Team TGV at DEFT 2022 : automatic prediction of students’ grades according to the different question types)
Vanessa Gaudray Bouju | Margot Guettier | Gwennola Lerus | Gaël Guibon | Matthieu Labeau | Luce Lefeuvre
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Atelier DÉfi Fouille de Textes (DEFT)

This paper presents the approach of the TGV team in its participation in the base task of DEFT 2022, whose goal was to automatically predict the grades obtained by students based on their answers to questionnaires. Our strategy focused on developing a method for classifying questions according to the type of answer they expect, so as to apply a differentiated approach to each type. Our three runs consisted of an undifferentiated approach, serving as a baseline, and two differentiated approaches, the first based on building a feature set and the second on computing TF-IDF and the hashing function. Our primary goal was thus to check whether approaches dedicated to each question type are preferable to a global approach.

Polysemy in Spoken Conversations and Written Texts
Aina Garí Soler | Matthieu Labeau | Chloé Clavel
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Our discourses are full of potential lexical ambiguities, due in part to the pervasive use of words having multiple senses. Sometimes, one word may even be used in more than one sense throughout a text. But, to what extent is this true for different kinds of texts? Does the use of polysemous words change when a discourse involves two people, or when speakers have time to plan what to say? We investigate these questions by comparing the polysemy level of texts of different nature, with a focus on spontaneous spoken dialogs; unlike previous work which examines solely scripted, written, monolog-like data. We compare multiple metrics that presuppose different conceptualizations of text polysemy, i.e., they consider the observed or the potential number of senses of words, or their sense distribution in a discourse. We show that the polysemy level of texts varies greatly depending on the kind of text considered, with dialog and spoken discourses having generally a higher polysemy level than written monologs. Additionally, our results emphasize the need for relaxing the popular “one sense per discourse” hypothesis.

One Word, Two Sides: Traces of Stance in Contextualized Word Representations
Aina Garí Soler | Matthieu Labeau | Chloé Clavel
Proceedings of the 29th International Conference on Computational Linguistics

The way we use words is influenced by our opinion. We investigate whether this is reflected in contextualized word embeddings. For example, is the representation of “animal” different between people who would abolish zoos and those who would not? We explore this question from a Lexical Semantic Change standpoint. Our experiments with BERT embeddings derived from datasets with stance annotations reveal small but significant differences in word representations between opposing stances.

EZCAT: an Easy Conversation Annotation Tool
Gaël Guibon | Luce Lefeuvre | Matthieu Labeau | Chloé Clavel
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Users generate content constantly, leading to new data requiring annotation. Among this data, textual conversations are created every day and come with some specificities: they are mostly private through instant messaging applications, requiring the conversational context to be labeled. These specificities led to several annotation tools dedicated to conversation, and mostly dedicated to dialogue tasks, requiring complex annotation schemata, not always customizable and not taking into account conversation-level labels. In this paper, we present EZCAT, an easy-to-use interface to annotate conversations in a two-level configurable schema, leveraging message-level labels and conversation-level labels. Our interface is characterized by the voluntary absence of a server and accounts management, enhancing its availability to anyone, and the control over data, which is crucial to confidential conversations. We also present our first usage of EZCAT along with our annotation schema we used to annotate confidential customer service conversations. EZCAT is freely available at https://gguibon.github.io/ezcat.

2021

Méta-apprentissage : classification de messages en catégories émotionnelles inconnues en entraînement (Meta-learning : Classifying Messages into Unseen Emotional Categories)
Gaël Guibon | Matthieu Labeau | Hélène Flamein | Luce Lefeuvre | Chloé Clavel
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

In this paper, we reproduce a learning scenario in which the target data are not accessible and only related data are. We use a meta-learning approach to determine whether meta-information learned from social media messages, finely annotated with emotions, can yield good performance when applied to messages from conversations, labeled with emotions at a different granularity. We leverage few-shot learning to set up this scenario. This approach proves effective at capturing the meta-information of a set of emotion labels in order to predict labels previously unknown to the model. Although changing the data type causes a drop in performance, our meta-learning approach achieves decent results compared with the supervised learning baseline.

Meta-learning for Classifying Previously Unseen Data Source into Previously Unseen Emotional Categories
Gaël Guibon | Matthieu Labeau | Hélène Flamein | Luce Lefeuvre | Chloé Clavel
Proceedings of the 1st Workshop on Meta Learning and Its Applications to Natural Language Processing

In this paper, we place ourselves in a classification scenario in which the target classes and data type are not accessible during training. We use a meta-learning approach to determine whether or not meta-trained information from common social network data with fine-grained emotion labels can achieve competitive performance on messages labeled with different emotion categories. We leverage few-shot learning to match with the classification scenario and consider metric learning based meta-learning by setting up Prototypical Networks with a Transformer encoder, trained in an episodic fashion. This approach proves to be effective for capturing meta-information from a source emotional tag set to predict previously unseen emotional tags. Even though shifting the data type triggers an expected performance drop, our meta-learning approach achieves decent results when compared to the fully supervised one.

Improving Multimodal fusion via Mutual Dependency Maximisation
Pierre Colombo | Emile Chapuis | Matthieu Labeau | Chloé Clavel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics. Acknowledging that humans communicate through a variety of channels (i.e., visual, acoustic, linguistic), multimodal systems aim at integrating different unimodal representations into a synthetic one. So far, considerable effort has been made on developing complex architectures allowing the fusion of these modalities. However, such systems are mainly trained by minimising simple losses such as L₁ or cross-entropy. In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities. We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 on accuracy) across a large variety of state-of-the-art models on two well-known sentiment analysis datasets: CMU-MOSI and CMU-MOSEI. Our method not only achieves a new SOTA on both datasets but also produces representations that are more robust to modality drops. Finally, a by-product of our methods includes a statistical network which can be used to interpret the high dimensional representations learnt by the model.

Code-switched inspired losses for spoken dialog representations
Pierre Colombo | Emile Chapuis | Matthieu Labeau | Chloé Clavel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Spoken dialogue systems need to be able to handle both multiple languages and multilinguality inside a conversation (e.g., in case of code-switching). In this work, we introduce new pretraining losses tailored to learn generic multilingual spoken dialogue representations. The goal of these losses is to expose the model to code-switched language. In order to scale up training, we automatically build a pretraining corpus composed of multilingual conversations in five different languages (French, Italian, English, German and Spanish) from OpenSubtitles, a huge multilingual corpus composed of 24.3G tokens. We test the generic representations on MIAM, a new benchmark composed of five dialogue act corpora in the same five languages, as well as on two novel multilingual tasks (i.e., multilingual mask utterance retrieval and multilingual inconsistency identification). Our experiments show that our new losses achieve better performance in both monolingual and multilingual settings.

Few-Shot Emotion Recognition in Conversation with Sequential Prototypical Networks
Gaël Guibon | Matthieu Labeau | Hélène Flamein | Luce Lefeuvre | Chloé Clavel
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Several recent studies on dyadic human-human interactions have been done on conversations without specific business objectives. However, many companies might benefit from studies dedicated to more precise environments such as after sales services or customer satisfaction surveys. In this work, we place ourselves in the scope of a live chat customer service in which we want to detect emotions and their evolution in the conversation flow. This context leads to multiple challenges that range from exploiting restricted, small and mostly unlabeled datasets to finding and adapting methods for such context. We tackle these challenges by using Few-Shot Learning while making the hypothesis it can serve conversational emotion classification for different languages and sparse labels. We contribute by proposing a variation of Prototypical Networks for sequence labeling in conversation that we name ProtoSeq. We test this method on two datasets with different languages: daily conversations in English and customer service chat conversations in French. When applied to emotion classification in conversations, our method proved to be competitive even when compared to other ones.

2020

The importance of fillers for text representations of speech transcripts
Tanvi Dinkar | Pierre Colombo | Matthieu Labeau | Chloé Clavel
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While being an essential component of spoken language, fillers (e.g. “um” or “uh”) often remain overlooked in Spoken Language Understanding (SLU) tasks. We explore the possibility of representing them with deep contextualised embeddings, showing improvements on modelling spoken language and two downstream tasks — predicting a speaker’s stance and expressed confidence.

Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
Emile Chapuis | Pierre Colombo | Matteo Manica | Matthieu Labeau | Chloé Clavel
Findings of the Association for Computational Linguistics: EMNLP 2020

Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification are a key component of spoken dialog systems. In this work, we propose a new approach to learn generic representations adapted to spoken dialog, which we evaluate on a new benchmark we call Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE). SILICONE is model-agnostic and contains 10 different datasets of various sizes. We obtain our representations with a hierarchical encoder based on transformer architectures, for which we extend two well-known pre-training objectives. Pre-training is performed on OpenSubtitles: a large corpus of spoken dialog containing over 2.3 billion tokens. We demonstrate how hierarchical encoders achieve competitive results with consistently fewer parameters compared to state-of-the-art models and we show their importance for both pre-training and fine-tuning.

2019

Experimenting with Power Divergences for Language Modeling
Matthieu Labeau | Shay B. Cohen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Neural language models are usually trained using Maximum-Likelihood Estimation (MLE). The corresponding objective function for MLE is derived from the Kullback-Leibler (KL) divergence between the empirical probability distribution representing the data and the parametric probability distribution output by the model. However, the word frequency discrepancies in natural language make performance extremely uneven: while the perplexity is usually very low for frequent words, it is especially difficult to predict rare words. In this paper, we experiment with several families (alpha, beta and gamma) of power divergences, generalized from the KL divergence, for learning language models with an objective different than standard MLE. Intuitively, these divergences should affect the way the probability mass is spread during learning, notably by prioritizing performances on high or low-frequency words. In addition, we implement and experiment with various sampling-based objectives, where the computation of the output layer is only done on a small subset of the vocabulary. They are derived as power generalizations of a softmax approximated via Importance Sampling, and Noise Contrastive Estimation, for accelerated learning. Our experiments on the Penn Treebank and Wikitext-2 show that these power divergences can indeed be used to prioritize learning on the frequent or rare words, and lead to general performance improvements in the case of sampling-based learning.

2018

Learning with Noise-Contrastive Estimation: Easing training by learning to scale
Matthieu Labeau | Alexandre Allauzen
Proceedings of the 27th International Conference on Computational Linguistics

Noise-Contrastive Estimation (NCE) is a learning criterion that is regularly used to train neural language models in place of Maximum Likelihood Estimation, since it avoids the computational bottleneck caused by the output softmax. In this paper, we analyse and explain some of the weaknesses of this objective function, linked to the mechanism of self-normalization, by closely monitoring comparative experiments. We then explore several remedies and modifications to propose tractable and efficient NCE training strategies. In particular, we propose to make the scaling factor a trainable parameter of the model, and to use the noise distribution to initialize the output bias. These solutions, though simple, yield stable and competitive performance in both small and large scale language modelling tasks.

Algorithmes à base d’échantillonage pour l’entraînement de modèles de langue neuronaux
Matthieu Labeau | Alexandre Allauzen
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Noise-contrastive estimation (NCE) and importance sampling (IS) are sampling-based training procedures that are commonly used in place of maximum likelihood estimation (MLE) to avoid computing the softmax when training neural language models. In this paper, we aim to summarize how these algorithms work and how they are used in the NLP literature. We compare them experimentally and present ways to ease NCE training.

2017

Adaptation au domaine pour l’analyse morpho-syntaxique (Domain Adaptation for PoS tagging)
Éléonor Bartenlian | Margot Lacour | Matthieu Labeau | Alexandre Allauzen | Guillaume Wisniewski | François Yvon
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

This work seeks to understand why the performance of a part-of-speech tagger drops sharply when it is used on out-of-domain data. We show with a toy experiment that this behavior may be due to a phenomenon in which lexicalized features are masked by non-lexicalized features. We propose several models that attempt to reduce this effect.

LIMSI@WMT’17
Franck Burlot | Pooyan Safari | Matthieu Labeau | Alexandre Allauzen | François Yvon
Proceedings of the Second Conference on Machine Translation

Character and Subword-Based Word Representation for Neural Language Modeling Prediction
Matthieu Labeau | Alexandre Allauzen
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Most neural language models use different kinds of embeddings for word prediction. While word embeddings can be associated with each word in the vocabulary or derived from characters as well as factored morphological decomposition, these word representations are mainly used to parametrize the input, i.e. the context of prediction. This work investigates the effect of using subword units (character and factored morphological decomposition) to build output representations for neural language modeling. We present a case study on Czech, a morphologically-rich language, experimenting with different input and output representations. When working with the full training vocabulary, despite unstable training, our experiments show that augmenting the output word representations with character-based embeddings can significantly improve the performance of the model. Moreover, reducing the size of the output look-up table, to let the character-based embeddings represent rare words, brings further improvement.

Représentations continues dérivées des caractères pour un modèle de langue neuronal à vocabulaire ouvert (Opening the vocabulary of neural language models with character-level word representations)
Matthieu Labeau | Alexandre Allauzen
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 - Articles longs

This paper proposes a neural architecture for an open-vocabulary language model. Continuous word representations are computed on the fly from the characters that compose them, using a convolutional layer followed by a pooling layer. This allows the model to represent any word, whether it is part of the context or evaluated for prediction. The objective function is derived from noise-contrastive estimation (NCE), which in our case can be computed without a vocabulary. We evaluate our model’s ability to build continuous representations of unknown words on the IWSLT-2016 machine translation task, from English to Czech, by reranking the N-best hypotheses. The experimental results show gains of up to 0.7 BLEU points. They also show the difficulty of using character-derived representations for prediction.

An experimental analysis of Noise-Contrastive Estimation: the noise distribution matters
Matthieu Labeau | Alexandre Allauzen
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Noise Contrastive Estimation (NCE) is a learning procedure that is regularly used to train neural language models, since it avoids the computational bottleneck caused by the output softmax. In this paper, we attempt to explain some of the weaknesses of this objective function, and to draw directions for further developments. Experiments on a small task show the issues raised by a unigram noise distribution, and that a context-dependent noise distribution, such as the bigram distribution, can solve these issues and provide stable and data-efficient learning.

Proceedings of ACL 2017, Student Research Workshop
Allyson Ettinger | Spandana Gella | Matthieu Labeau | Cecilia Ovesdotter Alm | Marine Carpuat | Mark Dredze
Proceedings of ACL 2017, Student Research Workshop

2016

LIMSI@IWSLT’16: MT Track
Franck Burlot | Matthieu Labeau | Elena Knyazeva | Thomas Lavergne | Alexandre Allauzen | François Yvon
Proceedings of the 13th International Conference on Spoken Language Translation

This paper describes LIMSI’s submission to the MT track of IWSLT 2016. We report results for translation from English into Czech. Our submission is an attempt to address the difficulties of translating into a morphologically rich language by paying special attention to morphology generation on the target side. To this end, we propose two ways of improving the morphological fluency of the output: 1. by performing translation and inflection of the target language in two separate steps, and 2. by using a neural language model with character-based word representation. We finally present the combination of both methods used for our primary system submission.

2015

Non-lexical neural architecture for fine-grained POS Tagging
Matthieu Labeau | Kevin Löser | Alexandre Allauzen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

LIMSI@WMT’15 : Translation Task
Benjamin Marie | Alexandre Allauzen | Franck Burlot | Quoc-Khanh Do | Julia Ive | Elena Knyazeva | Matthieu Labeau | Thomas Lavergne | Kevin Löser | Nicolas Pécheux | François Yvon
Proceedings of the Tenth Workshop on Statistical Machine Translation
