Gwénolé Lecorvé

2026

DivMerge: A divergence-based model merging method for multi-tasking
Brahim Touayouch | Loïc Fosse | Géraldine Damnati | Gwénolé Lecorvé
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Merging fine-tuned models is a promising alternative to costly multi-task training, but task interference remains a challenge, especially as the number of tasks grows. We present DivMerge, a reference-free method that merges models trained on different tasks by minimizing Jensen-Shannon divergence between their outputs and those of the merged model, automatically balancing task importance. While the method exhibits strong theoretical properties, experiments on classification and generative tasks with autoregressive models show that DivMerge consistently outperforms prior work, and remains robust when scaling to more tasks.

2025

pdf bib abs

Estimation de l’inclusion entre tâches par projection spectrale de vecteurs de tâches
Loïc Fosse | Benoît Favre | Frédéric Béchet | Géraldine Damnati | Gwénolé Lecorvé
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

L’affinage des modèles a permis la plupart des avancées significatives récentes dans les tâches de TALN. Des études ont exploré les raisons de ces succès en étudiant le mécanisme d’attention, la manière dont les connaissances linguistiques et factuelles sont encodées, etc... . Il est cependant difficile d’interpréter les changements causés par l’affinage dans les poids des modèles. Pour mieux comprendre cela, nous proposons une méthode fondée théoriquement pour projeter et comparer les changements de poids (i.e. vecteurs de tâches) dans un espace à faible dimension. Cette approche permet de mieux comprendre les connaissances encodées dans un vecteur de tâches, relativement à un autre vecteur de tâche. Nous validons notre méthode en montrant qu’un modèle affiné sur une tâche de résumé encode des informations sur la reconnaissance d’entités nommées.

pdf bib abs

The growing number of generative AI-based dialogue systems has made their evaluation a crucial challenge. This paper presents our contribution to this important problem through the Dialogue System Technology Challenge (DSTC-12, Track 1), where we developed models to predict dialogue-level, dimension-specific scores. Given the constraint of using relatively small models (i.e. fewer than 13 billion parameters) our work follows two main strategies: employing Language Models (LMs) as evaluators through prompting, and training encoder-based classification and regression models.Our results show that while LM prompting achieves only modest correlations with human judgments, it still ranks second on the test set, outperformed only by the baseline.The regression and classification models, with significantly fewer parameters, demonstrate high correlation for some dimensions on the validation set. Although their performance decreases on the test set, it is important to note that the test set contains annotations with significantly different score ranges for some of the dimensions with respect to the train and validation sets.

pdf bib abs

Factual Knowledge Assessment of Language Models Using Distractors
Hichem Ammar Khodja | Abderrahmane Ait gueni ssaid | Frederic Bechet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the 31st International Conference on Computational Linguistics

Language models encode extensive factual knowledge within their parameters. The accurate assessment of this knowledge is crucial for understanding and improving these models. In the literature, factual knowledge assessment often relies on cloze sentences, which can lead to erroneous conclusions due to the complexity of natural language (out-of-subject continuations, the existence of many correct answers and the several ways of expressing them). In this paper, we introduce a new interpretable knowledge assessment method that mitigates these issues by leveraging distractors—incorrect but plausible alternatives to the correct answer. We propose several strategies for retrieving distractors and determine the most effective one through experimentation. Our method is evaluated against existing approaches, demonstrating solid alignment with human judgment and stronger robustness to verbalization artifacts. The code and data to reproduce our experiments are available on GitHub.

pdf bib abs

Connaissances factuelles dans les modèles de langue : robustesse et anomalies face à des variations simples du contexte temporel
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

Ce papier explore la robustesse des modèles de langue (ML) face aux variations du contexte temporel dans les connaissances factuelles. Il examine si les ML peuvent associer correctement un contexte temporel à un fait passé valide sur une période de temps délimitée, en leur demandant de différencier les contextes corrects des contextes incorrects. La capacité de distinction des ML est analysée sur deux dimensions : la distance du contexte incorrect par rapport à la période de validité et la granularité du contexte. Pour cela, un jeu de données, TimeStress, est introduit, permettant de tester 18 ML variés. Les résultats révèlent que le meilleur ML n’atteint une distinction parfaite que pour 11% des faits étudiés, avec des erreurs critiques qu’un humain ne ferait pas. Ces travaux soulignent les limites des ML actuels en matière de représentation temporelle.

pdf bib

Automatic characterization of French language registers: illustration on tweets
Jade Mekki | Nicolas Béchet | Delphine Battistelli | Gwénolé Lecorvé
Traitement Automatique des Langues, Volume 66, Numéro 1

pdf bib abs

Tasks are central in machine learning, as they are the most natural objects to assess the capabilities of current models. The trend is to build general models able to address any task. Even though transfer learning and multitask learning try to leverage the underlying task space, no well-founded tools are available to study its structure. This study proposes a theoretically grounded setup to define the notion of task and to compute the inclusion between two tasks from a statistical deficiency point of view. We propose a tractable proxy as information sufficiency to estimate the degree of inclusion between tasks, show its soundness on synthetic data, and use it to reconstruct empirically the classic NLP pipeline.

pdf bib abs

Factual Knowledge in Language Models: Robustness and Anomalies under Simple Temporal Context Variations
Hichem Ammar Khodja | Frederic Bechet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)

This paper explores the robustness of language models (LMs) to variations in the temporal context within factual knowledge. It examines whether LMs can correctly associate a temporal context with a past fact valid over a defined period, by asking them to differentiate correct from incorrect contexts. The LMs’ ability to distinguish is analyzed along two dimensions: the distance of the incorrect context from the validity period and the granularity of the context. To this end, a dataset called TimeStress is introduced, enabling the evaluation of 18 diverse LMs. Results reveal that the best LM achieves a perfect distinction for only 11% of the studied facts, with errors, certainly rare, but critical that humans would not make. This work highlights the limitations of current LMs in temporal representation.

2024

pdf bib abs

Emotion Identification for French in Written Texts: Considering Modes of Emotion Expression as a Step Towards Text Complexity Analysis
Aline Étienne | Delphine Battistelli | Gwénolé Lecorvé
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

The objective of this paper is to predict (A) whether a sentence in a written text expresses an emotion, (B) the mode(s) in which the emotion is expressed, (C) whether it is basic or complex, and (D) its emotional category.One of our major contributions, in addition to a dataset and a model, is to integrate the fact that an emotion can be expressed in different modes: from a direct mode, essentially lexicalized, to a more indirect mode, where emotions will only be suggested, a mode that NLP approaches generally don’t take into account. The scope is on written texts, i.e. it does not focus on conversational or multi-modal data. In this context, modes of expression are seen as a factor towards the automatic analysis of complexity in texts.Experiments on French texts show acceptable results compared to the human annotators’ agreement to predict the mode and category, and outperforming results compared to using a large language model with in-context learning (i.e. no fine-tuning) on all tasks.Dataset and model can be downloaded on HuggingFace: https://huggingface.co/TextToKids .

pdf bib abs

WikiFactDiff: A Large, Realistic, and Temporally Adaptable Dataset for Atomic Factual Knowledge Update in Causal Language Models
Hichem Ammar Khodja | Frédéric Béchet | Quentin Brabant | Alexis Nasr | Gwénolé Lecorvé
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The factuality of large language model (LLMs) tends to decay over time since events posterior to their training are “unknown” to them. One way to keep models up-to-date could be factual update: the task of inserting, replacing, or removing certain simple (atomic) facts within the model. To study this task, we present WikiFactDiff, a dataset that describes the evolution of factual knowledge between two dates as a collection of simple facts divided into three categories: new, obsolete, and static. We describe several update scenarios arising from various combinations of these three types of basic update. The facts are represented by subject-relation-object triples; indeed, WikiFactDiff was constructed by comparing the state of the Wikidata knowledge base at 4 January 2021 and 27 February 2023. Those fact are accompanied by verbalization templates and cloze tests that enable running update algorithms and their evaluation metrics. Contrary to other datasets, such as zsRE and CounterFact, WikiFactDiff constitutes a realistic update setting that involves various update scenarios, including replacements, archival, and new entity insertions. We also present an evaluation of existing update algorithms on WikiFactDiff.

pdf bib abs

KGConv, a Conversational Corpus Grounded in Wikidata
Quentin Brabant | Lina M. Rojas Barahona | Gwénolé Lecorvé | Claire Gardent
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present KGConv, a large corpus of 71k English conversations where each question-answer pair is grounded in a Wikidata fact. Conversations contain on average 8.6 questions and for each Wikidata fact, we provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a question rewriting neural model. We provide baselines for the task of Knowledge-Based, Conversational Question Generation. KGConv can further be used for other generation and analysis tasks such as single-turn question generation from Wikidata triples, question rewriting, question answering from conversation or from knowledge graphs and quiz generation.

pdf bib abs

Repérage et caractérisation automatique des émotions dans des textes : traiter aussi leurs modes d’expression indirects
Aline Etienne | Delphine Battistelli | Gwénolé Lecorvé
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

Cet article présente un modèle capable de prédire (A) si une phrase contient l’expression d’une émotion, (B) selon quel(s) mode(s) cette émotion est exprimée, (C) si elle est basique ou complexe, et (D) quelle est sa catégorie exacte. Notre principale contribution est d’intégrer le fait qu’une émotion puisse s’exprimer selon différents modes : depuis un mode direct, essentiellement lexicalisé, jusqu’à un mode plus indirect, où des émotions vont être seulement suggérées, mode dont les approches en TAL ne tiennent généralement pas compte. Nos expériences sur des textes en français pour les enfants mènent à des résultats tout à fait acceptables en comparaison de ce sur quoi des annotateurs humains experts en psycholinguistique s’accordent et à des résultats meilleurs que ceux produits par GPT-3.5 via du prompting. Ceci offre une perspective intéressante de prise en compte des émotions comme facteur d’analyse automatique de la complexité dans les textes, cadre plus général de nos travaux.

2023

pdf bib abs

Darbarer @ AutoMin2023: Transcription simplification for concise minute generation from multi-party conversations
Ismaël Rousseau | Loïc Fosse | Youness Dkhissi | Geraldine Damnati | Gwénolé Lecorvé
Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges

This document reports the approach of our team Darbarer for the main task (Task A) of the AutoMin 2023 challenge. Our system is composed of four main modules. The first module relies on a text simplification model aiming at standardizing the utterances of the conversation and compressing the input in order to focus on informative content. The second module handles summarization by employing a straightforward segmentation strategy and a fine-tuned BART-based generative model. Then a titling module has been trained in order to propose a short description of each summarized block. Lastly, we apply a post-processing step aimed at enhancing readability through specific formatting rules. Our contributions lie in the first, third and last steps. Our system generates precise and concise minutes. We provide a detailed description of our modules, discuss the difficulty of evaluating their impact and propose an analysis of observed errors in our generated minutes.

2022

pdf bib abs

A (Psycho-)Linguistically Motivated Scheme for Annotating and Exploring Emotions in a Genre-Diverse Corpus
Aline Etienne | Delphine Battistelli | Gwénolé Lecorvé
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents a scheme for emotion annotation and its manual application on a genre-diverse corpus of texts written in French. The methodology introduced here emphasizes the necessity of clarifying the main concepts implied by the analysis of emotions as they are expressed in texts, before conducting a manual annotation campaign. After explaining whatentails a deeply linguistic perspective on emotion expression modeling, we present a few NLP works that share some common points with this perspective and meticulously compare our approach with them. We then highlight some interesting quantitative results observed on our annotated corpus. The most notable interactions are on the one hand between emotion expression modes and genres of texts, and on the other hand between emotion expression modes and emotional categories. These observation corroborate and clarify some of the results already mentioned in other NLP works on emotion annotation.

pdf bib abs

CoQAR: Question Rewriting on CoQA
Quentin Brabant | Gwénolé Lecorvé | Lina M. Rojas Barahona
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Questions asked by humans during a conversation often contain contextual dependencies, i.e., explicit or implicit references to previous dialogue turns. These dependencies take the form of coreferences (e.g., via pronoun use) or ellipses, and can make the understanding difficult for automated systems. One way to facilitate the understanding and subsequent treatments of a question is to rewrite it into an out-of-context form, i.e., a form that can be understood without the conversational context. We propose CoQAR, a corpus containing 4.5K conversations from the Conversational Question-Answering dataset CoQA, for a total of 53K follow-up question-answer pairs. Each original question was manually annotated with at least 2 at most 3 out-of-context rewritings. CoQA originally contains 8k conversations, which sum up to 127k question-answer pairs. CoQAR can be used in the supervised learning of three tasks: question paraphrasing, question rewriting and conversational question answering. In order to assess the quality of CoQAR’s rewritings, we conduct several experiments consisting in training and evaluating models for these three tasks. Our results support the idea that question rewriting can be used as a preprocessing step for (conversational and non-conversational) question answering models, thereby increasing their performances.

pdf bib

Traitement Automatique des Langues, Volume 63, Numéro 2 : Traitement automatique des langues intermodal et multimodal [Cross-modal and multimodal natural language processing]
Gwénolé Lecorvé | John D. Kelleher
Traitement Automatique des Langues, Volume 63, Numéro 2 : Traitement automatique des langues intermodal et multimodal [Cross-modal and multimodal natural language processing]

pdf bib

Introduction to the special issue on cross-modal and multimodal natural language processing
Gwénolé Lecorvé | John D. Kelleher
Traitement Automatique des Langues, Volume 63, Numéro 2 : Traitement automatique des langues intermodal et multimodal [Cross-modal and multimodal natural language processing]

pdf bib abs

SPARQL-to-Text Question Generation for Knowledge-Based Conversational Applications
Gwénolé Lecorvé | Morgan Veyret | Quentin Brabant | Lina M. Rojas Barahona
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper focuses on the generation of natural language questions based on SPARQL queries, with an emphasis on conversational use cases (follow-up question-answering). It studies what can be achieved so far based on current deep learning models (namely pretrained T5 and BART models). To do so, 4 knowledge-based QA corpora have been homogenized for the task and a new challenge set is introduced. A first series of experiments analyzes the impact of different training setups, while a second series seeks to understand what is still difficult for these models. The results from automatic metrics and human evaluation show that simple questions and frequent templates of SPARQL queries are usually well processed whereas complex questions and conversational dimensions (coreferences and ellipses) are still difficult to handle. The experimental material is publicly available on https://github.com/Orange-OpenSource/sparql-to-text .

pdf bib abs

Une chaîne de traitement pour prédire et appréhender la complexité des textes pour enfants d’un point de vue linguistique (A Processing Chain to Explain the Complexity of Texts for Children From a Linguistic and Psycho-linguistic Point of View)
Delphine Battistelli | Aline Etienne | Rashedur Rahman | Charles Teissèdre | Gwénolé Lecorvé
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Nos travaux abordent la question de la mesure de la complexité d’un texte vis-à-vis d’une cible de lecteurs, les enfants en âge de lire, au travers de la mise en place d’une chaîne de traitements. Cette chaîne vise à extraire des descripteurs linguistiques, principalement issus de travaux en psycholinguistique et de travaux sur la lisibilité, mobilisables pour appréhender la complexité d’un texte. En l’appliquant sur un corpus de textes de fiction, elle permet d’étudier des corrélations entre certains descripteurs linguistiques et les tranches d’âges associées aux textes par les éditeurs. L’analyse de ces corrélations tend à valider la pertinence de la catégorisation en âges par les éditeurs. Elle justifie ainsi la mobilisation d’un tel corpus pour entraîner à partir des âges éditeurs un modèle de prédiction de l’âge cible d’un texte.

2021

pdf bib abs

TREMoLo : un corpus multi-étiquettes de tweets en français pour la caractérisation des registres de langue (TREMoLo : a Multi-Label Corpus of French Tweets for Language Register Characterization)
Jade Mekki | Delphine Battistelli | Nicolas Béchet | Gwénolé Lecorvé
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Des registres tels que familier, courant et soutenu sont un phénomène immédiatement perceptible par tout locuteur d’une langue. Ils restent encore peu étudiés en traitement des langues (TAL), en particulier en dehors de l’anglais. Cet article présente un large corpus de tweets en français annotés en registres de langue. L’annotation intègre des marqueurs propres à ce type de textes (tels que les émoticônes ou les hashtags) et habituellement évincés dans les travaux en TAL. À partir d’une graine annotée manuellement en proportion d’appartenance aux registres, un classifieur de type CamemBERT est appris et appliqué sur un large ensemble de tweets. Le corpus annoté en résultant compte 228 505 tweets pour un total de 6 millions de mots. Des premières analyses statistiques sont menées et permettent de conclure à la qualité du corpus présenté. Le corpus ainsi que son guide d’annotation sont mis à la disposition de la communauté scientifique.

pdf bib abs

TREMoLo-Tweets: A Multi-Label Corpus of French Tweets for Language Register Characterization
Jade Mekki | Gwénolé Lecorvé | Delphine Battistelli | Nicolas Béchet
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The casual, neutral, and formal language registers are highly perceptible in discourse productions. However, they are still poorly studied in Natural Language Processing (NLP), especially outside English, and for new textual types like tweets. To stimulate research, this paper introduces a large corpus of 228,505 French tweets (6M words) annotated in language registers. Labels are provided by a multi-label CamemBERT classifier trained and checked on a manually annotated subset of the corpus, while the tweets are selected to avoid undesired biases. Based on the corpus, an initial analysis of linguistic traits from either human annotators or automatic extractions is provided to describe the corpus and pave the way for various NLP tasks. The corpus, annotation guide and classifier are available on http://tremolo.irisa.fr.

2020

pdf bib abs

Age Recommendation for Texts
Alexis Blandin | Gwénolé Lecorvé | Delphine Battistelli | Aline Étienne
Proceedings of the Twelfth Language Resources and Evaluation Conference

The understanding of a text by a reader or listener is conditioned by the adequacy of the text’s characteristics with the person’s capacities and knowledge. This adequacy is critical in the case of a child since her/his cognitive and linguistic skills are still under development. Hence, in this paper, we present and study an original natural language processing (NLP) task which consists in predicting the age from which a text can be understood by someone. To do so, this paper first exhibits features derived from the psycholinguistic domain, as well as some coming from related NLP tasks. Then, we propose a set of neural network models and compare them on a dataset of French texts dedicated to young or adult audiences. To circumvent the lack of data, we study the idea to predict ages at the sentence level. The experiments first show that the sentence-based age recommendations can be efficiently merged to predict text-based recommendations. Then, we also demonstrate that the age predictions returned by our best model are better than those provided by psycholinguists. Finally, the paper investigates the impact of the various features used in these results.

pdf bib abs

L’expression des émotions dans les textes pour enfants : constitution d’un corpus annoté (Expressing emotions in texts for children: constitution of an annotated corpus)
Aline Étienne | Delphine Battistelli | Gwénolé Lecorvé
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Cet article présente une typologie de divers modes d’expression linguistique des émotions, le schéma d’annotation sous Glozz qui implémente cette typologie et un corpus de textes journalistiques pour enfants annoté à l’aide de ce schéma. Ces travaux préliminaires s’insèrent dans le contexte d’une étude relative au développement des capacités langagières des enfants, en particulier de leur capacité à comprendre un texte selon des critères émotionnels.

pdf bib abs

Recommandation d’âge pour des textes (Age recommendation for texts)
Alexis Blandin | Gwénolé Lecorvé | Delphine Battistelli | Aline Étienne
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Cet article étudie une première tentative pour prédire une recommandation d’âge estimant à partir de quand un enfant pourrait comprendre un texte donné. À ce titre, nous présentons d’abord des descripteurs issus de divers domaines scientifiques, puis proposons différentes architectures de réseaux de neurones et les comparons sur un ensemble de données textuelles en français, dédiées à des publics jeune ou adulte. Pour contourner la faible quantité de données de ce type, nous étudions l’idée de prédire les âges au niveau de la phrase. Les expériences montrent que cette hypothèse, quoique forte, conduit d’ores et déjà à de bons résultats, meilleurs que ceux fournis par des experts psycholinguistes, y compris lorsque les phrases isolées sont remplacées par textes complets.

pdf bib abs

Style versus Content: A distinction without a (learnable) difference?
Somayeh Jafaritazehjani | Gwénolé Lecorvé | Damien Lolive | John Kelleher
Proceedings of the 28th International Conference on Computational Linguistics

Textual style transfer involves modifying the style of a text while preserving its content. This assumes that it is possible to separate style from content. This paper investigates whether this separation is possible. We use sentiment transfer as our case study for style transfer analysis. Our experimental methodology frames style transfer as a multi-objective problem, balancing style shift with content preservation and fluency. Due to the lack of parallel data for style transfer we employ a variety of adversarial encoder-decoder networks in our experiments. Also, we use of a probing methodology to analyse how these models encode style-related features in their latent spaces. The results of our experiments which are further confirmed by a human evaluation reveal the inherent trade-off between the multiple style transfer objectives which indicates that style cannot be usefully separated from content within these style-transfer systems.

pdf bib abs

Children have less linguistic skills than adults, which makes it more difficult for them to understand some texts, for instance when browsing the Internet. In this context, we present a novel method which predicts the minimal age from which a text can be understood. This method analyses each sentence of a text using a recurrent neural network, and then aggregates this information to provide the text-level prediction. Different approaches are proposed and compared to baseline models, at sentence and text levels. Experiments are carried out on a corpus of 1, 500 texts and 160K sentences. Our best model, based on LSTMs, outperforms state-of-the-art results and achieves mean absolute errors of 1.86 and 2.28, at sentence and text levels, respectively.

pdf bib abs

FlexEval, création de sites web légers pour des campagnes de tests perceptifs multimédias (FlexEval, creation of light websites for multimedia perceptual test campaigns)
Cédric Fayet | Alexis Blond | Grégoire Coulombel | Claude Simon | Damien Lolive | Gwénolé Lecorvé | Jonathan Chevelu | Sébastien Le Maguer
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

Nous présentons FlexEval, un outil de conception et déploiement de tests perceptifs multimédias sous la forme d’un site web léger. S’appuyant sur des technologies standards et ouvertes du web, notamment le framework Flask, FlexEval offre une grande souplesse de conception, des gages de pérennité, ainsi que le support de communautés actives d’utilisateurs. L’application est disponible en open-source via le dépôt Git https://gitlab.inria.fr/expression/tools/flexeval.

2019

pdf bib abs

Évaluation objective de plongements pour la synthèse de parole guidée par réseaux de neurones (Objective evaluation of embeddings for speech synthesis guided by neural networks)
Antoine Perquin | Gwénolé Lecorvé | Damien Lolive | Laurent Amsaleg
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

L’évaluation de plongements issus de réseaux de neurones est un procédé complexe. La qualité des plongements est liée à la tâche spécifique pour laquelle ils ont été entraînés et l’évaluation de cette tâche peut être un procédé long et onéreux s’il y a besoin d’annotateurs humains. Il peut donc être préférable d’estimer leur qualité grâce à des mesures objectives rapides et reproductibles sur des tâches annexes. Cet article propose une méthode générique pour estimer la qualité d’un plongement. Appliquée à la synthèse de parole par sélection d’unités guidée par réseaux de neurones, cette méthode permet de comparer deux systèmes distincts.

2018

pdf bib abs

Identification de descripteurs pour la caractérisation de registres (Feature identification for register characterization)
Jade Mekki | Delphine Battistelli | Gwénolé Lecorvé | Nicolas Béchet
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

L’article présente une étude des descripteurs linguistiques pour la caractérisation d’un texte selon son registre de langue (familier, courant, soutenu). Cette étude a pour but de poser un premier jalon pour des tâches futures sur le sujet (classification, extraction de motifs discriminants). À partir d’un état de l’art mené sur la notion de registre dans la littérature linguistique et sociolinguistique, nous avons identifié une liste de 72 descripteurs pertinents. Dans cet article, nous présentons les 30 premiers que nous avons pu valider sur un corpus de textes français de registres distincts.

pdf bib abs

Construction conjointe d’un corpus et d’un classifieur pour les registres de langue en français (Joint building of a corpus and a classifier for language registers in French)
Gwénolé Lecorvé | Hugo Ayats | Fournier Benoît | Jade Mekki | Jonathan Chevelu | Delphine Battistelli | Nicolas Béchet
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Les registres de langue sont un trait stylistique marquant dans l’appréciation d’un texte ou d’un discours. Cependant, il sont encore peu étudiés en traitement automatique des langues. Dans cet article, nous présentons une approche semi-supervisée permettant la construction conjointe d’un corpus de textes étiquetés en registres et d’un classifieur associé. Cette approche s’appuie sur un ensemble initial et restreint de données expertes. Via une collecte automatique et massive de pages web, l’approche procède par itérations en alternant l’apprentissage d’un classifieur intermédiaire et l’annotation de nouveaux textes pour augmenter le corpus étiqueté. Nous appliquons cette approche aux registres familier, courant et soutenu. À l’issue du processus de construction, le corpus étiqueté regroupe 800 000 textes et le classifieur, un réseau de neurones, présente un taux de bonne classification de 87 %.

2017

pdf bib abs

Ajout automatique de disfluences pour la synthèse de la parole spontanée : formalisation et preuve de concept (Automatic disfluency insertion towards spontaneous TTS : formalization and proof of concept)
Raheel Qader | Gwénolé Lecorvé | Damien Lolive | Pascale Sébillot
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 - Articles longs

Cet article présente un travail exploratoire sur l’ajout automatique de disfluences, c’est-à-dire de pauses, de répétitions et de révisions, dans les énoncés en entrée d’un système de synthèse de la parole. L’objectif est de conférer aux signaux ainsi synthétisés un caractère plus spontané et expressif. Pour cela, nous présentons une formalisation novatrice du processus de production de disfluences à travers un mécanisme de composition de ces disfluences. Cette formalisation se distingue notamment des approches visant la détection ou le nettoyage de disfluences dans des transcriptions, ou de celles en synthèse de la parole qui ne s’intéressent qu’au seul ajout de pauses. Nous présentons une première implémentation de notre processus fondée sur des champs aléatoires conditionnels et des modèles de langage, puis conduisons des évaluations objectives et perceptives. Celles-ci nous permettent de conclure à la fonctionnalité de notre proposition et d’en discuter les pistes principales d’amélioration.

2016

pdf bib abs

Phonétisation statistique adaptable d’énoncés pour le français (Adaptive statistical utterance phonetization for French ⇤ )
Gwénolé Lecorvé | Damien Lolive
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

Les méthodes classiques de phonétisation d’énoncés concatènent les prononciations hors-contexte des mots. Ce type d’approches est trop faible pour certaines langues, comme le français, où les transitions entre les mots impliquent des modifications de prononciation. De plus, cela rend difficile la modélisation de stratégies de prononciation globales, par exemple pour modéliser un locuteur ou un accent particulier. Pour palier ces problèmes, ce papier présente une approche originale pour la phonétisation du français afin de générer des variantes de prononciation dans le cas d’énoncés. Par l’emploi de champs aléatoires conditionnels et de transducteurs finis pondérés, cette approche propose un cadre statistique particulièrement souple et adaptable. Cette approche est évaluée sur un corpus de mots isolés et sur un corpus d’énoncés prononcés.

pdf bib abs

Adaptation de la prononciation pour la synthèse de la parole spontanée en utilisant des informations linguistiques (Pronunciation adaptation for spontaneous speech synthesis using linguistic information)
Raheel Qader | Gwénolé Lecorvé | Damien Lolive | Pascale Sébillot
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP

Cet article présente une nouvelle méthode d’adaptation de la prononciation dont le but est de reproduire le style spontané. Il s’agit d’une tâche-clé en synthèse de la parole car elle permet d’apporter de l’expressivité aux signaux produits, ouvrant ainsi la voie à de nouvelles applications. La force de la méthode proposée est de ne s’appuyer que sur des informations linguistiques et de considérer un cadre probabiliste pour ce faire, précisément les champs aléatoires conditionnels. Dans cet article, nous étudions tout d’abord la pertinence d’un ensemble d’informations pour l’adaptation, puis nous combinons les informations les plus pertinentes lors d’expériences finales. Les évaluations de la méthode sur un corpus de parole conversationnelle en anglais montrent que les prononciations adaptées reflètent significativement mieux un style spontané que les prononciations canoniques.

2014

pdf bib abs

ROOTS: a toolkit for easy, fast and consistent processing of large sequential annotated data collections
Jonathan Chevelu | Gwénolé Lecorvé | Damien Lolive
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The development of new methods for given speech and natural language processing tasks usually consists in annotating large corpora of data before applying machine learning techniques to train models or to extract information. Beyond scientific aspects, creating and managing such annotated data sets is a recurrent problem. While using human annotators is obviously expensive in time and money, relying on automatic annotation processes is not a simple solution neither. Typically, the high diversity of annotation tools and of data formats, as well as the lack of efficient middleware to interface them all together, make such processes very complex and painful to design. To circumvent this problem, this paper presents the toolkit ROOTS, a freshly released open source toolkit (http://roots-toolkit.gforge.inria.fr) for easy, fast and consistent management of heterogeneously annotated data. ROOTS is designed to efficiently handle massive complex sequential data and to allow quick and light prototyping, as this is often required for research purposes. To illustrate these properties, three sample applications are presented in the field of speech and language processing, though ROOTS can more generally be easily extended to other application domains.

2012

pdf bib

Impact du degré de supervision sur l’adaptation à un domaine d’un modèle de langage à partir du Web (Impact of the level of supervision on Web-based language model domain adaptation) [in French]
Gwénolé Lecorvé | John Dines | Thomas Hain | Petr Motlicek
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 1: JEP

2008

pdf bib abs

On the Use of Web Resources and Natural Language Processing Techniques to Improve Automatic Speech Recognition Systems
Gwénolé Lecorvé | Guillaume Gravier | Pascale Sébillot
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Language models used in current automatic speech recognition systems are trained on general-purpose corpora and are therefore not relevant to transcribe spoken documents dealing with successive precise topics, such as long multimedia streams, frequently tacking reportages and debates. To overcome this problem, this paper shows that Web resources and natural language processing techniques can be effective to automatically adapt the baseline language model of an automatic speech recognition system to any encountered topic. More precisely, we detail how to characterize the topic of transcription segment and how to collect Web pages from which a topic-specific language model can be trained. Then, an adapted language model is obtained by combining the topic-specific language model with the general-purpose language model. Finally, new transcriptions are generated using the adapted language model and are compared with transcriptions previously obtained with the baseline language model. Experiments show that our topic adaptation technique leads to significant transcription quality gains.

Co-authors

Venues

ACL1