Gaël de Chalendar

Also published as: Gael de Chalendar

2025

With the emergence of Large Language Models (LLMs), numerous use cases have arisen in the medical field, particularly in generating summaries for consultation transcriptions and extensive medical reports. A major concern is that these summaries may omit critical information from the original input, potentially jeopardizing the decision-making process. This issue of omission is distinct from hallucination, which involves generating incorrect or fabricated facts. To address omissions, this paper introduces a dataset designed to evaluate such issues and proposes a frugal approach called EmbedKDECheck for detecting omissions in LLM-generated texts. The dataset, created in French, has been validated by medical experts to ensure it accurately represents real-world scenarios in the medical field. The objective is to develop a reference-free (black-box) method that can evaluate the reliability of summaries or reports without requiring significant computational resources, relying only on input and output. Unlike methods that rely on embeddings derived from the LLM itself, our approach uses embeddings generated by a third-party, lightweight NLP model based on a combination of FastText and Word2Vec. These embeddings are then combined with anomaly detection models to identify omissions effectively, making the method well-suited for resource-constrained environments. EmbedKDECheck was benchmarked against black-box state-of-the-art frameworks and models, including SelfCheckGPT, ChainPoll, and G-Eval, which leverage GPT. Results demonstrated its satisfactory performance in detecting omissions in LLM-generated summaries. This work advances frugal methodologies for evaluating the reliability of LLM-generated texts, with significant potential to improve the safety and accuracy of medical decision support systems in surgery and other healthcare domains.

pdf bib abs
A Survey of Recent Advances on Turn-taking Modeling in Spoken Dialogue Systems
Galo Castillo-López | Gael de Chalendar | Nasredine Semmar
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology

The rapid growth of dialogue systems adoption to serve humans in daily tasks has increased the realism expected from these systems. One trait of realism is the way speaking agents take their turns. We provide here a review of recent methods on turn-taking modeling and thoroughly describe the corpora used in these studies. We observe that 72% of the reviewed works in this survey do not compare their methods with previous efforts. We argue that one of the challenges in the field is the lack of well-established benchmarks to monitor progress. This work aims to provide the community with a better understanding of the current state of research around turn-taking modeling and future directions to build more realistic spoken conversational agents.

pdf bib abs
Détection des omissions dans les résumés médicaux générés par les grands modèles de langue
Achir Oukelmoun | Nasredine Semmar | Gaël de Chalendar | Clément Cormi | Mariame Oukelmoun | Eric Vibert | Marc-Antoine Allard
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

Les grands modèles de langue (LLMs) sont de plus en plus utilisés pour résumer des textes médicaux, mais ils risquent d’omettre des informations critiques, compromettant ainsi la prise de décision. Contrairement aux hallucinations, les omissions concernent des faits essentiels absents. Cet article introduit un jeu de données validé en français pour détecter ces omissions et propose EmbedKDECheck, une approche frugale et sans référence. A l’opposé des méthodes basées sur les LLMs, cette approche utilise des plongements lexicaux issus d’un modèle de Traitement Automatique des Langues (TAL) léger combinant FastText et Word2Vec selon un algorithme précis couplé à un modèle non-supervisé fournissant un score d’anomalie. Cette approche permet d’identifier efficacement les omissions à faible coût computationnel. EmbedKDECheck a été évalué face aux frameworks de l’état de l’art (SelfCheckGPT, ChainPoll, G-Eval et GPTScore) et a montré de bonnes performances. Notre méthode renforce l’évaluation de la fiabilité des LLMs et contribue à une prise de décision médicale plus sûre.

pdf bib abs
Intent Recognition and Out-of-Scope Detection using LLMs in Multi-party Conversations
Galo Castillo-López | Gael de Chalendar | Nasredine Semmar
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Intent recognition is a fundamental component in task-oriented dialogue systems (TODS). Determining user intents and detecting whether an intent is Out-of-Scope (OOS) is crucial for TODS to provide reliable responses. However, traditional TODS require large amount of annotated data. In this work we propose a hybrid approach to combine BERT and LLMs in zero and few-shot scenarios to recognize intents and detect OOS utterances. Our approach leverages LLMs generalization power and BERT’s computational efficiency in such scenarios. We evaluate our method on multi-party conversation corpora and observe that sharing information from BERT outputs to LLMs lead to system performance improvement.

2023

Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language.X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source.

2021

pdf bib abs
GeSERA: General-domain Summary Evaluation by Relevance Analysis
Jessica López Espejel | Gaël de Chalendar | Jorge Garcia Flores | Thierry Charnois | Ivan Vladimir Meza Ruiz
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

We present GeSERA, an open-source improved version of SERA for evaluating automatic extractive and abstractive summaries from the general domain. SERA is based on a search engine that compares candidate and reference summaries (called queries) against an information retrieval document base (called index). SERA was originally designed for the biomedical domain only, where it showed a better correlation with manual methods than the widely used lexical-based ROUGE method. In this paper, we take out SERA from the biomedical domain to the general one by adapting its content-based method to successfully evaluate summaries from the general domain. First, we improve the query reformulation strategy with POS Tags analysis of general-domain corpora. Second, we replace the biomedical index used in SERA with two article collections from AQUAINT-2 and Wikipedia. We conduct experiments with TAC2008, TAC2009, and CNNDM datasets. Results show that, in most cases, GeSERA achieves higher correlations with manual evaluation methods than SERA, while it reduces its gap with ROUGE for general-domain summary evaluation. GeSERA even surpasses ROUGE in two cases of TAC2009. Finally, we conduct extensive experiments and provide a comprehensive study of the impact of human annotators and the index size on summary evaluation with SERA and GeSERA.

2020

pdf bib abs
Iagotchi : vers un agent conversationnel artistique (Iagotchi : Towards an Artistic Conversational Agent )
Frejus Laleye | Gaël de Chalendar | Léopold Frey | Rocio Berenguer
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

Cet article décrit Iagotchi, un personnage virtuel philosophique et artistique qui apprend et développe des connaissances à partir de ses interactions avec l’humain. Iagotchi se présente à la fois comme un apprenant et un expert avec comme objectifs principaux (1) d’accompagner l’homme dans ses questionnements, (2) de lui fournir des réponses pertinentes sur la base de ses requêtes et (3) de générer des textes poétiques cohérents. Dans ce travail, nous décrivons l’architecture du système de Iagotchi et les composants clés tels que le moteur de conversation, le gestionnaire de sujets et le générateur de poésies.

pdf bib abs
A French Medical Conversations Corpus Annotated for a Virtual Patient Dialogue System
Fréjus A. A. Laleye | Gaël de Chalendar | Antonia Blanié | Antoine Brouquet | Dan Behnamou
Proceedings of the Twelfth Language Resources and Evaluation Conference

Data-driven approaches for creating virtual patient dialogue systems require the availability of large data specific to the language,domain and clinical cases studied. Based on the lack of dialogue corpora in French for medical education, we propose an annotatedcorpus of dialogues including medical consultation interactions between doctor and patient. In this work, we detail the building processof the proposed dialogue corpus, describe the annotation guidelines and also present the statistics of its contents. We then conducted aquestion categorization task to evaluate the benefits of the proposed corpus that is made publicly available.

2019

pdf bib abs
Hybridation d’un agent conversationnel avec des plongements lexicaux pour la formation au diagnostic médical (Hybridization of a conversational agent with word embeddings for medical diagnostic training)
Fréjus A. A. Laleye | Gaël de Chalendar | Antoine Brouquet | Antonia Blanié | Dan Benhamou
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Dans le contexte médical, un patient ou médecin virtuel dialoguant permet de former les apprenants au diagnostic médical via la simulation de manière autonome. Dans ce travail, nous avons exploité les propriétés sémantiques capturées par les représentations distribuées de mots pour la recherche de questions similaires dans le système de dialogues d’un agent conversationnel médical. Deux systèmes de dialogues ont été créés et évalués sur des jeux de données collectées lors des tests avec les apprenants. Le premier système fondé sur la correspondance de règles de dialogue créées à la main présente une performance globale de 92% comme taux de réponses cohérentes sur le cas clinique étudié tandis que le second système qui combine les règles de dialogue et la similarité sémantique réalise une performance de 97% de réponses cohérentes en réduisant de 7% les erreurs de compréhension par rapport au système de correspondance de règles.

2018

pdf bib abs
Nouveautés de l’analyseur linguistique LIMA (What’s New in the LIMA Language Analyzer)
Gaël de Chalendar
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

LIMA est un analyseur linguistique libre d’envergure industrielle. Nous présentons ici ses évolutions depuis la dernière publication en 2014.

2017

pdf bib abs
Taking into account Inter-sentence Similarity for Update Summarization
Maâli Mnasri | Gaël de Chalendar | Olivier Ferret
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Following Gillick and Favre (2009), a lot of work about extractive summarization has modeled this task by associating two contrary constraints: one aims at maximizing the coverage of the summary with respect to its information content while the other represents its size limit. In this context, the notion of redundancy is only implicitly taken into account. In this article, we extend the framework defined by Gillick and Favre (2009) by examining how and to what extent integrating semantic sentence similarity into an update summarization system can improve its results. We show more precisely the impact of this strategy through evaluations performed on DUC 2007 and TAC 2008 and 2009 datasets.

2016

pdf bib abs
Intégration de la similarité entre phrases comme critère pour le résumé multi-document (Integrating sentence similarity as a constraint for multi-document summarization)
Maâli Mnasri | Gaël de Chalendar | Olivier Ferret
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

multi-document Maâli Mnasri1, 2 Gaël de Chalendar1 Olivier Ferret1 (1) CEA, LIST, Laboratoire Vision et Ingénierie des Contenus, Gif-sur-Yvette, F-91191, France. (2) Université Paris-Sud, Université Paris-Saclay, F-91405 Orsay, France. maali.mnasri@cea.fr, gael.de-chalendar@cea.fr, olivier.ferret@cea.fr R ÉSUMÉ À la suite des travaux de Gillick & Favre (2009), beaucoup de travaux portant sur le résumé par extraction se sont appuyés sur une modélisation de cette tâche sous la forme de deux contraintes antagonistes : l’une vise à maximiser la couverture du résumé produit par rapport au contenu des textes d’origine tandis que l’autre représente la limite du résumé en termes de taille. Dans cette approche, la notion de redondance n’est prise en compte que de façon implicite. Dans cet article, nous reprenons le cadre défini par Gillick & Favre (2009) mais nous examinons comment et dans quelle mesure la prise en compte explicite de la similarité sémantique des phrases peut améliorer les performances d’un système de résumé multi-document. Nous vérifions cet impact par des évaluations menées sur les corpus DUC 2003 et 2004.

2014

pdf bib abs
Adapting VerbNet to French using existing resources
Quentin Pradet | Laurence Danlos | Gaël de Chalendar
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

VerbNet is an English lexical resource for verbs that has proven useful for English NLP due to its high coverage and coherent classification. Such a resource doesnt exist for other languages, despite some (mostly automatic and unsupervised) attempts. We show how to semi-automatically adapt VerbNet using existing resources designed for diï¬erent purposes. This study focuses on French and uses two French resources: a semantic lexicon (Les Verbes Français) and a syntactic lexicon (Lexique-Grammaire).

pdf bib abs
The LIMA Multilingual Analyzer Made Free: FLOSS Resources Adaptation and Correction
Gaël de Chalendar
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

At CEA LIST, we have decided to release our multilingual analyzer LIMA as Free software. As we were not proprietary of all the language resources used we had to select and adapt free ones in order to attain results good enough and equivalent to those obtained with our previous ones. For English and French, we found and adapted a full-form dictionary and an annotated corpus for learning part-of-speech tagging models.

The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us to enforce the coherence of the resulting resource, and also has the advantage that, though the number of frames is limited (around a hundred), we obtain full coverage within a given domain.

pdf bib
WoNeF, an improved, expanded and evaluated automatic French translation of WordNet
Quentin Pradet | Gaël de Chalendar | Jeanne Baguenier Desormeaux
Proceedings of the Seventh Global Wordnet Conference

2013

pdf bib
WoNeF, an improved, extended and evaluated automatic French translation of WordNet (WoNeF : amélioration, extension et évaluation d’une traduction française automatique de WordNet) [in French]
Quentin Pradet | Jeanne Baguenier-Desormeaux | Gaël de Chalendar | Laurence Danlos
Proceedings of TALN 2013 (Volume 1: Long Papers)

2010

pdf bib abs
JAWS : Just Another WordNet Subset
Claire Mouton | Gaël de Chalendar
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

WordNet, une des ressources lexicales les plus utilisées aujourd’hui a été constituée en anglais et les chercheurs travaillant sur d’autres langues souffrent du manque d’une telle ressource. Malgré les efforts fournis par la communauté française, les différents WordNets produits pour la langue française ne sont toujours pas aussi exhaustifs que le WordNet de Princeton. C’est pourquoi nous proposons une méthode novatrice dans la production de termes nominaux instanciant les différents synsets de WordNet en exploitant les propriétés syntaxiques distributionnelles du vocabulaire français. Nous comparons la ressource que nous obtenons avecWOLF et montrons que notre approche offre une couverture plus large.

pdf bib
A hybrid word alignment approach to improve translation lexicons with compound words and idiomatic expressions
Nasredine Semmar | Christophe Servan | Gaël de Chalendar | Benoît Le Ny | Jean-Jacques Bouzaglou
Proceedings of Translating and the Computer 32

pdf bib abs
FrameNet Translation Using Bilingual Dictionaries with Evaluation on the English-French Pair
Claire Mouton | Gaël de Chalendar | Benoit Richert
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Semantic Role Labeling cannot be performed without an associated linguistic resource. A key resource for such a task is the FrameNet resource based on Fillmores theory of frame semantics. Like many linguistic resources, FrameNet has been built by English native speakers for the English language. To overcome the lack of such resources in other languages, we propose a new approach to FrameNet translation by using bilingual dictionaries and filtering the wrong translations. We define six scores to filter, based on translation redundancy and FrameNet structure. We also present our work on the enrichment of the obtained resource with nouns. This enrichment uses semantic spaces built on syntactical dependencies and a multi-represented k-NN classifier. We evaluate both the tasks on the French language over a subset of ten frames and show improved results compared to the existing French FrameNet. Our final resource contains 15,132 associations lexical units-frames for an estimated precision of 86%.

pdf bib abs
LIMA : A Multilingual Framework for Linguistic Analysis and Linguistic Resources Development and Evaluation
Romaric Besançon | Gaël de Chalendar | Olivier Ferret | Faiza Gara | Olivier Mesnard | Meriama Laïb | Nasredine Semmar
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The increasing amount of available textual information makes necessary the use of Natural Language Processing (NLP) tools. These tools have to be used on large collections of documents in different languages. But NLP is a complex task that relies on many processes and resources. As a consequence, NLP tools must be both configurable and efficient: specific software architectures must be designed for this purpose. We present in this paper the LIMA multilingual analysis platform, developed at CEA LIST. This configurable platform has been designed to develop NLP based industrial applications while keeping enough flexibility to integrate various processes and resources. This design makes LIMA a linguistic analyzer that can handle languages as different as French, English, German, Arabic or Chinese. Beyond its architecture principles and its capabilities as a linguistic analyzer, LIMA also offers a set of tools dedicated to the test and the evaluation of linguistic modules and to the production and the management of new linguistic resources.

2009

pdf bib
Unsupervised Word Sense Induction from Multiple Semantic Spaces with Locality Sensitive Hashing
Claire Mouton | Guillaume Pitel | Gaël de Chalendar | Anne Vilnat
Proceedings of the International Conference RANLP-2009

pdf bib
Modular resource development and diagnostic evaluation framework for fast NLP system improvement
Gaël de Chalendar | Damien Nouvel
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

2003

La fiabilité des réponses qu’il propose, ou un moyen de l’estimer, est le meilleur atout d’un système de question-réponse. A cette fin, nous avons choisi d’effectuer des recherches dans des ensembles de documents différents et de privilégier des résultats qui sont trouvés dans ces différentes sources. Ainsi, le système QALC travaille à la fois sur une collection finie d’articles de journaux et sur le Web.