Karën Fort

Also published as: Karen Fort

2025

MascuLead : le premier tableau de bord de biais de genre
Fanny Ducel | Jeffrey André | Aurélie Névéol | Karën Fort
Actes de l'atelier Ethic and Alignment of (Large) Language Models 2025 (EALM)

Nous présentons MascuLead, le premier tableau de bord centré sur une tâche de détection de biais de genre pour les langues fleuries. Nous appliquons un cadre existant sur plusieurs Grands Modèles de Langue et comparons leurs performances. Nous soulignons également l’importance d’inclure des références de biais dans les tableaux de bord, et nous questionnons la notion même de tableaux de bord.

pdf bib abs

Latrumplang, instrument de destruction de la pensée : analyse de l’impact de la censure trumpiste sur la recherche en santé mentale
Vincent P. Martin | Karën Fort | Jean-Arthur Micoulaud-Franchi
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

Un processus de censure de l’activité scientifique est en cours aux États-Unis. À partir de listes de termes interdits, des dossiers de financements sont réétudiés, des articles scientifiques sont rétractés. Or, le langage structure les tranches du réel descriptibles — et donc celles qui peuvent être étudiées scientifiquement. Dans cet article, nous souhaitons afficher comment la mise en place d’une telle censure pourrait provoquer la disparition de la recherche portant sur la santé mentale. Pour cela, nous avons réalisé une analyse bibliographique des 64 434 articles contenant le terme « mental health » dans leur titre référencé dans PubMed. Nous avons ensuite extrait une liste de termes interdits de leur résumé, identifié les thèmes sous-jacents et généré un réseau lexical. Ces résultats démontrent l’impossibilité de penser la santé mentale sans les termes interdits par les directives trumpistes, dont la censure signerait l’abandon de plus de 50 ans de progrès en santé publique.

pdf bib abs

“Women do not have heart attacks!” Gender Biases in Automatically Generated Clinical Cases in French
Fanny Ducel | Nicolas Hiebel | Olivier Ferret | Karën Fort | Aurélie Névéol
Findings of the Association for Computational Linguistics: NAACL 2025

Healthcare professionals are increasingly including Language Models (LMs) in clinical practice. However, LMs have been shown to exhibit and amplify stereotypical biases that can cause life-threatening harm in a medical context. This study aims to evaluate gender biases in automatically generated clinical cases in French, on ten disorders. Using seven LMs fine-tuned for clinical case generation and an automatic linguistic gender detection tool, we measure the associations between disorders and gender. We unveil that LMs over-generate cases describing male patients, creating synthetic corpora that are not consistent with documented prevalence for these disorders. For instance, when prompts do not specify a gender, LMs generate eight times more clinical cases describing male (vs. female patients) for heart attack. We discuss the ideal synthetic clinical case corpus and establish that explicitly mentioning demographic information in generation instructions appears to be the fairest strategy. In conclusion, we argue that the presence of gender biases in synthetic text raises concerns about LM-induced harm, especially for women and transgender people.

pdf bib abs

Cet article présente un travail collectif mené par des chercheurs et des porteurs de projets dans le domaine des sciences participatives (SP). Notre démarche s’inscrit dans la « culture de l’erreur » dans les SP, introduite par Westreicher et al. (2021) : nous partageons notre vision des erreurs et des difficultés que nous avons expérimentées, ce que nous en avons appris et ce que nous ferions différemment. Ce travail s’appuie sur dix projets français couvrant une variété d’objectifs, de disciplines, et de publics. Cette diversité nous a permis de mener une réflexion transversale et libre de toute spécificité sur nos pratiques. Nous avons identifié 3 types d’erreurs ou de difficultés, que nous illustrons par des exemples tirés de notre propre expérience. Le premier type d’erreurs concerne celles que nous ne répéterions pas. Contrairement à ces erreurs « réelles », les deux types de difficultés suivants sont une invitation à défendre une vision plus réaliste des SP. Le deuxième type fait en effet référence aux incertitudes inhérentes à la plasticité des projets de SP. Le dernier type concerne enfin les obstacles et difficultés qui peuvent conduire à des conséquences positives inattendues, à la fois en termes d’objectifs scientifiques et vis-à-vis de la communauté des participant(e)s. Dans cet article, nous encourageons donc un changement dans la façon dont nous considérons les erreurs et les difficultés : l’incertitude est inhérente aux SP et nous affirmons notre droit d’expérimenter, de faire des erreurs et de changer de pratiques au cours de la vie d’un projet.

pdf bib abs

« De nos jours, ce sont les résultats qui comptent » : création et étude diachronique d’un corpus de revendications issues d’articles de TAL
Clementine Bleuze | Fanny Ducel | Maxime Amblard | Karën Fort
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

Nous constituons un corpus de phrases issues de pré-tirages et d’articles de TAL, publiés en anglais entre 1952 et 2024, dont nous annotons manuellement un échantillon avec des catégories de revendications reflétant leur fonction rhétorique au sein des articles. Nous affinons un modèle SciBERT (Beltagy et al. , 2019) pour prédire les étiquettes restantes, que nous mettons, avec le corpus annoté, à la disposition de la communauté. Nous illustrons l’intérêt du corpus par des analyses exploratoires sur les caractéristiques des revendications relevées, ainsi qu’une étude diachronique de l’évolution de la structure des résumés; ceci est mis en lien avec une réflexion sur la notion d’exagération scientifique. Nous observons une importance croissante des séquences de contexte précédant l’exposé des contributions, lequel est également de plus en plus suivi de séquences de résultats.

pdf bib abs

With NLP research being rapidly productionized into real-world applications, it is important to be aware of and think through the consequences of our work. Such ethical considerations are important in both authoring and reviewing (e.g. privacy, consent, fairness, among others). This tutorial will equip participants with basic guidelines for thinking deeply about ethical issues and review common considerations that recur in NLP research. The methodology is interactive and participatory, including discussion of case studies and group work. Participants will gain practical experience on when to flag a paper for ethics review and how to write an ethical consideration section to be shared with the broader community. Most importantly, the participants will be co-creating the tutorial outcomes and extending tutorial materials to share as public outcomes.

pdf bib abs

« Les femmes ne font pas de crise cardiaque ! » Étude des biais de genre dans les cas cliniques synthétiques en français
Fanny Ducel | Nicolas Hiebel | Olivier Ferret | Karën Fort | Aurélie Névéol
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : traductions d'articles publiés

De plus en plus de professionnels de santé utilisent des modèles de langue. Cependant, ces modèles présentent et amplifient des biais stéréotypés qui peuvent mettre en danger des vies. Cette étude vise à évaluer les biais de genre dans des cas cliniques générés automatiquement en français concernant dix pathologies. Nous utilisons sept modèles de langue affinés et un outil de détection automatique du genre pour mesurer les associations entre pathologie et genre. Nous montrons que les modèles sur-génèrent des cas décrivant des patients masculins, allant à l’encontre des prévalences réelles. Par exemple, lorsque les invites ne spécifient pas de genre, les modèles génèrent huit fois plus de cas cliniques décrivant des patients (plutôt que des patientes) pour les crises cardiaques. Nous discutons des possibles dommages induits par les modèles de langue, en particulier pour les femmes et les personnes transgenres, de la définition d’un modèle de langue « idéal » et des moyens d’y parvenir.

2024

pdf bib

La recherche sur les biais dans les modèles de langue est biaisée : état de l’art en abyme [Bias Research for Language Models is Biased: a Survey for Deconstructing Bias in Large Language Models]
Fanny Ducel | Aurélie Névéol | Karën Fort
Traitement Automatique des Langues, Volume 64, Numéro 3 : Explicabilité des modèles de TAL [Explainability of NLP models]

Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting The study of bias, fairness and social impact in Natural Language Processing (NLP) lacks resources in languages other than English. Our objective is to support the evaluation of bias in language models in a multilingual setting. We use stereotypes across nine types of biases to build a corpus containing contrasting sentence pairs, one sentence that presents a stereotype concerning an underadvantaged group and another minimally changed sentence, concerning a matching advantaged group. We build on the French CrowS-Pairs corpus and guidelines to provide translations of the existing material into seven additional languages. In total, we produce 11,139 new sentence pairs that cover stereotypes dealing with nine types of biases in seven cultural contexts. We use the final resource for the evaluation of relevant monolingual and multilingual masked language models. We find that language models in all languages favor sentences that express stereotypes in most bias categories. The process of creating a resource that covers a wide range of language types and cultural settings highlights the difficulty of bias evaluation, in particular comparability across languages and contexts.

pdf bib abs

Évaluation automatique des biais de genre dans des modèles de langue auto-régressifs
Fanny Ducel | Aurélie Névéol | Karën Fort
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

Nous proposons un outil pour mesurer automatiquement les biais de genre dans des textes générés par des grands modèles de langue dans des langues flexionnelles. Nous évaluons sept modèles à l’aide de 52 000 textes en français et 2 500 textes en italien, pour la rédaction de lettres de motivation. Notre outil s’appuie sur la détection de marqueurs morpho-syntaxiques de genre pour mettre au jour des biais. Ainsi, les modèles favorisent largement la génération de masculin : le genre masculin est deux fois plus présent que le féminin en français, et huit fois plus en italien. Les modèles étudiés exacerbent également des stéréotypes attestés en sociologie en associant les professions stéréotypiquement féminines aux textes au féminin, et les professions stéréotypiquement masculines aux textes au masculin.

pdf bib abs

Hostomytho: A GWAP for Synthetic Clinical Texts Evaluation and Annotation
Nicolas Hiebel | Bertrand Remy | Bruno Guillaume | Olivier Ferret | Aurélie Névéol | Karen Fort
Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024

This paper presents the creation of Hostomytho, a game with a purpose intended for evaluating the quality of synthetic biomedical texts through multiple mini-games. Hostomytho was developed entirely using open source technologies both for internet browser and mobile platforms (IOS & Android). The code and the annotations created for synthetic clinical cases in French will be made freely available.

pdf bib abs

Beyond Model Performance: Can Link Prediction Enrich French Lexical Graphs?
Hee-Soo Choi | Priyansh Trivedi | Mathieu Constant | Karen Fort | Bruno Guillaume
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a resource-centric study of link prediction approaches over French lexical-semantic graphs. Our study incorporates two graphs, RezoJDM16k and RL-fr, and we evaluated seven link prediction models, with CompGCN-ConvE emerging as the best performer. We also conducted a qualitative analysis of the predictions using manual annotations. Based on this, we found that predictions with higher confidence scores were more valid for inclusion. Our findings highlight different benefits for the dense graph compared to the sparser graph RL-fr. While the addition of new triples to RezoJDM16k offers limited advantages, RL-fr can benefit substantially from our approach.

pdf bib abs

Au-delà de la performance des modèles : la prédiction de liens peut-elle enrichir des graphes lexico-sémantiques du français ?
Hee-Soo Choi | Priyansh Trivedi | Mathieu Constant | Karën Fort | Bruno Guillaume
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

Cet article présente une étude sur l’utilisation de modèles de prédiction de liens pour l’enrichissement de graphes lexico-sémantiques du français. Celle-ci porte sur deux graphes, RezoJDM16k et RL-fr et sept modèles de prédiction de liens. Nous avons étudié les prédictions du modèle le plus performant, afin d’extraire de potentiels nouveaux triplets en utilisant un score de confiance que nous avons évalué avec des annotations manuelles. Nos résultats mettent en évidence des avantages différentspour le graphe dense RezoJDM16k par rapport à RL-fr, plus clairsemé. Si l’ajout de nouveaux triplets à RezoJDM16k offre des avantages limités, RL-fr peut bénéficier substantiellement de notre approche.

pdf bib

Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024
Chris Madge | Jon Chamberlain | Karen Fort | Udo Kruschwitz | Stephanie Lukin
Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024

pdf bib abs

Unveiling Strengths and Weaknesses of NLP Systems Based on a Rich Evaluation Corpus: The Case of NER in French
Alice Millour | Yoann Dupont | Karen Fort | Liam Duignan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Named Entity Recognition (NER) is an applicative task for which annotation schemes vary. To compare the performance of systems which tagsets differ in precision and coverage, it is necessary to assess (i) the comparability of their annotation schemes and (ii) the individual adequacy of the latter to a common annotation scheme. What is more, and given the lack of robustness of some tools towards textual variation, we cannot expect an evaluation led on an homogeneous corpus with low-coverage to provide a reliable prediction of the actual tools performance. To tackle both these limitations in evaluation, we provide a gold corpus for French covering 6 textual genres and annotated with a rich tagset that enables comparison with multiple annotation schemes. We use the flexibility of this gold corpus to provide both: (i) an individual evaluation of four heterogeneous NER systems on their target tagsets, (ii) a comparison of their performance on a common scheme. This rich evaluation framework enables a fair comparison of NER systems across textual genres and annotation schemes.

pdf bib abs

Génération contrôlée de cas cliniques en français à partir de données médicales structurées
Hugo Boulanger | Nicolas Hiebel | Olivier Ferret | Karën Fort | Aurélie Névéol
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

La génération de texte ouvre des perspectives pour pallier l’absence de corpus librement partageables dans des domaines contraints par la confidentialité, comme le domaine médical. Dans cette étude, nous comparons les performances de modèles encodeurs-décodeurs et décodeurs seuls pour la génération conditionnée de cas cliniques en français. Nous affinons plusieurs modèles pré-entraînés pour chaque architecture sur des cas cliniques en français conditionnés par les informations démographiques des patient·es (sexe et âge) et des éléments cliniques.Nous observons que les modèles encodeur-décodeurs sont plus facilement contrôlables que les modèles décodeurs seuls, mais plus coûteux à entraîner.

pdf bib abs

Using Structured Health Information for Controlled Generation of Clinical Cases in French
Hugo Boulanger | Nicolas Hiebel | Olivier Ferret | Karën Fort | Aurélie Névéol
Proceedings of the 6th Clinical Natural Language Processing Workshop

Text generation opens up new prospects for overcoming the lack of open corpora in fields such as healthcare, where data sharing is bound by confidentiality. In this study, we compare the performance of encoder-decoder and decoder-only language models for the controlled generation of clinical cases in French. To do so, we fine-tuned several pre-trained models on French clinical cases for each architecture and generate clinical cases conditioned by patient demographic information (gender and age) and clinical features.Our results suggest that encoder-decoder models are easier to control than decoder-only models, but more costly to train.

2023

pdf bib abs

The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research
Mohamed Abdalla | Jan Philip Wahle | Terry Ruas | Aurélie Névéol | Fanny Ducel | Saif Mohammad | Karen Fort
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. As one of the big players in the field of NLP, together with governments and universities, it is important to track the influence of industry on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore the industry presence in the field since the early 90s. We find that industry presence among NLP authors has been steady before a steep increase over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of the industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.

pdf bib abs

Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French
Nicolas Hiebel | Olivier Ferret | Karen Fort | Aurélie Névéol
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

In sensitive domains, the sharing of corpora is restricted due to confidentiality, copyrights or trade secrets. Automatic text generation can help alleviate these issues by producing synthetic texts that mimic the linguistic properties of real documents while preserving confidentiality. In this study, we assess the usability of synthetic corpus as a substitute training corpus for clinical information extraction. Our goal is to automatically produce a clinical case corpus annotated with clinical entities and to evaluate it for a named entity recognition (NER) task. We use two auto-regressive neural models partially or fully trained on generic French texts and fine-tuned on clinical cases to produce a corpus of synthetic clinical cases. We study variants of the generation process: (i) fine-tuning on annotated vs. plain text (in that case, annotations are obtained a posteriori) and (ii) selection of generated texts based on models parameters and filtering criteria. We then train NER models with the resulting synthetic text and evaluate them on a gold standard clinical corpus. Our experiments suggest that synthetic text is useful for clinical NER.

pdf bib abs

Des ressources lexicales du français et de leur utilisation en TAL : étude des actes de TALN
Hee-Soo Choi | Karën Fort | Bruno Guillaume | Mathieu Constant
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : travaux de recherche originaux -- articles courts

Au début du XXIe siècle, le français faisait encore partie des langues peu dotées. Grâce aux efforts de la communauté française du traitement automatique des langues (TAL), de nombreuses ressources librement disponibles ont été produites, dont des lexiques du français. À travers cet article, nous nous intéressons à leur devenir dans la communauté par le prisme des actes de la conférence TALN sur une période de 20 ans.

pdf bib abs

Understanding Ethics in NLP Authoring and Reviewing
Luciana Benotti | Karën Fort | Min-Yen Kan | Yulia Tsvetkov
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

With NLP research now quickly being transferred into real-world applications, it is important to be aware of and think through the consequences of our scientific investigation. Such ethical considerations are important in both authoring and reviewing. This tutorial will equip participants with basic guidelines for thinking deeply about ethical issues and review common considerations that recur in NLP research. The methodology is interactive and participatory, including case studies and working in groups. Importantly, the participants will be co-building the tutorial outcomes and will be working to create further tutorial materials to share as public outcomes.

pdf bib abs

Les textes cliniques français générés sont-ils dangereusement similaires à leur source ? Analyse par plongements de phrases
Nicolas Hiebel | Olivier Ferret | Karën Fort | Aurélie Névéol
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : travaux de recherche originaux -- articles courts

Les ressources textuelles disponibles dans le domaine biomédical sont rares pour des raisons de confidentialité. Des données existent mais ne sont pas partageables, c’est pourquoi il est intéressant de s’inspirer de ces données pour en générer de nouvelles sans contrainte de partage. Une difficulté majeure de la génération de données médicales est que les données générées doivent ressembler aux données originales sans compromettre leur confidentialité. L’évaluation de cette tâche est donc difficile. Dans cette étude, nous étendons l’évaluation de corpus cliniques générés en français en y ajoutant une dimension sémantique à l’aide de plongements de phrases. Nous recherchons des phrases proches à l’aide de similarité cosinus entre plongements, et analysons les scores de similarité. Nous observons que les phrases synthétiques sont thématiquement proches du corpus original, mais suffisamment éloignées pour ne pas être de simples reformulations qui compromettraient la confidentialité.

2022

pdf bib abs

Use of a Citizen Science Platform for the Creation of a Language Resource to Study Bias in Language Models for French: A Case Study
Karën Fort | Aurélie Névéol | Yoann Dupont | Julien Bezançon
Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022

There is a growing interest in the evaluation of bias, fairness and social impact of Natural Language Processing models and tools. However, little resources are available for this task in languages other than English. Translation of resources originally developed for English is a promising research direction. However, there is also a need for complementing translated resources by newly sourced resources in the original languages and social contexts studied. In order to collect a language resource for the study of biases in Language Models for French, we decided to resort to citizen science. We created three tasks on the LanguageARC citizen science platform to assist with the translation of an existing resource from English into French as well as the collection of complementary resources in native French. We successfully collected data for all three tasks from a total of 102 volunteer participants. Participants from different parts of the world contributed and we noted that although calls sent to mailing lists had a positive impact on participation, some participants pointed barriers to contributions due to the collection platform.

pdf bib abs

This paper describes the continuation of a project that aims at establishing an interoperable annotation schema for quantification phenomena as part of the ISO suite of standards for semantic annotation, known as the Semantic Annotation Framework. After a break, caused by the Covid-19 pandemic, the project was relaunched in early 2022 with a second working draft of an annotation scheme, which is discussed in this paper. Keywords: semantic annotation, quantification, interoperability, annotation schema, ISO standard

pdf bib abs

Langues par défaut? Analyse contrastive et diachronique des langues non citées dans les articles de TALN et d’ACL (Contrastive and diachronic study of unmentioned (by default ?) languages in TALN and ACL We study the application of the #BenderRule in natural language processing articles, taking into account a contrastive and a diachronic dimensions, by examining the proceedings of two NLP conferences, TALN and ACL, over time)
Fanny Ducel | Karën Fort | Gaël Lejeune | Yves Lepage
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Cet article étudie l’application de la #RègledeBender dans des articles de traitement automatique des langues (TAL), en prenant en compte une dimension contrastive, par l’examen des actes de deux conférences du domaine, TALN et ACL, et une dimension diachronique, en examinant ces conférences au fil du temps. Un échantillon d’articles a été annoté manuellement et deux classifieurs ont été développés afin d’annoter automatiquement les autres articles. Nous quantifions ainsi l’application de la #RègledeBender, et mettons en évidence un léger mieux en faveur de TALN sur cet aspect.

pdf bib abs

Do we Name the Languages we Study? The #BenderRule in LREC and ACL articles
Fanny Ducel | Karën Fort | Gaël Lejeune | Yves Lepage
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This article studies the application of the #BenderRule in Natural Language Processing (NLP) articles according to two dimensions. Firstly, in a contrastive manner, by considering two major international conferences, LREC and ACL, and secondly, in a diachronic manner, by inspecting nearly 14,000 articles over a period of time ranging from 2000 to 2020 for LREC and from 1979 to 2020 for ACL. For this purpose, we created a corpus from LREC and ACL articles from the above-mentioned periods, from which we manually annotated nearly 1,000. We then developed two classifiers to automatically annotate the rest of the corpus. Our results show that LREC articles tend to respect the #BenderRule (80 to 90% of them respect it), whereas 30 to 40% of ACL articles do not. Interestingly, over the considered periods, the results appear to be stable for the two conferences, even though a rebound in ACL 2020 could be a sign of the influence of the blog post about the #BenderRule.

pdf bib abs

CLISTER : Un corpus pour la similarité sémantique textuelle dans des cas cliniques en français (CLISTER : A Corpus for Semantic Textual Similarity in French Clinical Narratives)
Nicolas Hiebel | Karën Fort | Aurélie Névéol | Olivier Ferret
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Le TAL repose sur la disponibilité de corpus annotés pour l’entraînement et l’évaluation de modèles. Il existe très peu de ressources pour la similarité sémantique dans le domaine clinique en français. Dans cette étude, nous proposons une définition de la similarité guidée par l’analyse clinique et l’appliquons au développement d’un nouveau corpus partagé de 1 000 paires de phrases annotées manuellement en scores de similarité. Nous évaluons ensuite le corpus par des expériences de mesure automatique de similarité. Nous montrons ainsi qu’un modèle de plongements de phrases peut capturer la similarité avec des performances à l’état de l’art sur le corpus DEFT STS (Spearman=0,8343). Nous montrons également que le contenu du corpus CLISTER est complémentaire de celui de DEFT STS.

pdf bib abs

CLISTER : A Corpus for Semantic Textual Similarity in French Clinical Narratives
Nicolas Hiebel | Olivier Ferret | Karën Fort | Aurélie Névéol
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Modern Natural Language Processing relies on the availability of annotated corpora for training and evaluating models. Such resources are scarce, especially for specialized domains in languages other than English. In particular, there are very few resources for semantic similarity in the clinical domain in French. This can be useful for many biomedical natural language processing applications, including text generation. We introduce a definition of similarity that is guided by clinical facts and apply it to the development of a new French corpus of 1,000 sentence pairs manually annotated according to similarity scores. This new sentence similarity corpus is made freely available to the community. We further evaluate the corpus through experiments of automatic similarity measurement. We show that a model of sentence embeddings can capture similarity with state-of-the-art performance on the DEFT STS shared task evaluation data set (Spearman=0.8343). We also show that the corpus is complementary to DEFT STS.

pdf bib abs

FENEC : un corpus équilibré pour l’évaluation des entités nommées en français (FENEC : a balanced sample corpus for French named entity recognition )
Alice Millour | Yoann Dupont | Alexane Jouglar | Karën Fort
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Nous présentons ici FENEC (FrEnch Named-entity Evaluation Corpus), un corpus à échantillons équilibrés contenant six genres, annoté en entités nommées selon le schéma fin Quæro. Les caractéristiques de ce corpus nous permettent d’évaluer et de comparer trois outils d’annotation automatique — un à base de règles et deux à base de réseaux de neurones — en jouant sur trois dimensions : la finesse du jeu d’étiquettes, le genre des corpus, et les métriques d’évaluation.

pdf bib abs

French CrowS-Pairs: Extension à une langue autre que l’anglais d’un corpus de mesure des biais sociétaux dans les modèles de langue masqués (French CrowS-Pairs : Extending a challenge dataset for measuring social bias in masked language models to a language other than English)
Aurélie Névéol | Yoann Dupont | Julien Bezançon | Karën Fort
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Afin de permettre l’étude des biais en traitement automatique de la langue au delà de l’anglais américain, nous enrichissons le corpus américain CrowS-pairs de 1 677 paires de phrases en français représentant des stéréotypes portant sur dix catégories telles que le genre. 1 467 paires de phrases sont traduites à partir de CrowS-pairs et 210 sont nouvellement recueillies puis traduites en anglais. Selon le principe des paires minimales, les phrases du corpus contrastent un énoncé stéréotypé concernant un groupe défavorisé et son équivalent pour un groupe favorisé. Nous montrons que quatre modèles de langue favorisent les énoncés qui expriment des stéréotypes dans la plupart des catégories. Nous décrivons le processus de traduction et formulons des recommandations pour étendre le corpus à d’autres langues. Attention : Cet article contient des énoncés de stéréotypes qui peuvent être choquants.

pdf bib abs

French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English
Aurélie Névéol | Yoann Dupont | Julien Bezançon | Karën Fort
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. Much work on biases in natural language processing has addressed biases linked to the social and cultural experience of English speaking individuals in the United States. We seek to widen the scope of bias studies by creating material to measure social bias in language models (LMs) against specific demographic groups in France. We build on the US-centered CrowS-pairs dataset to create a multilingual stereotypes dataset that allows for comparability across languages while also characterizing biases that are specific to each country and language. We introduce 1,679 sentence pairs in French that cover stereotypes in ten types of bias like gender and age. 1,467 sentence pairs are translated from CrowS-pairs and 212 are newly crowdsourced. The sentence pairs contrast stereotypes concerning underadvantaged groups with the same sentence concerning advantaged groups. We find that four widely used language models (three French, one multilingual) favor sentences that express stereotypes in most bias categories. We report on the translation process from English into French, which led to a characterization of stereotypes in CrowS-pairs including the identification of US-centric cultural traits. We offer guidelines to further extend the dataset to other languages and cultural environments.

2021

pdf bib abs

Reviewing Natural Language Processing Research
Kevin Cohen | Karën Fort | Margot Mieskes | Aurélie Névéol | Anna Rogers
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

The reviewing procedure has been identified as one of the major issues in the current situation of the NLP field. While it is implicitly assumed that junior researcher learn reviewing during their PhD project, this might not always be the case. Additionally, with the growing NLP community and the efforts in the context of widening the NLP community, researchers joining the field might not have the opportunity to practise reviewing. This tutorial fills in this gap by providing an opportunity to learn the basics of reviewing. Also more experienced researchers might find this tutorial interesting to revise their reviewing procedure.

pdf bib

Corpus-based language universals analysis using Universal Dependencies
Hee-Soo Choi | Bruno Guillaume | Karën Fort
Proceedings of the Second Workshop on Quantitative Syntax (Quasy, SyntaxFest 2021)

pdf bib abs

Investigating Dominant Word Order on Universal Dependencies with Graph Rewriting
Hee-Soo Choi | Bruno Guillaume | Karën Fort | Guy Perrier
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper details experiments we performed on the Universal Dependencies 2.7 corpora in order to investigate the dominant word order in the available languages. For this purpose, we used a graph rewriting tool, GREW, which allowed us to go beyond the surface annotations and identify the implicit subjects. We first measured the distribution of the six different word orders (SVO, SOV, VSO, VOS, OVS, OSV) in the corpora and investigated when there was a significant difference in the corpora within a given language. Then, we compared the obtained results with information provided in the WALS database (Dryer and Haspelmath, 2013) and in ( ̈Ostling, 2015). Finally, we examined the impact of using a graph rewriting tool for this task. The tools and resources used for this research are all freely available.

2020

pdf bib abs

Répliquer et étendre pour l’alsacien “Étiquetage en parties du discours de langues peu dotées par spécialisation des plongements lexicaux” (Replicating and extending for Alsatian : “POS tagging for low-resource languages by adapting word embeddings”)
Alice Millour | Karën Fort | Pierre Magistry
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). 2e atelier Éthique et TRaitemeNt Automatique des Langues (ETeRNAL)

Nous présentons ici les résultats d’un travail de réplication et d’extension pour l’alsacien d’une expérience concernant l’étiquetage en parties du discours de langues peu dotées par spécialisation des plongements lexicaux (Magistry et al., 2018). Ce travail a été réalisé en étroite collaboration avec les auteurs de l’article d’origine. Cette interaction riche nous a permis de mettre au jour les éléments manquants dans la présentation de l’expérience, de les compléter, et d’étendre la recherche à la robustesse à la variation.

pdf bib

Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). 2e atelier Éthique et TRaitemeNt Automatique des Langues (ETeRNAL)
Gilles Adda | Maxime Amblard | Karën Fort
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). 2e atelier Éthique et TRaitemeNt Automatique des Langues (ETeRNAL)

pdf bib abs

Rigor Mortis: Annotating MWEs with a Gamified Platform
Karën Fort | Bruno Guillaume | Yann-Alan Pilatte | Mathieu Constant | Nicolas Lefèbvre
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present here Rigor Mortis, a gamified crowdsourcing platform designed to evaluate the intuition of the speakers, then train them to annotate multi-word expressions (MWEs) in French corpora. We previously showed that the speakers’ intuition is reasonably good (65% in recall on non-fixed MWE). We detail here the annotation results, after a training phase using some of the tests developed in the PARSEME-FR project.

pdf bib abs

Reviewing Natural Language Processing Research
Kevin Cohen | Karën Fort | Margot Mieskes | Aurélie Névéol
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial will cover the theory and practice of reviewing research in natural language processing. Heavy reviewing burdens on natural language processing researchers have made it clear that our community needs to increase the size of our pool of potential reviewers. Simultaneously, notable “false negatives”—rejection by our conferences of work that was later shown to be tremendously important after acceptance by other conferences—have raised awareness of the fact that our reviewing practices leave something to be desired. We do not often talk about “false positives” with respect to conference papers, but leaders in the field have noted that we seem to have a publication bias towards papers that report high performance, with perhaps not much else of interest in them. It need not be this way. Reviewing is a learnable skill, and you will learn it here via lectures and a considerable amount of hands-on practice.

pdf bib abs

Text Corpora and the Challenge of Newly Written Languages
Alice Millour | Karën Fort
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Text corpora represent the foundation on which most natural language processing systems rely. However, for many languages, collecting or building a text corpus of a sufficient size still remains a complex issue, especially for corpora that are accessible and distributed under a clear license allowing modification (such as annotation) and further resharing. In this paper, we review the sources of text corpora usually called upon to fill the gap in low-resource contexts, and how crowdsourcing has been used to build linguistic resources. Then, we present our own experiments with crowdsourcing text corpora and an analysis of the obstacles we encountered. Although the results obtained in terms of participation are still unsatisfactory, we advocate that the effort towards a greater involvement of the speakers should be pursued, especially when the language of interest is newly written.

We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how much these challenges have been addressed at present. Accordingly, we also report on on-going proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of this generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.

2019

pdf bib abs

Community Perspective on Replicability in Natural Language Processing
Margot Mieskes | Karën Fort | Aurélie Névéol | Cyril Grouin | Kevin Cohen
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

With recent efforts in drawing attention to the task of replicating and/or reproducing results, for example in the context of COLING 2018 and various LREC workshops, the question arises how the NLP community views the topic of replicability in general. Using a survey, in which we involve members of the NLP community, we investigate how our community perceives this topic, its relevance and options for improvement. Based on over two hundred participants, the survey results confirm earlier observations, that successful reproducibility requires more than having access to code and data. Additionally, the results show that the topic has to be tackled from the authors’, reviewers’ and community’s side.

pdf bib abs

Unsupervised Data Augmentation for Less-Resourced Languages with no Standardized Spelling
Alice Millour | Karën Fort
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Building representative linguistic resources and NLP tools for non-standardized languages is challenging: when spelling is not determined by a norm, multiple written forms can be encountered for a given word, inducing a large proportion of out-of-vocabulary words. To embrace this diversity, we propose a methodology based on crowdsourced alternative spellings we use to extract rules applied to match OOV words with one of their spelling variants. This virtuous process enables the unsupervised augmentation of multi-variant lexicons without expert rule definition. We apply this multilingual methodology on Alsatian, a French regional language and provide an intrinsic evaluation of the correctness of the variants pairs, and an extrinsic evaluation on a downstream task. We show that in a low-resource scenario, 145 inital pairs can lead to the generation of 876 additional variant pairs, and a diminution of OOV words improving the part-of-speech tagging performance by 1 to 4%.

2018

pdf bib abs

“Fingers in the Nose”: Evaluating Speakers’ Identification of Multi-Word Expressions Using a Slightly Gamified Crowdsourcing Platform
Karën Fort | Bruno Guillaume | Matthieu Constant | Nicolas Lefèbvre | Yann-Alan Pilatte
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This article presents the results we obtained in crowdsourcing French speakers’ intuition concerning multi-work expressions (MWEs). We developed a slightly gamified crowdsourcing platform, part of which is designed to test users’ ability to identify MWEs with no prior training. The participants perform relatively well at the task, with a recall reaching 65% for MWEs that do not behave as function words.

pdf bib

Toward a Lightweight Solution for Less-resourced Languages: Creating a POS Tagger for Alsatian Using Voluntary Crowdsourcing
Alice Millour | Karën Fort
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib

À l’écoute des locuteurs : production participative de ressources langagières pour des langues non standardisées [Listening to speakers: crowdsourcing of language resources for non-standardized languages]
Alice Millour | Karën Fort
Traitement Automatique des Langues, Volume 59, Numéro 3 : Traitement automatique des langues peu dotées [NLP for Under-Resourced Languages]

2017

pdf bib abs

Vers l’annotation par le jeu de corpus (plus) complexes : le cas de la langue de spécialité (Towards (more) complex corpora annotation using a game with a purpose : the case of scientific language)
Karën Fort | Bruno Guillaume | Nicolas Lefebvre | Laura Ramírez | Mathilde Regnault | Mary Collins | Oksana Gavrilova | Tanti Kristanti
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

Nous avons précédemment montré qu’il est possible de faire produire des annotations syntaxiques de qualité par des participants à un jeu ayant un but. Nous présentons ici les résultats d’une expérience visant à évaluer leur production sur un corpus plus complexe, en langue de spécialité, en l’occurrence un corpus de textes scientifiques sur l’ADN. Nous déterminons précisément la complexité de ce corpus, puis nous évaluons les annotations en syntaxe de dépendances produites par les joueurs par rapport à une référence mise au point par des experts du domaine.

pdf bib abs

Vers une solution légère de production de données pour le TAL : création d’un tagger de l’alsacien par crowdsourcing bénévole (Toward a lightweight solution to the language resources bottleneck issue: creating a POS tagger for Alsatian using voluntary crowdsourcing)
Alice Millour | Karën Fort | Delphine Bernhard | Lucie Steiblé
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 - Articles longs

Nous présentons ici les résultats d’une expérience menée sur l’annotation en parties du discours d’un corpus d’une langue régionale encore peu dotée, l’alsacien, via une plateforme de myriadisation (crowdsourcing) bénévole développée spécifiquement à cette fin : Bisame1 . La plateforme, mise en ligne en mai 2016, nous a permis de recueillir 15 846 annotations grâce à 42 participants. L’évaluation des annotations, réalisée sur un corpus de référence, montre que la F-mesure des annotations volontaires est de 0, 93. Le tagger entraîné sur le corpus annoté atteint lui 82 % d’exactitude. Il s’agit du premier tagger spécifique à l’alsacien. Cette méthode de développement de ressources langagières est donc efficace et prometteuse pour certaines langues peu dotées, dont un nombre suffisant de locuteurs est connecté et actif sur le Web. Le code de la plateforme, le corpus annoté et le tagger sont librement disponibles.

2016

pdf bib

Éthique et traitement automatique des langues et de la parole : entre truismes et tabous [Ethics and natural language and speech processing: between truisms and taboos]
Karën Fort | Gilles Adda | Kevin Bretonnel Cohen
Traitement Automatique des Langues, Volume 57, Numéro 2 : TAL et éthique [NLP and ethics]

pdf bib abs

Yes, We Care! Results of the Ethics and Natural Language Processing Surveys
Karën Fort | Alain Couillault
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present here the context and results of two surveys (a French one and an international one) concerning Ethics and NLP, which we designed and conducted between June and September 2015. These surveys follow other actions related to raising concern for ethics in our community, including a Journée d’études, a workshop and the Ethics and Big Data Charter. The concern for ethics shows to be quite similar in both surveys, despite a few differences which we present and discuss. The surveys also lead to think there is a growing awareness in the field concerning ethical issues, which translates into a willingness to get involved in ethics-related actions, to debate about the topic and to see ethics be included in major conferences themes. We finally discuss the limits of the surveys and the means of action we consider for the future. The raw data from the two surveys are freely available online.

pdf bib abs

Crowdsourcing Complex Language Resources: Playing to Annotate Dependency Syntax
Bruno Guillaume | Karën Fort | Nicolas Lefebvre
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This article presents the results we obtained on a complex annotation task (that of dependency syntax) using a specifically designed Game with a Purpose, ZombiLingo. We show that with suitable mechanisms (decomposition of the task, training of the players and regular control of the annotation quality during the game), it is possible to obtain annotations whose quality is significantly higher than that obtainable with a parser, provided that enough players participate. The source code of the game and the resulting annotated corpora (for French) are freely available.

2014

pdf bib abs

Propa-L: a semantic filtering service from a lexical network created using Games With A Purpose
Mathieu Lafourcade | Karën Fort
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article presents Propa-L, a freely accessible Web service that allows to semantically filter a lexical network. The language resources behind the service are dynamic and created through Games With A Purpose. We show an example of application of this service: the generation of a list of keywords for parental filtering on the Web, but many others can be envisaged. Moreover, the propagation algorithm we present here can be applied to any lexical network, in any language.

pdf bib abs

Evaluating corpora documentation with regards to the Ethics and Big Data Charter
Alain Couillault | Karën Fort | Gilles Adda | Hugues de Mazancourt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The authors have written the Ethic and Big Data Charter in collaboration with various agencies, private bodies and associations. This Charter aims at describing any large or complex resources, and in particular language resources, from a legal and ethical viewpoint and ensuring the transparency of the process of creating and distributing such resources. We propose in this article an analysis of the documentation coverage of the most frequently mentioned language resources with regards to the Charter, in order to show the benefit it offers.

pdf bib

Analyse lexicale outillée de la parole transcrite de patients schizophrènes [Machine-assisted lexical analysis of speech transcripts from schizophrenic patients]
Maxime Amblard | Karën Fort | Caroline Demily | Nicolas Franck | Michel Musiol
Traitement Automatique des Langues, Volume 55, Numéro 3 : Traitement automatique du langage naturel et sciences cognitives [Natural Language Processing and Cognitive Sciences]

pdf bib abs

We define a deep syntactic representation scheme for French, which abstracts away from surface syntactic variation and diathesis alternations, and describe the annotation of deep syntactic representations on top of the surface dependency trees of the Sequoia corpus. The resulting deep-annotated corpus, named deep-sequoia, is freely available, and hopefully useful for corpus linguistics studies and for training deep analyzers to prepare semantic analysis.

pdf bib

Quantitative study of disfluencies in schizophrenics’ speech: Automatize to limit biases (Étude quantitative des disfluences dans le discours de schizophrènes : automatiser pour limiter les biais) [in French]
Maxime Amblard | Karën Fort
Proceedings of TALN 2014 (Volume 1: Long Papers)

pdf bib

ZOMBILINGO: eating heads to perform dependency syntax annotation (ZOMBILINGO : manger des têtes pour annoter en syntaxe de dépendances) [in French]
Karën Fort | Bruno Guillaume | Valentin Stern
Proceedings of TALN 2014 (Volume 3: System Demonstrations)

pdf bib abs

Mapping the Lexique des Verbes du Français (Lexicon of French Verbs) to a NLP lexicon using examples
Bruno Guillaume | Karën Fort | Guy Perrier | Paul Bédaride
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article presents experiments aiming at mapping the Lexique des Verbes du Français (Lexicon of French Verbs) to FRILEX, a Natural Language Processing (NLP) lexicon based on D ICOVALENCE. The two resources (Lexicon of French Verbs and D ICOVALENCE) were built by linguists, based on very different theories, which makes a direct mapping nearly impossible. We chose to use the examples provided in one of the resource to find implicit links between the two and make them explicit.

pdf bib

Annotation scheme for deep dependency syntax of French (Un schéma d’annotation en dépendances syntaxiques profondes pour le français) [in French]
Guy Perrier | Marie Candito | Bruno Guillaume | Corentin Ribeyre | Karën Fort | Djamé Seddah
Proceedings of TALN 2014 (Volume 2: Short Papers)

2013

pdf bib

Formalizing an annotation guide : some experiments towards assisted agile annotation (Expériences de formalisation d’un guide d’annotation : vers l’annotation agile assistée) [in French]
Bruno Guillaume | Karën Fort
Proceedings of TALN 2013 (Volume 2: Short Papers)

2012

pdf bib

Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers
Sophie Rosset | Cyril Grouin | Karën Fort | Olivier Galibert | Juliette Kahn | Pierre Zweigenbaum
Proceedings of the Sixth Linguistic Annotation Workshop

pdf bib

Modeling the Complexity of Manual Annotation Tasks: a Grid of Analysis
Karën Fort | Adeline Nazarenko | Sophie Rosset
Proceedings of COLING 2012

pdf bib abs

Analyzing the Impact of Prevalence on the Evaluation of a Manual Annotation Campaign
Karën Fort | Claire François | Olivier Galibert | Maha Ghribi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This article details work aiming at evaluating the quality of the manual annotation of gene renaming couples in scientific abstracts, which generates sparse annotations. To evaluate these annotations, we compare the results obtained using the commonly advocated inter-annotator agreement coefficients such as S, κ and Ï, the less known R, the weighted coefficients κÏ and Î± as well as the F-measure and the SER. We analyze to which extent they are relevant for our data. We then study the bias introduced by prevalence by changing the way the contingency table is built. We finally propose an original way to synthesize the results by computing distances between categories, based on the produced annotations.

pdf bib

TCOF-POS : un corpus libre de français parlé annoté en morphosyntaxe (TCOF-POS : A Freely Available POS-Tagged Corpus of Spoken French) [in French]
Christophe Benzitoun | Karën Fort | Benoît Sagot
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib

Annotation manuelle de matchs de foot : Oh la la la ! l’accord inter-annotateurs ! et c’est le but ! (Manual Annotation of Football Matches : Inter-annotator Agreement ! Gooooal !) [in French]
Karën Fort | Vincent Claveau
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib abs

Annotating Football Matches: Influence of the Source Medium on Manual Annotation
Karën Fort | Vincent Claveau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we present an annotation campaign of football (soccer) matches, from a heterogeneous text corpus of both match minutes and video commentary transcripts, in French. The data, annotations and evaluation process are detailed, and the quality of the annotated corpus is discussed. In particular, we propose a new technique to better estimate the annotator agreement when few elements of a text are to be annotated. Based on that, we show how the source medium influenced the process and the quality.

pdf bib

2011

pdf bib abs

Un turc mécanique pour les ressources linguistiques : critique de la myriadisation du travail parcellisé (Mechanical Turk for linguistic resources: review of the crowdsourcing of parceled work)
Benoît Sagot | Karën Fort | Gilles Adda | Joseph Mariani | Bernard Lang
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article est une prise de position concernant les plate-formes de type Amazon Mechanical Turk, dont l’utilisation est en plein essor depuis quelques années dans le traitement automatique des langues. Ces plateformes de travail en ligne permettent, selon le discours qui prévaut dans les articles du domaine, de faire développer toutes sortes de ressources linguistiques de qualité, pour un prix imbattable et en un temps très réduit, par des gens pour qui il s’agit d’un passe-temps. Nous allons ici démontrer que la situation est loin d’être aussi idéale, que ce soit sur le plan de la qualité, du prix, du statut des travailleurs ou de l’éthique. Nous rappellerons ensuite les solutions alternatives déjà existantes ou proposées. Notre but est ici double : informer les chercheurs, afin qu’ils fassent leur choix en toute connaissance de cause, et proposer des solutions pratiques et organisationnelles pour améliorer le développement de nouvelles ressources linguistiques en limitant les risques de dérives éthiques et légales, sans que cela se fasse au prix de leur coût ou de leur qualité.

pdf bib

pdf bib

pdf bib

Last Words: Amazon Mechanical Turk: Gold Mine or Coal Mine?
Karën Fort | Gilles Adda | K. Bretonnel Cohen
Computational Linguistics, Volume 37, Issue 2 - June 2011

2010

pdf bib abs

FastKwic, an “Intelligent“ Concordancer Using FASTR
Véronika Lux-Pogodalla | Dominique Besagni | Karën Fort
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we introduce the FastKwic (Key Word In Context using FASTR), a new concordancer for French and English that does not require users to learn any particular request language. Built on FASTR, it shows them not only occurrences of the searched term but also of several morphological, morpho-syntactic and syntactic variants (for example, image enhancement, enhancement of image, enhancement of fingerprint image, image texture enhancement). Fastkwic is freely available. It consists of two UTF-8 compliant Perl modules that depend on several external tools and resources : FASTR, TreeTagger, Flemm (for French). Licenses of theses tools and resources permitting, the FastKwic package is nevertheless self-sufficient. FastKwic first modules is for terminological resource compilation. Its input is a list of terms - as required by FASTR. FastKwic second module is for processing concordances. It relies on FASTR again for indexing the input corpus with terms and their variants. Its output is a concordancer: for each term and its variants, the context of occurrence is provided.

pdf bib

Influence of Pre-Annotation on POS-Tagged Corpus Development
Karën Fort | Benoît Sagot
Proceedings of the Fourth Linguistic Annotation Workshop

pdf bib abs

Évaluer des annotations manuelles dispersées : les coefficients sont-ils suffisants pour estimer l’accord inter-annotateurs ?
Karën Fort | Claire François | Maha Ghribi
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

L’objectif des travaux présentés dans cet article est l’évaluation de la qualité d’annotations manuelles de relations de renommage de gènes dans des résumés scientifiques, annotations qui présentent la caractéristique d’être très dispersées. Pour cela, nous avons calculé et comparé les coefficients les plus communément utilisés, entre autres kappa (Cohen, 1960) et pi (Scott, 1955), et avons analysé dans quelle mesure ils sont adaptés à nos données. Nous avons également étudié les différentes pondérations applicables à ces coefficients permettant de calculer le kappa pondéré (Cohen, 1968) et l’alpha (Krippendorff, 1980, 2004). Nous avons ainsi étudié le biais induit par la grande prévalence d’une catégorie et défini un mode de calcul des distances entre catégories reposant sur les annotations réalisées.

2009

pdf bib abs

Vers une méthodologie d’annotation des entités nommées en corpus ?
Karën Fort | Maud Ehrmann | Adeline Nazarenko
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

La tâche, aujourd’hui considérée comme fondamentale, de reconnaissance d’entités nommées, présente des difficultés spécifiques en matière d’annotation. Nous les précisons ici, en les illustrant par des expériences d’annotation manuelle dans le domaine de la microbiologie. Ces problèmes nous amènent à reposer la question fondamentale de ce que les annotateurs doivent annoter et surtout, pour quoi faire. Nous identifions pour cela les applications nécessitant l’extraction d’entités nommées et, en fonction des besoins de ces applications, nous proposons de définir sémantiquement les éléments à annoter. Nous présentons ensuite un certain nombre de recommandations méthodologiques permettant d’assurer un cadre d’annotation cohérent et évaluable.

pdf bib

Towards a Methodology for Named Entities Annotation
Karën Fort | Maud Ehrmann | Adeline Nazarenko
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

2008

pdf bib abs

Sylva : plate-forme de validation multi-niveaux de lexiques
Karën Fort | Bruno Guillaume
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

La production de lexiques est une activité indispensable mais complexe, qui nécessite, quelle que soit la méthode de création utilisée (acquisition automatique ou manuelle), une validation humaine. Nous proposons dans ce but une plate-forme Web librement disponible, appelée Sylva (Systematic lexicon validator). Cette plate-forme a pour caractéristiques principales de permettre une validation multi-niveaux (par des validateurs, puis un expert) et une traçabilité de la ressource. La tâche de l’expert(e) linguiste en est allégée puisqu’il ne lui reste à considérer que les données sur lesquelles il n’y a pas d’accord inter-validateurs.

pdf bib

2007

pdf bib abs

PrepLex : un lexique des prépositions du français pour l’analyse syntaxique
Karën Fort | Bruno Guillaume
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

PrepLex est un lexique des prépositions du français. Il contient les informations utiles à des systèmes d’analyse syntaxique. Il a été construit en comparant puis fusionnant différentes sources d’informations lexicales disponibles. Ce lexique met également en évidence les prépositions ou classes de prépositions qui apparaissent dans la définition des cadres de sous-catégorisation des ressources lexicales qui décrivent la valence des verbes.

pdf bib

PrepLex: A Lexicon of French Prepositions for Parsing
Karën Fort | Bruno Guillaume
Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions