Patrick Paroubek

Also published as: P. Paroubek


2024

pdf bib
A Few-shot Approach to Task-oriented Dialogue Enhanced with Chitchat
Armand Stricker | Patrick Paroubek
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Large language models (LLMs) tuned for chat have recently been adopted for few-shot end-to-end task-oriented dialogue (TOD), with some success. To further assess this method, we conduct experiments on two, more complex, task-oriented benchmarks that integrate elements of chitchat into the conversation. We enhance a few-shot baseline by adding zero-shot chitchat detection and implementing function calling for dialogue state tracking (DST). We focus on this step in the task-oriented pipeline as it comes first, and errors due to added chitchat at this stage have the most impact on end-to-end performance. We find that this prompting method shows increased resilience to mixed-mode inputs and our enhanced pipeline allows for natural inter-mode conversations, as assessed through human evaluation. Our findings also suggest that the performance gap between few-shot prompting for TOD and supervised task-specific models is narrowing.

pdf bib
CQuAE : Un nouveau corpus de question-réponse pour l’enseignement
Thomas Gerald | Louis Tamames | Sofiane Ettayeb | Patrick Paroubek | Anne Vilnat
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

Dans cet article nous présentons un nouveau corpus de question-réponse en français pour le domaine de l’éducation. Ce corpus à été construit dans le but de créer un système d’assistant virtuel pour répondre à des questions sur des documents ou du matériel de cours. Afin d’être utile autant aux enseignants qu’au étudiants, il est important de considérer des questions complexes ainsi que d’être capable de justifier les réponses sur du matériel validé. Nous présentons donc le nouveau Corpus CQuAE, un corpus de question-réponse manuellement annoté dont nous discutons des propriétés. Nous présenterons aussi les différentes étapes de sa création avec aujourd’hui une phase d’amélioration des données.Enfin, nous présentons plusieurs expériences pour évaluer l’exploitation du corpus dans le cadre d’un système de question-réponse automatique.Ces différentes analyses et expériences nous permettrons de valider l’adéquation des données collectés pour l’objectif visé.

pdf bib
Pre-training data selection for biomedical domain adaptation using journal impact metrics
Mathieu Lai-king | Patrick Paroubek
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing

Domain adaptation is a widely used method in natural language processing (NLP) to improve the performance of a language model within a specific domain. This method is particularly common in the biomedical domain, which sees regular publication of numerous scientific articles. PubMed, a significant corpus of text, is frequently used in the biomedical domain. The primary objective of this study is to explore whether refining a pre-training dataset using specific quality metrics for scientific papers can enhance the performance of the resulting model. To accomplish this, we employ two straightforward journal impact metrics and conduct experiments by continually pre-training BERT on various subsets of the complete PubMed training set, we then evaluate the resulting models on biomedical language understanding tasks from the BLURB benchmark. Our results show that pruning using journal impact metrics is not efficient. But we also show that pre-training using fewer abstracts (but with the same number of training steps) does not necessarily decrease the resulting model’s performance.

pdf bib
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Lisa Raithel | Hui-Syuan Yeh | Shuntaro Yada | Cyril Grouin | Thomas Lavergne | Aurélie Névéol | Patrick Paroubek | Philippe Thomas | Tomohiro Nishiyama | Sebastian Möller | Eiji Aramaki | Yuji Matsumoto | Roland Roller | Pierre Zweigenbaum
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual language models for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.

pdf bib
Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues
Armand Stricker | Patrick Paroubek
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

During task-oriented dialogues (TODs), human users naturally introduce chitchat that is beyond the immediate scope of the task, interfering with the flow of the conversation. To address this issue without the need for expensive manual data creation, we use few-shot prompting with Llama-2-70B to enhance the MultiWOZ dataset with user backstories, a typical example of chitchat interference in TODs. We assess the impact of this addition by testing two models: one trained solely on TODs and another trained on TODs with a preliminary chitchat interaction. Our analysis demonstrates that our enhanced dataset poses a challenge for these systems. Moreover, we demonstrate that our dataset can be effectively used for training purposes, enabling a system to consistently acknowledge the user’s backstory while also successfully moving the task forward in the same turn, as confirmed by human evaluation. These findings highlight the benefits of generating novel chitchat-TOD scenarios to test TOD systems more thoroughly and improve their resilience to natural user interferences.

pdf bib
Evaluating Topic Model on Asymmetric and Multi-Domain Financial Corpus
Corentin Masson | Patrick Paroubek
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Multiple recent research works in Finance try to quantify the exposure of market assets to various risks from text and how assets react if the risk materialize itself. We consider risk sections from french Financial Corporate Annual Reports, which are regulated documents with a mandatory section containing important risks the company is facing, to extract an accurate risk profile and exposure of companies. We identify multiple pitfalls of topic models when applied to corporate filing financial domain data for unsupervised risk distribution extraction which has not yet been studied on this domain. We propose two new metrics to evaluate the behavior of different types of topic models with respect to pitfalls previously mentioned about document risk distribution extraction. Our evaluation will focus on three aspects: regularizations, down-sampling and data augmentation. In our experiments, we found that classic Topic Models require down-sampling to obtain unbiased risks, while Topic Models using metadata and in-domain pre-trained word-embeddings partially correct the coherence imbalance per subdomain and remove sector’s specific language from the detected themes. We then demonstrate the relevance and usefulness of the extracted information with visualizations that help to understand the content of such corpus and its evolution along the years.

pdf bib
Introducing CQuAE : A New French Contextualised Question-Answering Corpus for the Education Domain
Thomas Gerald | Anne Vilnat | Sofiane Ettayeb | Louis Tamames | Patrick Paroubek
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present a new question answering corpus in French designed to educational domain. To be useful in such domain, we have to propose more complex questions and to be able to justify the answers on validated material. We analyze some properties of this corpus. The last part of this paper will be devoted to present the first experiments we have carried out to demonstrate the value of this dataset for learning a Retrieval Augmented Genration framework. Different experiments are proposed, with an automatic evaluation. A human evaluation is proposed to confirm or infirm this automatic evaluation.

2023

pdf bib
Emotion Recognition based on Psychological Components in Guided Narratives for Emotion Regulation
Gustave Cortal | Alain Finkel | Patrick Paroubek | Lina Ye
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Emotion regulation is a crucial element in dealing with emotional events and has positive effects on mental health. This paper aims to provide a more comprehensive understanding of emotional events by introducing a new French corpus of emotional narratives collected using a questionnaire for emotion regulation. We follow the theoretical framework of the Component Process Model which considers emotions as dynamic processes composed of four interrelated components (behavior, feeling, thinking and territory). Each narrative is related to a discrete emotion and is structured based on all emotion components by the writers. We study the interaction of components and their impact on emotion classification with machine learning methods and pre-trained language models. Our results show that each component improves prediction performance, and that the best results are achieved by jointly considering all components. Our results also show the effectiveness of pre-trained language models in predicting discrete emotion from certain components, which reveal differences in how emotion components are expressed.

2022

pdf bib
DIASER: A Unifying View On Task-oriented Dialogue Annotation
Vojtěch Hudeček | Léon-Paul Schaub | Daniel Stancl | Patrick Paroubek | Ondřej Dušek
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Every model is only as strong as the data that it is trained on. In this paper, we present a new dataset, obtained by merging four publicly available annotated corpora for task-oriented dialogues in several domains (MultiWOZ 2.2, CamRest676, DSTC2 and Schema-Guided Dialogue Dataset). This way, we assess the feasibility of providing a unified ontology and annotation schema covering several domains with a relatively limited effort. We analyze the characteristics of the resulting dataset along three main dimensions: language, information content and performance. We focus on aspects likely to be pertinent for improving dialogue success, e.g. dialogue consistency. Furthermore, to assess the usability of this new corpus, we thoroughly evaluate dialogue generation performance under various conditions with the help of two prominent recent end-to-end dialogue models: MarCo and GPT-2. These models were selected as popular open implementations representative of the two main dimensions of dialogue modelling. While we did not observe a significant gain for dialogue state tracking performance, we show that using more training data from different sources can improve language modelling capabilities and positively impact dialogue flow (consistency). In addition, we provide the community with one of the largest open dataset for machine learning experiments.

pdf bib
Un corpus annoté pour la génération de questions et l’extraction de réponses pour l’enseignement (An annotated corpus for abstractive question generation and extractive answer for education)
Thomas Gerald | Sofiane Ettayeb | Ha Quang Le | Anne Vilnat | Gabriel Illouz | Patrick Paroubek
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 : Démonstrations

Dans cette démonstration, nous présenterons les travaux en cours pour l’annotation d’un nouveau corpus de questions-réponses en langue Française. Contrairement aux corpus existant comme “FQuad” ou “Piaf”, nous nous intéressons à l’annotation de questions-réponses “non factuelles”. En effet, si dans la littérature, de nombreux corpus et modèles de questions-réponses pré-entraînés sont disponibles, ceux-ci ne privilégient que rarement les annotations s’appuyant sur un schéma de raisonnement issue de l’agrégation de différentes sources ou contextes. L’objectif du projet associé est de parvenir à la création d’un assistant virtuel pour l’éducation, ainsi des réponses explicatives, de raisonnement et/ou d’agrégation de l’information sont à privilégier. Notons enfin, que la volumétrie des données doit être conséquente, en particulier par la considération d’approches neuronales génératives ou extractives. Actuellement, nous disposons de 262 questions et réponses obtenues durant l’étape de validation de la campagne d’annotation. Une deuxième phase d’annotation avec une volumétrie plus importante débutera fin mai 2022 (environ 8000 questions).

pdf bib
MAPA Project: Ready-to-Go Open-Source Datasets and Deep Learning Technology to Remove Identifying Information from Text Documents
Victoria Arranz | Khalid Choukri | Montse Cuadros | Aitor García Pablos | Lucie Gianola | Cyril Grouin | Manuel Herranz | Patrick Paroubek | Pierre Zweigenbaum
Proceedings of the Workshop on Ethical and Legal Issues in Human Language Technologies and Multilingual De-Identification of Sensitive Data In Language Resources within the 13th Language Resources and Evaluation Conference

2021

pdf bib
A Fine-Grained Annotated Corpus for Target-Based Opinion Analysis of Economic and Financial Narratives
Jiahui Hu | Patrick Paroubek
Proceedings of the Third Workshop on Economics and Natural Language Processing

In this paper about aspect-based sentiment analysis (ABSA), we present the first version of a fine-grained annotated corpus for target-based opinion analysis (TBOA) to analyze economic activities or financial markets. We have annotated, at an intra-sentential level, a corpus of sentences extracted from documents representative of financial analysts’ most-read materials by considering how financial actors communicate about the evolution of event trends and analyze related publications (news, official communications, etc.). Since we focus on identifying the expressions of opinions related to the economy and financial markets, we annotated the sentences that contain at least one subjective expression about a domain-specific term. Candidate sentences for annotations were randomly chosen from texts of specialized press and professional information channels over a period ranging from 1986 to 2021. Our annotation scheme relies on various linguistic markers like domain-specific vocabulary, syntactic structures, and rhetorical relations to explicitly describe the author’s subjective stance. We investigated and evaluated the recourse to automatic pre-annotation with existing natural language processing technologies to alleviate the annotation workload. Our aim is to propose a corpus usable on the one hand as training material for the automatic detection of the opinions expressed on an extensive range of domain-specific aspects and on the other hand as a gold standard for evaluation TBOA. In this paper, we present our pre-annotation models and evaluations of their performance, introduce our annotation scheme and report on the main characteristics of our corpus.

pdf bib
Differential Evaluation: a Qualitative Analysis of Natural Language Processing System Behavior Based Upon Data Resistance to Processing
Lucie Gianola | Hicham El Boukkouri | Cyril Grouin | Thomas Lavergne | Patrick Paroubek | Pierre Zweigenbaum
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

Most of the time, when dealing with a particular Natural Language Processing task, systems are compared on the basis of global statistics such as recall, precision, F1-score, etc. While such scores provide a general idea of the behavior of these systems, they ignore a key piece of information that can be useful for assessing progress and discerning remaining challenges: the relative difficulty of test instances. To address this shortcoming, we introduce the notion of differential evaluation which effectively defines a pragmatic partition of instances into gradually more difficult bins by leveraging the predictions made by a set of systems. Comparing systems along these difficulty bins enables us to produce a finer-grained analysis of their relative merits, which we illustrate on two use-cases: a comparison of systems participating in a multi-label text classification task (CLEF eHealth 2018 ICD-10 coding), and a comparison of neural models trained for biomedical entity detection (BioCreative V chemical-disease relations dataset).

pdf bib
A sequence to sequence transformer data logic experiment
Danxin Cui | Dominique Mariko | Estelle Labidurie | Hugues de Mazancourt | Patrick Paroubek
Proceedings of the 3rd Financial Narrative Processing Workshop

pdf bib
Annotation model and corpus for opinionated economy and finance narrative detection
Jiahui Hu | Patrick Paroubek | Dirk Schumacher
Proceedings of the 3rd Financial Narrative Processing Workshop

pdf bib
Définition et détection des incohérences du système dans les dialogues orientés tâche. (We present experiments on automatically detecting inconsistent behavior of task-oriented dialogue systems from the context)
Léon-Paul Schaub | Vojtech Hudecek | Daniel Stancl | Ondrej Dusek | Patrick Paroubek
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Définition et détection des incohérences du système dans les dialogues orientés tâche. Nous présentons des expériences sur la détection automatique des comportements incohérents des systèmes de dialogues orientés tâche à partir du contexte. Nous enrichissons les données bAbI/DSTC2 (Bordes et al., 2017) avec une annotation automatique des incohérences de dialogue, et nous démontrons que les incohérences sont en corrélation avec les dialogues ratés. Nous supposons que l’utilisation d’un historique de dialogue limité et la prédiction du prochain tour de l’utilisateur peuvent améliorer la classification des incohérences. Si les deux hypothèses sont confirmées pour un modèle de dialogue basé sur les réseaux de mémoire, elles ne le sont pas pour un entraînement basé sur le modèle de langage GPT-2, qui bénéficie le plus de l’utilisation de l’historique complet du dialogue et obtient un score de précision de 0,99.

2020

pdf bib
NLP Analytics in Finance with DoRe: A French 250M Tokens Corpus of Corporate Annual Reports
Corentin Masson | Patrick Paroubek
Proceedings of the Twelfth Language Resources and Evaluation Conference

Recent advances in neural computing and word embeddings for semantic processing open many new applications areas which had been left unaddressed so far because of inadequate language understanding capacity. But this new kind of approaches rely even more on training data to be operational. Corpora for financial applications exists, but most of them concern stock market prediction and are in English. To address this need for the French language and regulation oriented applications which require a deeper understanding of the text content, we hereby present “DoRe”, a French and dialectal French Corpus for NLP analytics in Finance, Regulation and Investment. This corpus is composed of: (a) 1769 Annual Reports from 336 companies among the most capitalized companies in: France (Euronext Paris) & Belgium (Euronext Brussels), covering a time frame from 2009 to 2019, and (b) related MetaData containing information for each company about its ISIN code, capitalization and sector. This corpus is designed to be as modular as possible in order to allow for maximum reuse in different tasks pertaining to Economics, Finance and Regulation. After presenting existing resources, we relate the construction of the DoRe corpus and the rationale behind our choices, concluding on the spectrum of possible uses of this new resource for NLP applications.

pdf bib
DeSpin: a prototype system for detecting spin in biomedical publications
Anna Koroleva | Sanjay Kamath | Patrick Bossuyt | Patrick Paroubek
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Improving the quality of medical research reporting is crucial to reduce avoidable waste in research and to improve the quality of health care. Despite various initiatives aiming at improving research reporting – guidelines, checklists, authoring aids, peer review procedures, etc. – overinterpretation of research results, also known as spin, is still a serious issue in research reporting. In this paper, we propose a Natural Language Processing (NLP) system for detecting several types of spin in biomedical articles reporting randomized controlled trials (RCTs). We use a combination of rule-based and machine learning approaches to extract important information on trial design and to detect potential spin. The proposed spin detection system includes algorithms for text structure analysis, sentence classification, entity and relation extraction, semantic similarity assessment. Our algorithms achieved operational performance for the these tasks, F-measure ranging from 79,42 to 97.86% for different tasks. The most difficult task is extracting reported outcomes. Our tool is intended to be used as a semi-automated aid tool for assisting both authors and peer reviewers to detect potential spin. The tool incorporates a simple interface that allows to run the algorithms and visualize their output. It can also be used for manual annotation and correction of the errors in the outputs. The proposed tool is the first tool for spin detection. The tool and the annotated dataset are freely available.

pdf bib
The Multilingual Anonymisation Toolkit for Public Administrations (MAPA) Project
Ēriks Ajausks | Victoria Arranz | Laurent Bié | Aleix Cerdà-i-Cucó | Khalid Choukri | Montse Cuadros | Hans Degroote | Amando Estela | Thierry Etchegoyhen | Mercedes García-Martínez | Aitor García-Pablos | Manuel Herranz | Alejandro Kohan | Maite Melero | Mike Rosner | Roberts Rozis | Patrick Paroubek | Artūrs Vasiļevskis | Pierre Zweigenbaum
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

We describe the MAPA project, funded under the Connecting Europe Facility programme, whose goal is the development of an open-source de-identification toolkit for all official European Union languages. It will be developed since January 2020 until December 2021.

2019

pdf bib
Extracting relations between outcomes and significance levels in Randomized Controlled Trials (RCTs) publications
Anna Koroleva | Patrick Paroubek
Proceedings of the 18th BioNLP Workshop and Shared Task

Randomized controlled trials assess the effects of an experimental intervention by comparing it to a control intervention with regard to some variables - trial outcomes. Statistical hypothesis testing is used to test if the experimental intervention is superior to the control. Statistical significance is typically reported for the measured outcomes and is an important characteristic of the results. We propose a machine learning approach to automatically extract reported outcomes, significance levels and the relation between them. We annotated a corpus of 663 sentences with 2,552 outcome - significance level relations (1,372 positive and 1,180 negative relations). We compared several classifiers, using a manually crafted feature set, and a number of deep learning models. The best performance (F-measure of 94%) was shown by the BioBERT fine-tuned model.

2018

pdf bib
DEFT2018 : recherche d’information et analyse de sentiments dans des tweets concernant les transports en Île de France (DEFT2018 : Information Retrieval and Sentiment Analysis in Tweets about Public Transportation in Île de France Region )
Patrick Paroubek | Cyril Grouin | Patrice Bellot | Vincent Claveau | Iris Eshkol-Taravella | Amel Fraisse | Agata Jackiewicz | Jihen Karoui | Laura Monceaux | Juan-Manuel Torres-Moreno
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

Cet article présente l’édition 2018 de la campagne d’évaluation DEFT (Défi Fouille de Textes). A partir d’un corpus de tweets, quatre tâches ont été proposées : identifier les tweets sur la thématique des transports, puis parmi ces derniers, identifier la polarité (négatif, neutre, positif, mixte), identifier les marqueurs de sentiment et la cible, et enfin, annoter complètement chaque tweet en source et cible des sentiments exprimés. Douze équipes ont participé, majoritairement sur les deux premières tâches. Sur l’identification de la thématique des transports, la micro F-mesure varie de 0,827 à 0,908. Sur l’identification de la polarité globale, la micro F-mesure varie de 0,381 à 0,823.

pdf bib
Annotating Spin in Biomedical Scientific Publications : the case of Random Controlled Trials (RCTs)
Anna Koroleva | Patrick Paroubek
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Measuring Innovation in Speech and Language Processing Publications.
Joseph Mariani | Gil Francopoulo | Patrick Paroubek
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib
AppFM, une plate-forme de gestion de modules de TAL (AppFM, a tool for managing NLP modules)
Paul Bui-Quang | Brigitte Grau | Patrick Paroubek
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

AppFM 1 est un outil à mi-chemin entre un environnement de création de chaînes modulaires de TAL et un gestionnaire de services systèmes. Il permet l’intégration d’applications ayant des dépendances complexes en des chaînes de traitements réutilisables facilement par le biais de multiples interfaces.

pdf bib
A Study of Reuse and Plagiarism in Speech and Natural Language Processing papers
Joseph Mariani | Gil Francopoulo | Patrick Paroubek
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

pdf bib
Providing and Analyzing NLP Terms for our Community
Gil Francopoulo | Joseph Mariani | Patrick Paroubek | Frédéric Vernier
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

By its own nature, the Natural Language Processing (NLP) community is a priori the best equipped to study the evolution of its own publications, but works in this direction are rare and only recently have we seen a few attempts at charting the field. In this paper, we use the algorithms, resources, standards, tools and common practices of the NLP field to build a list of terms characteristic of ongoing research, by mining a large corpus of scientific publications, aiming at the largest possible exhaustivity and covering the largest possible time span. Study of the evolution of this term list through time reveals interesting insights on the dynamics of field and the availability of the term database and of the corpus (for a large part) make possible many further comparative studies in addition to providing a test field for a new graphic interface designed to perform visual time analytics of large sized thesauri.

pdf bib
Predictive Modeling: Guessing the NLP Terms of Tomorrow
Gil Francopoulo | Joseph Mariani | Patrick Paroubek
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Predictive modeling, often called “predictive analytics” in a commercial context, encompasses a variety of statistical techniques that analyze historical and present facts to make predictions about unknown events. Often the unknown events are in the future, but prediction can be applied to any type of unknown whether it be in the past or future. In our case, we present some experiments applying predictive modeling to the usage of technical terms within the NLP domain.

pdf bib
A Study of Reuse and Plagiarism in LREC papers
Gil Francopoulo | Joseph Mariani | Patrick Paroubek
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The aim of this experiment is to present an easy way to compare fragments of texts in order to detect (supposed) results of copy & paste operations between articles in the domain of Natural Language Processing (NLP). The search space of the comparisons is a corpus labeled as NLP4NLP gathering a large part of the NLP field. The study is centered on LREC papers in both directions, first with an LREC paper borrowing a fragment of text from the collection, and secondly in the reverse direction with fragments of LREC documents borrowed and inserted in the collection.

2015

pdf bib
Utiliser les interjections pour détecter les émotions
Amel Fraisse | Patrick Paroubek
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Bien que les interjections soient un phénomène linguistique connu, elles ont été peu étudiées et cela continue d’être le cas pour les travaux sur les microblogs. Des travaux en analyse de sentiments ont montré l’intérêt des émoticônes et récemment des mots-dièses, qui s’avèrent être très utiles pour la classification en polarité. Mais malgré leur statut grammatical et leur richesse sémantique, les interjections sont restées marginalisées par les systèmes d’analyse de sentiments. Nous montrons dans cet article l’apport majeur des interjections pour la détection des émotions. Nous détaillons la production automatique, basée sur les interjections, d’un corpus étiqueté avec les émotions. Nous expliquons ensuite comment nous avons utilisé ce corpus pour en déduire, automatiquement, un lexique affectif pour le français. Ce lexique a été évalué sur une tâche de détection des émotions, qui a montré un gain en mesure F1 allant, selon les émotions, de +0,04 à +0,21.

2014

pdf bib
Toward a unifying model for Opinion, Sentiment and Emotion information extraction
Amel Fraisse | Patrick Paroubek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a logical formalization of a set 20 semantic categories related to opinion, emotion and sentiment. Our formalization is based on the BDI model (Belief, Desire and Intetion) and constitues a first step toward a unifying model for subjective information extraction. The separability of the subjective classes that we propose was assessed both formally and on two subjective reference corpora.

pdf bib
Rediscovering 15 Years of Discoveries in Language Resources and Evaluation: The LREC Anthology Analysis
Joseph Mariani | Patrick Paroubek | Gil Francopoulo | Olivier Hamon
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper aims at analyzing the content of the LREC conferences contained in the ELRA Anthology over the past 15 years (1998-2013). It follows similar exercises that have been conducted, such as the survey on the IEEE ICASSP conference series from 1976 to 1990, which served in the launching of the ESCA Eurospeech conference, a survey of the Association of Computational Linguistics (ACL) over 50 years of existence, which was presented at the ACL conference in 2012, or a survey over the 25 years (1987-2012) of the conferences contained in the ISCA Archive, presented at Interspeech 2013. It contains first an analysis of the evolution of the number of papers and authors over time, including the study of their gender, nationality and affiliation, and of the collaboration among authors. It then studies the funding sources of the research investigations that are reported in the papers. It conducts an analysis of the evolution of the research topics within the community over time. It finally looks at reuse and plagiarism in the papers. The survey shows the present trends in the conference series and in the Language Resources and Evaluation scientific community. Conducting this survey also demonstrated the importance of a clear and unique identification of authors, papers and other sources to facilitate the analysis. This survey is preliminary, as many other aspects also deserve attention. But we hope it will help better understanding and forging our community in the global village.

pdf bib
Facing the Identification Problem in Language-Related Scientific Data Analysis.
Joseph Mariani | Christopher Cieri | Gil Francopoulo | Patrick Paroubek | Marine Delaborde
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the problems that must be addressed when studying large amounts of data over time which require entity normalization applied not to the usual genres of news or political speech, but to the genre of academic discourse about language resources, technologies and sciences. It reports on the normalization processes that had to be applied to produce data usable for computing statistics in three past studies on the LRE Map, the ISCA Archive and the LDC Bibliography. It shows the need for human expertise during normalization and the necessity to adapt the work to the study objectives. It investigates possible improvements for reducing the workload necessary to produce comparable results. Through this paper, we show the necessity to define and agree on international persistent and unique identifiers.

pdf bib
Bidirectionnal converter between syntactic annotations : from French Treebank Dependencies to PASSAGE annotations, and back
Munshi Asadullah | Patrick Paroubek | Anne Vilnat
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present here part of a bidirectional converter between the French Tree-bank Dependency (FTB - DEP) annotations into the PASSAGE format. FTB - DEP is the representation used by several freely available parsers and the PASSAGE annotation was used to hand-annotate a relatively large sized corpus, used as gold-standard in the PASSAGE evaluation campaigns. Our converter will give the means to evaluate these parsers on the PASSAGE corpus. We shall illustrate the mapping of important syntactic phenomena using the corpus made of the examples of the FTB - DEP annotation guidelines, which we have hand-annotated with PASSAGE annotations and used to compute quantitative performance measures on the FTB - DEP guidelines.n this paper we will briefly introduce the two annotation formats. Then, we detail the two converters, and the rules which have been written. The last part will detail the results we obtained on the phenomenon we mostly study, the passive forms. We evaluate the converters by a double conversion, from PASSAGE to CoN LL and back to PASSAGE. We will detailed in this paper the linguistic phenomenon we detail here, the passive form.

pdf bib
Automatic Analysis of Scientific and Literary Texts. Presentation and Results of the DEFT2014 Text Mining Challenge (Analyse automatique de textes littéraires et scientifiques : présentation et résultats du défi fouille de texte DEFT2014) [in French]
Thierry Hamon | Quentin Pleplé | Patrick Paroubek | Pierre Zweigenbaum | Cyril Grouin
TALN-RECITAL 2014 Workshop DEFT 2014 : DÉfi Fouille de Textes (DEFT 2014 Workshop: Text Mining Challenge)

2013

pdf bib
Improving Minor Opinion Polarity Classification with Named Entity Analysis (L’apport des Entités Nommées pour la classification des opinions minoritaires) [in French]
Amel Fraisse | Patrick Paroubek | Gil Francopoulo
Proceedings of TALN 2013 (Volume 2: Short Papers)

pdf bib
Converting dependencies for syntactic analysis of French into PASSAGE functional relations (Convertir des analyses syntaxiques en dépendances vers les relations fonctionnelles PASSAGE) [in French]
Patrick Paroubek | Munshi Asadullah | Anne Vilnat
Proceedings of TALN 2013 (Volume 2: Short Papers)

2012

pdf bib
Indexation libre et contrôlée d’articles scientifiques. Présentation et résultats du défi fouille de textes DEFT2012 (Controlled and free indexing of scientific papers. Presentation and results of the DEFT2012 text-mining challenge) [in French]
Patrick Paroubek | Pierre Zweigenbaum | Dominic Forest | Cyril Grouin
JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)

pdf bib
A Rough Set Formalization of Quantitative Evaluation with Ambiguity
Patrick Paroubek | Xavier Tannier
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we present the founding elements of a formal model of the evaluation paradigm in natural language processing. We propose an abstract model of objective quantitative evaluation based on rough sets, as well as the notion of potential performance space for describing the performance variations corresponding to the ambiguity present in hypothesis data produced by a computer program, when comparing it to the reference data created by humans. A formal model of the evaluation paradigm will be useful for comparing evaluations protocols, investigating evaluation constraint relaxation and getting a better understanding of the evaluation paradigm, provided it is general enough to be able to represent any natural language processing task.

2011

pdf bib
Classification en polarité de sentiments avec une représentation textuelle à base de sous-graphes d’arbres de dépendances (Sentiment polarity classification using a textual representation based on subgraphs of dependency trees)
Alexander Pak | Patrick Paroubek
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Les approches classiques à base de n-grammes en analyse supervisée de sentiments ne peuvent pas correctement identifier les expressions complexes de sentiments à cause de la perte d’information induite par l’approche « sac de mots » utilisée pour représenter les textes. Dans notre approche, nous avons recours à des sous-graphes extraits des graphes de dépendances syntaxiques comme traits pour la classification de sentiments. Nous représentons un texte par un vecteur composé de ces sous-graphes syntaxiques et nous employons un classifieurs SVM état-de-l’art pour identifier la polarité d’un texte. Nos évaluations expérimentales sur des critiques de jeux vidéo montrent que notre approche à base de sous-graphes est meilleure que les approches standard à modèles « sac de mots » et n-grammes. Dans cet article nous avons travaillé sur le français, mais notre approche peut facilement être adaptée à d’autres langues.

2010

pdf bib
Construction d’un lexique affectif pour le français à partir de Twitter
Alexander Pak | Patrick Paroubek
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Un lexique affectif est un outil utile pour l’étude des émotions ainsi que pour la fouille d’opinion et l’analyse des sentiments. Un tel lexique contient des listes de mots annotés avec leurs évaluations émotionnelles. Il existe un certain nombre de lexiques affectifs pour la langue anglaise, espagnole, allemande, mais très peu pour le français. Un travail de longue haleine est nécessaire pour construire et enrichir un lexique affectif. Nous proposons d’utiliser Twitter, la plateforme la plus populaire de microblogging de nos jours, pour recueillir un corpus de textes émotionnels en français. En utilisant l’ensemble des données recueillies, nous avons estimé les normes affectives de chaque mot. Nous utilisons les données de la Norme Affective desMots Anglais (ANEW, Affective Norms of EnglishWords) que nous avons traduite en français afin de valider nos résultats. Les valeurs du coefficient tau de Kendall et du coefficient de corrélation de rang de Spearman montrent que nos scores estimés sont en accord avec les scores ANEW.

pdf bib
Twitter Based System: Using Twitter for Disambiguating Sentiment Ambiguous Adjectives
Alexander Pak | Patrick Paroubek
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Le microblogging pour la micro analyse des sentiments et des opinons [Microblogging for Micro Sentiment Analysis and Opinion Mining]
Alexander Pak | Patrick Paroubek
Traitement Automatique des Langues, Volume 51, Numéro 3 : Opinions, sentiments et jugements d’évaluation [Opinions, sentiment and evaluative language]

pdf bib
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
Alexander Pak | Patrick Paroubek
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life everyday. Therefore microblogging web-sites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared relatively recently, there are a few research works that were devoted to this topic. In our paper, we focus on using Twitter, the most popular microblogging platform, for the task of sentiment analysis. We show how to automatically collect a corpus for sentiment analysis and opinion mining purposes. We perform linguistic analysis of the collected corpus and explain discovered phenomena. Using the corpus, we build a sentiment classifier, that is able to determine positive, negative and neutral sentiments for a document. Experimental evaluations show that our proposed techniques are efficient and performs better than previously proposed methods. In our research, we worked with English, however, the proposed technique can be used with any other language.

pdf bib
Annotations for Opinion Mining Evaluation in the Industrial Context of the DOXA project
Patrick Paroubek | Alexander Pak | Djamel Mostefa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

After presenting opinion and sentiment analysis state of the art and the DOXA project, we review the few evaluation campaigns that have dealt in the past with opinion mining. Then we present the two level opinion and sentiment model that we will use for evaluation in the DOXA project and the annotation interface we use for hand annotating a reference corpus. We then present the corpus which will be used on DOXA and report on the hand-annotation task on a corpus of comments on video games and the solution adopted to obtain a sufficient level of inter-annotator agreement.

pdf bib
The Second Evaluation Campaign of PASSAGE on Parsing of French
Patrick Paroubek | Olivier Hamon | Eric de La Clergerie | Cyril Grouin | Anne Vilnat
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf bib
PASSAGE Syntactic Representation: a Minimal Common Ground for Evaluation
Anne Vilnat | Patrick Paroubek | Eric Villemonte de la Clergerie | Gil Francopoulo | Marie-Laure Guénot
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The current PASSAGE syntactic representation is the result of 9 years of constant evolution with the aim of providing a common ground for evaluating parsers of French whatever their type and supporting theory. In this paper we present the latest developments concerning the formalism and show first through a review of basic linguistic phenomena that it is a plausible minimal common ground for representing French syntax in the context of generic black box quantitative objective evaluation. For the phenomena reviewed, which include: the notion of syntactic head, apposition, control and coordination, we explain how PASSAGE representation relates to other syntactic representation schemes for French and English, slightly extending the annotation to address English when needed. Second, we describe the XML format chosen for PASSAGE and show that it is compliant with the latest propositions in terms of linguistic annotation standard. We conclude discussing the influence that corpus-based evaluation has on the characteristics of syntactic representation when willing to assess the performance of any kind of parser.

2008

pdf bib
Human Judgement as a Parameter in Evaluation Campaigns
Jean-Baptiste Berthelin | Cyril Grouin | Martine Hurault-Plantet | Patrick Paroubek
Coling 2008: Proceedings of the workshop on Human Judgements in Computational Linguistics

pdf bib
Large Scale Production of Syntactic Annotations to Move Forward
Anne Vilnat | Gil Francopoulo | Olivier Hamon | Sylvain Loiseau | Patrick Paroubek | Eric Villemonte de la Clergerie
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

pdf bib
Annotation and analysis of overlapping speech in political interviews
Martine Adda-Decker | Claude Barras | Gilles Adda | Patrick Paroubek | Philippe Boula de Mareüil | Benoit Habert
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Looking for a better understanding of spontaneous speech-related phenomena and to improve automatic speech recognition (ASR), we present here a study on the relationship between the occurrence of overlapping speech segments and disfluencies (filled pauses, repetitions, revisions) in political interviews. First we present our data, and our overlap annotation scheme. We detail our choice of overlapping tags and our definition of disfluencies; the observed ratios of the different overlapping tags are examined, as well as their correlation with of the speaker role and propose two measures to characterise speakers’ interacting attitude: the attack/resist ratio and the attack density. We then study the relationship between the overlapping speech segments and the disfluencies in our corpus, before concluding on the perspectives that our experiments offer.

pdf bib
EASY, Evaluation of Parsers of French: what are the Results?
Patrick Paroubek | Isabelle Robba | Anne Vilnat | Christelle Ayache
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents EASY, which has been the first campaign evaluating syntactic parsers on all the common syntactic phenomena and a large set of dependency relations. The language analyzed was French. During this campaign, an annotation scheme has been elaborated with the different actors: participants and corpus providers; then a corpus made of several syntactic materials has been built and annotated: it reflects a great variety of linguistic styles (from literature to oral transcriptions, and from newspapers to medical texts). Both corpus and annotation scheme are here briefly presented. Moreover, evaluation measures are explained and detailed results are given. The results of the 15 parsers coming from 12 teams are analyzed. To conclude, a first experiment aiming to combine the outputs of the different systems is shown.

pdf bib
PASSAGE: from French Parser Evaluation to Large Sized Treebank
Éric Villemonte de la Clergerie | Olivier Hamon | Djamel Mostefa | Christelle Ayache | Patrick Paroubek | Anne Vilnat
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present the PASSAGE project which aims at building automatically a French Treebank of large size by combining the output of several parsers, using the EASY annotation scheme. We present also the results of the of the first evaluation campaign of the project and the preliminary results we have obtained with our ROVER procedure for combining parsers automatically.

pdf bib
SEWS : un serveur d’évaluation orienté Web pour la syntaxe [SEWS : an web-based server for evaluating syntactic annotation tools]
Olivier Hamon | Patrick Paroubek | Djamel Mostef
Traitement Automatique des Langues, Volume 49, Numéro 2 : Plate-formes pour le traitement automatique des langues [Platforms for Natural Language Processing]

2007

pdf bib
Les résultats de la campagne EASY d’évaluation des analyseurs syntaxiques du français
Patrick Paroubek | Anne Vilnat | Isabelle Robba | Christelle Ayache
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Dans cet article, nous présentons les résultats de la campagne d’évaluation EASY des analyseurs syntaxiques du français. EASY a été la toute première campagne d’évaluation comparative des analyseurs syntaxiques du français en mode boîte noire utilisant des mesures objectives quantitatives. EASY fait partie du programme TECHNOLANGUE du Ministère délégué à la Recherche et à l’Éducation, avec le soutien du ministère de délégué à l’industrie et du ministère de la culture et de la communication. Nous exposons tout d’abord la position de la campagne par rapport aux autres projets d’évaluation en analyse syntaxique, puis nous présentos son déroulement, et donnons les résultats des 15 analyseurs participants en fonction des différents types de corpus et des différentes annotations (constituants et relations). Nous proposons ensuite un ensemble de leçons à tirer de cette campagne, en particulier à propos du protocole d’évaluation, de la définition de la segmentation en unités linguistiques, du formalisme et des activités d’annotation, des critères de qualité des données, des annotations et des résultats, et finalement de la notion de référence en analyse syntaxique. Nous concluons en présentant comment les résultats d’EASY se prolongent dans le projet PASSAGE (ANR-06-MDCA-013) qui vient de débuter et dont l’objectif est d’étiqueter un grand corpus par plusieurs analyseurs en les combinant selon des paramètres issus de l’évaluation.

pdf bib
Principles of Evaluation in Natural Language Processing
Patrick Paroubek | Stéphane Chaudiron | Lynette Hirschman
Traitement Automatique des Langues, Volume 48, Numéro 1 : Principes de l'évaluation en Traitement Automatique des Langues [Principles of Evaluation in Natural Language Processing]

2006

pdf bib
Data, Annotations and Measures in EASY the Evaluation Campaign for Parsers of French.
Patrick Paroubek | Isabelle Robba | Anne Vilnat | Christelle Ayache
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents the protocol of EASY the evaluation campaign for syntactic parsers of French in the EVALDA project of the TECHNOLANGUE program. We describe the participants, the corpus and its genre partitioning, the annotation scheme, which allows for the annotation of both constituents and relations, the evaluation methodology and, as an illustration, the results obtained by one participant on half of the corpus.

2004

pdf bib
Annoter en constituants pour évaluer des analyseurs syntaxiques
Anne Vilnat | Laura Monceaux | Patrick Paroubek | Isabelle Robba | Véronique Gendner | Gabriel Illouz | Michèle Jardino
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente l’annotation en constituants menée dans le cadre d’un protocole d’évaluation des analyseurs syntaxiques (mis au point dans le pré-projet PEAS, puis dans le projet EASY). Le choix des constituants est décrit en détail et une première évaluation effectuée à partir des résultats de deux analyseurs est donnée.

pdf bib
Apprentissage collectif et lexique
Julien Poudade | Patrick Paroubek
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Cet article présente l’influence de la zone de travail que possède une entité logicielle pour lui permettre de prédire l’état futur de son environnement, sur la constitution d’un lexique partagé par les différents membres d’une population, dans le cadre d’une variante “du jeu de désignation” (naming game).

pdf bib
Automatic Audio and Manual Transcripts Alignment, Time-code Transfer and Selection of Exact Transcripts
C. Barras | G. Adda | M. Adda-Decker | B. Habert | P. Boula de Mareüil | P. Paroubek
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The present study focuses on automatic processing of sibling resources of audio and written documents, such as available in audio archives or for parliament debates: written texts are close but not exact audio transcripts. Such resources deserve attention for several reasons: they represent an interesting testbed for studying differences between written and spoken material and they yield low cost resources for acoustic model training. When automatically transcribing the audio data, regions of agreement between automatic transcripts and written sources allow to transfer time-codes to the written documents: this may be helpful in an audio archive or audio information retrieval environment. Regions of disagreement can be automatically selected for further correction by human transcribers. This study makes use of 10 hours of French radio interview archives with corresponding press-oriented transcripts. The audio corpus has then been transcribed using the LIMSI speech recognizer resulting in automatic transcripts, exhibiting an average word error rate of 12%. 80% of the text corpus (with word chunks of at least five words) can be exactly aligned with the automatic transcripts of the audio data. The residual word error rate on these 80% is less than 1%.

pdf bib
The Ongoing Evaluation Campaign of Syntactic Parsing of French: EASY
Anne Vilnat | Patrick Paroubek | Laura Monceaux | Isabelle Robba | Véronique Gendner | Gabriel Illouz | Michèle Jardino
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper presents EASY (Evaluation of Analyzers of SYntax), an ongoing evaluation campaign of syntactic parsing of French, a subproject of EVALDA in the French TECHNOLANGUE program. After presenting the elaboration of the annotation formalism, we describe the corpus building steps, the annotation tools, the evaluation measures and finally, plans to produce a validated large linguistic resource, syntactically annotated

pdf bib
The French MEDIA/EVALDA Project: the Evaluation of the Understanding Capability of Spoken Language Dialogue Systems
Laurence Devillers | Hélène Maynard | Sophie Rosset | Patrick Paroubek | Kevin McTait | D. Mostefa | Khalid Choukri | Laurent Charnay | Caroline Bousquet | Nadine Vigouroux | Frédéric Béchet | Laurent Romary | Jean-Yves Antoine | J. Villaneau | Myriam Vergnes | J. Goulian
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The aim of the MEDIA project is to design and test a methodology for the evaluat ion of context-dependent and independent spoken dialogue systems. We propose an evaluation paradigm based on the use of test suites from real-world corpora and a common semantic representation and common metrics. This paradigm should allow us to diagnose the context-sensitive understanding capability of dialogue system s. This paradigm will be used within an evaluation campaign involving several si tes all of which will carry out the task of querying information from a database .

2003

pdf bib
PEAS, the first instantiation of a comparative framework for evaluating parsers of French
Véronique Gendner | Gabriel Illouz | Michèle Jardino | Laura Monceaux | Patrick Paroubek | Isabelle Robba | Anne Vilnat
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
The PEACE SLDS understanding evaluation paradigm of the French MEDIA campaign
Laurence Devillers | Hélène Maynard | Patrick Paroubek | Sophie Rosset
Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?

2002

pdf bib
A Protocol for Evaluating Analyzers of Syntax (PEAS)
Véronique Gendner | Gabriel Illouz | Michèle Jardino | Laura Monceaux | Patrick Paroubek | Isabelle Robba | Anne Vilnat
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib
Introduction
Patrick Paroubek
Proceedings of the ACL 2001 Workshop on Evaluation Methodologies for Language and Dialogue Systems

2000

pdf bib
Language Resources as by-Product of Evaluation: The MULTITAG Example
Patrick Paroubek
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1992

pdf bib
XTAG - A Graphical Workbench for Developing Tree-Adjoining Grammars
Patrick Paroubek | Yves Schabes | Aravind K. Joshi
Third Conference on Applied Natural Language Processing

Search
Co-authors