Claude Roux


2023

pdf bib
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh Dhole | Varun Gangal | Sebastian Gehrmann | Aadesh Gupta | Zhenhao Li | Saad Mahamood | Abinaya Mahadiran | Simon Mille | Ashish Shrivastava | Samson Tan | Tongshang Wu | Jascha Sohl-Dickstein | Jinho Choi | Eduard Hovy | Ondřej Dušek | Sebastian Ruder | Sajant Anand | Nagender Aneja | Rabin Banjade | Lisa Barthe | Hanna Behnke | Ian Berlot-Attwell | Connor Boyle | Caroline Brun | Marco Antonio Sobrevilla Cabezudo | Samuel Cahyawijaya | Emile Chapuis | Wanxiang Che | Mukund Choudhary | Christian Clauss | Pierre Colombo | Filip Cornell | Gautier Dagan | Mayukh Das | Tanay Dixit | Thomas Dopierre | Paul-Alexis Dray | Suchitra Dubey | Tatiana Ekeinhor | Marco Di Giovanni | Tanya Goyal | Rishabh Gupta | Louanes Hamla | Sang Han | Fabrice Harel-Canada | Antoine Honoré | Ishan Jindal | Przemysław Joniak | Denis Kleyko | Venelin Kovatchev | Kalpesh Krishna | Ashutosh Kumar | Stefan Langer | Seungjae Ryan Lee | Corey James Levinson | Hualou Liang | Kaizhao Liang | Zhexiong Liu | Andrey Lukyanenko | Vukosi Marivate | Gerard de Melo | Simon Meoni | Maxine Meyer | Afnan Mir | Nafise Sadat Moosavi | Niklas Meunnighoff | Timothy Sum Hon Mun | Kenton Murray | Marcin Namysl | Maria Obedkova | Priti Oli | Nivranshu Pasricha | Jan Pfister | Richard Plant | Vinay Prabhu | Vasile Pais | Libo Qin | Shahab Raji | Pawan Kumar Rajpoot | Vikas Raunak | Roy Rinberg | Nicholas Roberts | Juan Diego Rodriguez | Claude Roux | Vasconcellos Samus | Ananya Sai | Robin Schmidt | Thomas Scialom | Tshephisho Sefara | Saqib Shamsi | Xudong Shen | Yiwen Shi | Haoyue Shi | Anna Shvets | Nick Siegel | Damien Sileo | Jamie Simon | Chandan Singh | Roman Sitelew | Priyank Soni | Taylor Sorensen | William Soto | Aman Srivastava | Aditya Srivatsa | Tony Sun | Mukund Varma | A Tabassum | Fiona Tan | Ryan Teehan | Mo Tiwari | Marie Tolkiehn | Athena Wang | Zijian Wang | Zijie Wang | Gloria Wang | Fuxuan Wei | Bryan Wilie | Genta Indra Winata | Xinyu Wu | Witold Wydmanski | Tianbao Xie | Usama Yaseen | Michael Yee | Jing Zhang | Yue Zhang
Northern European Journal of Language Technology, Volume 9

Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.

2019

pdf bib
Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness
Alexandre Berard | Ioan Calapodescu | Marc Dymetman | Claude Roux | Jean-Luc Meunier | Vassilina Nikoulina
Proceedings of the 3rd Workshop on Neural Generation and Translation

We share a French-English parallel corpus of Foursquare restaurant reviews, and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of such user-generated content, and train good baseline models that build upon the latest techniques for MT robustness. We also perform an extensive evaluation (automatic and human) that shows significant improvements over existing online systems. Finally, we propose task-specific metrics based on sentiment analysis or translation accuracy of domain-specific polysemous words.

pdf bib
Naver Labs Europe’s Systems for the WMT19 Machine Translation Robustness Task
Alexandre Berard | Ioan Calapodescu | Claude Roux
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the systems that we submitted to the WMT19 Machine Translation robustness task. This task aims to improve MT’s robustness to noise found on social media, like informal language, spelling mistakes and other orthographic variations. The organizers provide parallel data extracted from a social media website in two language pairs: French-English and Japanese-English (one for each language direction). The goal is to obtain the best scores on unseen test sets from the same source, according to automatic metrics (BLEU) and human evaluation. We propose one single and one ensemble system for each translation direction. Our ensemble models ranked first in all language pairs, according to BLEU evaluation. We discuss the pre-processing choices that we made, and present our solutions for robustness to noise and domain adaptation.

2016

pdf bib
XRCE at SemEval-2016 Task 5: Feedbacked Ensemble Modeling on Syntactico-Semantic Knowledge for Aspect Based Sentiment Analysis
Caroline Brun | Julien Perez | Claude Roux
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Un système hybride pour l’analyse de sentiments associés aux aspects
Caroline Brun | Diana Nicoleta Popa | Claude Roux
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Cet article présente en détails notre participation à la tâche 4 de SemEval2014 (Analyse de Sentiments associés aux Aspects). Nous présentons la tâche et décrivons précisément notre système qui consiste en une combinaison de composants linguistiques et de modules de classification. Nous exposons ensuite les résultats de son évaluation, ainsi que les résultats des meilleurs systèmes. Nous concluons par la présentation de quelques nouvelles expériences réalisées en vue de l’amélioration de ce système.

2014

pdf bib
Investigating the Image of Entities in Social Media: Dataset Design and First Results
Julien Velcin | Young-Min Kim | Caroline Brun | Jean-Yves Dormagen | Eric SanJuan | Leila Khouas | Anne Peradotto | Stephane Bonnevay | Claude Roux | Julien Boyadjian | Alejandro Molina | Marie Neihouser
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The objective of this paper is to describe the design of a dataset that deals with the image (i.e., representation, web reputation) of various entities populating the Internet: politicians, celebrities, companies, brands etc. Our main contribution is to build and provide an original annotated French dataset. This dataset consists of 11527 manually annotated tweets expressing the opinion on specific facets (e.g., ethic, communication, economic project) describing two French policitians over time. We believe that other researchers might benefit from this experience, since designing and implementing such a dataset has proven quite an interesting challenge. This design comprises different processes such as data selection, formal definition and instantiation of an image. We have set up a full open-source annotation platform. In addition to the dataset design, we present the first results that we obtained by applying clustering methods to the annotated dataset in order to extract the entity images.

pdf bib
XRCE: Hybrid Classification for Aspect-based Sentiment Analysis
Caroline Brun | Diana Nicoleta Popa | Claude Roux
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Part of Speech Tagging for French Social Media Data
Farhad Nooralahzadeh | Caroline Brun | Claude Roux
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Decomposing Hashtags to Improve Tweet Polarity Classification (Décomposition des « hash tags » pour l’amélioration de la classification en polarité des « tweets ») [in French]
Caroline Brun | Claude Roux
Proceedings of TALN 2014 (Volume 2: Short Papers)

2006

pdf bib
Coupling a Linguistic Formalism and a Script Language
Claude Roux
Proceedings of the Third Workshop on Constraints and Language Processing

2004

pdf bib
Annoter les documents XML avec un outil d’analyse syntaxique
Claude Roux
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente l’intégration au sein d’un analyseur syntaxique (Xerox Incremental Parser) de règles spécifiques qui permettent de lier l’analyse grammaticale à la sémantique des balises XML spécifiques à un document donné. Ces règles sont basées sur la norme XPath qui offre une très grande finesse de description et permet de guider très précisément l’application de l’analyseur sur une famille de documents partageant une même DTD. Le résultat est alors être intégré directement comme annotation dans le document traité.

pdf bib
Towards an International Standard on Feature Structure Representation
Kiyong Lee | Lou Burnard | Laurent Romary | Eric de la Clergerie | Thierry Declerck | Syd Bauman | Harry Bunt | Lionel Clément | Tomaž Erjavec | Azim Roussanaly | Claude Roux
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib
Entre syntaxe et sémantique : Normalisation de la sortie de l’analyse syntaxique en vue de l’amélioration de l’extraction d’information à partir de textes
Caroline Hagège | Claude Roux
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente la normalisation de la sortie d’un analyseur robuste de l’anglais. Nous montrons quels sont les enrichissements que nous avons effectués afin de pouvoir obtenir à la sortie de notre analyseur des relations syntaxiques plus générales que celles que nous offrent habituellement les analyseurs robustes existants. Pour cela nous utilisons non seulement des propriétés syntaxiques, mais nous faisons appel aussi à de l’information de morphologie dérivationnelle. Cette tâche de normalisation est menée à bien grâce à notre analyseur XIP qui intègre tous les traitements allant du texte brut tout venant au texte normalisé. Nous pensons que cette normalisation nous permettra de mener avec plus de succès des tâches d’extraction d’information ou de détection de similarité entre documents.

2002

pdf bib
A Robust and Flexible Platform for Dependency Extraction
Caroline Hagège | Claude Roux
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib
A Multi-Input Dependency Parser
Salah Aït-Mokhtar | Jean-Pierre Chanod | Claude Roux
Proceedings of the Seventh International Workshop on Parsing Technologies

2000

pdf bib
A Step toward Semantic Indexing of an Encyclopedic Corpus
Philippe Alcouffe | Nicolas Gacon | Claude Roux | Frédérique Segond
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Search
Co-authors