ACCOLÉ : Annotation Collaborative d’erreurs de traduction pour COrpus aLignÉs, multi-cibles, et Annotation d’Expressions Poly-lexicales (ACCOLÉ: A Collaborative Platform of Error Annotation for Aligned)
Emmanuelle Esperança-Rodier | Francis Brunet-Manquat
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 : Démonstrations

This demonstration presents the latest developments of ACCOLÉ (Annotation Collaborative d’erreurs de traduction pour COrpus aLignÉs), a collaborative platform for annotating translation errors in aligned corpora. ACCOLÉ already offered simplified management of corpora and error typologies, error annotation for aligned bilingual translation corpora, collaboration and/or supervision during annotation, and search for error patterns in the annotations. It now also makes it possible to annotate multiword expressions (MWEs; Expressions Polylexicales, EPL) in monolingual French texts, and gives access to error annotation for multi-target translation corpora. In this article, after a brief reminder of ACCOLÉ's existing features, we describe what each new feature offers.


Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English
Maha Elbayad | Michael Ustaszewski | Emmanuelle Esperança-Rodier | Francis Brunet-Manquat | Jakob Verbeek | Laurent Besacier
Proceedings of the 28th International Conference on Computational Linguistics

In this work, we conduct an evaluation study comparing offline and online neural machine translation architectures. We consider two sequence-to-sequence models: the convolutional Pervasive Attention model (Elbayad et al. 2018) and the attention-based Transformer (Vaswani et al. 2017). For both architectures, we investigate the impact of online decoding constraints on translation quality through a carefully designed human evaluation on the English-German and German-English language pairs, the latter being particularly sensitive to latency constraints. The evaluation results allow us to identify the strengths and shortcomings of each model when we shift to the online setup.

Providing Semantic Knowledge to a Set of Pictograms for People with Disabilities: a Set of Links between WordNet and Arasaac: Arasaac-WN
Didier Schwab | Pauline Trial | Céline Vaschalde | Loïc Vial | Emmanuelle Esperança-Rodier | Benjamin Lecouteux
Proceedings of the 12th Language Resources and Evaluation Conference

This article presents a resource that links WordNet, the widely known lexical and semantic database, and Arasaac, the largest freely available database of pictograms. Pictograms are increasingly used by people with cognitive or communication disabilities. However, they are mainly used manually via workbooks, whereas caregivers and families would like to use more automated tools (using speech to generate pictograms, for example). To make it possible to use pictograms automatically in NLP applications, we propose a database that links them to semantic knowledge. This resource is particularly useful for creating applications that help people with cognitive disabilities, such as text-to-picto, speech-to-picto, and picto-to-speech. In this article, we explain the need for this database and the problems that have been identified. Currently, this resource combines approximately 800 pictograms with their corresponding WordNet synsets, and it is accessible both through a digital collection and via an SQL database. Finally, we propose a method with associated tools to make our resource language-independent: this method was applied to create a first text-to-picto prototype for the French language. Our resource is distributed freely under a Creative Commons license at the following URL: https://github.com/getalp/Arasaac-WN.
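As a rough illustration of how a pictogram-to-synset link table could support a text-to-picto lookup, here is a minimal sketch using Python's built-in sqlite3 module. The table name `picto_synset`, its columns, the identifiers, and the three demo rows are all hypothetical and are not taken from the Arasaac-WN distribution. The sketch also looks tokens up by lemma directly, whereas a real pipeline would presumably first disambiguate each word to a WordNet synset.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical link table.

    The schema and rows are invented for illustration only; the real
    Arasaac-WN database may be organized differently.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE picto_synset (picto_id INTEGER, synset_id TEXT, lemma TEXT)"
    )
    conn.executemany(
        "INSERT INTO picto_synset VALUES (?, ?, ?)",
        [
            (1, "demo-synset-dog", "dog"),
            (2, "demo-synset-eat", "eat"),
            (3, "demo-synset-apple", "apple"),
        ],
    )
    return conn


def text_to_picto(conn, tokens):
    """Return (token, [picto_id, ...]) pairs; the list is empty if no link exists."""
    result = []
    for token in tokens:
        rows = conn.execute(
            "SELECT picto_id FROM picto_synset WHERE lemma = ? ORDER BY picto_id",
            (token.lower(),),
        ).fetchall()
        result.append((token, [row[0] for row in rows]))
    return result
```

Calling `text_to_picto(build_demo_db(), ["Dog", "runs"])` pairs "Dog" with demo pictogram 1 and "runs" with an empty list, since the demo table has no entry for it.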


With or without post-editing processes? Evidence for a gap in machine translation evaluation
Caroline Rossi | Emmanuelle Esperança-Rodier
Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production


ACCOLÉ : Annotation Collaborative d’erreurs de traduction pour COrpus aLignÉs (ACCOLÉ: A Collaborative Platform of Error Annotation for Aligned Corpus)
Francis Brunet-Manquat | Emmanuelle Esperança-Rodier
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

The ACCOLÉ platform (Annotation Collaborative d’erreurs de traduction pour COrpus aLignÉs) offers a range of innovative services that address current needs in translation error analysis: simplified management of corpora and error typologies, efficient error annotation, collaboration and/or supervision during annotation, and search for error patterns in the annotations.


Translation quality evaluation of MWE from French into English using an SMT system
Emmanuelle Esperança-Rodier | Johan Didier
Proceedings of Translating and the Computer 38


Collection of a Large Database of French-English SMT Output Corrections
Marion Potet | Emmanuelle Esperança-Rodier | Laurent Besacier | Hervé Blanchon
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Corpus-based approaches to machine translation (MT) rely on the availability of parallel corpora. To produce user-acceptable translation outputs, such systems need high-quality data to be efficiently trained, optimized and evaluated. However, building a high-quality dataset is a relatively expensive task. In this paper, we describe the data collection and analysis of a large database of 10,881 manually corrected SMT translation output hypotheses. These post-editions were collected using Amazon's Mechanical Turk, following some ethical guidelines. A complete analysis of the collected data pointed out a high quality of the corrections, with more than 87 % of the collected post-editions improving the hypotheses and more than 94 % of the crowdsourced post-editions being at least of professional quality. We also post-edited 1,500 gold-standard reference translations (of bilingual parallel corpora generated by professionals) and noticed that 72 % of these translations needed to be corrected during post-edition. We computed a proximity measure between the different kinds of translations and pointed out that reference translations are as far from the hypotheses as from the corrected hypotheses (i.e. the post-editions). In light of these last findings, we discuss the adequacy of text-based generated reference translations for training sentence-to-sentence based SMT systems.
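The abstract does not say which proximity measure was used. As one plausible stand-in, the sketch below computes a word-level edit distance normalized by the length of the target sentence, which is an assumption made here for illustration and not the paper's stated metric. The function names `word_edit_distance` and `distance_rate` are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        curr = [i]
        for j, word_b in enumerate(b, 1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_rate(candidate, target):
    """Edit distance from candidate to target, divided by the target length.

    Assumed stand-in for a proximity measure; whitespace tokenization only.
    """
    cand_tokens = candidate.split()
    target_tokens = target.split()
    if not target_tokens:
        return 0.0 if not cand_tokens else float("inf")
    return word_edit_distance(cand_tokens, target_tokens) / len(target_tokens)
```

With such a measure, one could compare a reference translation against both an SMT hypothesis and its post-edition and check whether the two distances are similar, which is the kind of comparison the abstract reports.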


Oracle-based Training for Phrase-based Statistical Machine Translation
Marion Potet | Emmanuelle Esperança-Rodier | Hervé Blanchon | Laurent Besacier
Proceedings of the 15th Annual conference of the European Association for Machine Translation