Ioan Calapodescu


Naver Labs Europe’s Participation in the Robustness, Chat, and Biomedical Tasks at WMT 2020
Alexandre Berard | Ioan Calapodescu | Vassilina Nikoulina | Jerin Philip
Proceedings of the Fifth Conference on Machine Translation

This paper describes Naver Labs Europe’s participation in the Robustness, Chat, and Biomedical Translation tasks at WMT 2020. We propose a bidirectional German-English model that is multi-domain, robust to noise, and able to translate entire documents (or bilingual dialogues) at once. We use the same ensemble of such models as our primary submission to all three tasks and achieve competitive results. We also experiment with language model pre-training techniques and evaluate their impact on robustness to noise and out-of-domain translation. For German, Spanish, Italian, and French to English translation in the Biomedical Task, we also submit our recently released multilingual Covid19NMT model.


Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness
Alexandre Berard | Ioan Calapodescu | Marc Dymetman | Claude Roux | Jean-Luc Meunier | Vassilina Nikoulina
Proceedings of the 3rd Workshop on Neural Generation and Translation

We share a French-English parallel corpus of Foursquare restaurant reviews, and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of such user-generated content, and train good baseline models that build upon the latest techniques for MT robustness. We also perform an extensive evaluation (automatic and human) that shows significant improvements over existing online systems. Finally, we propose task-specific metrics based on sentiment analysis or translation accuracy of domain-specific polysemous words.

Naver Labs Europe’s Systems for the Document-Level Generation and Translation Task at WNGT 2019
Fahimeh Saleh | Alexandre Berard | Ioan Calapodescu | Laurent Besacier
Proceedings of the 3rd Workshop on Neural Generation and Translation

Recently, neural models have led to significant improvements in both machine translation (MT) and natural language generation (NLG) tasks. However, generation of long descriptive summaries conditioned on structured data remains an open challenge. Likewise, MT that goes beyond sentence-level context is still an open issue (e.g., document-level MT or MT with metadata). To address these challenges, we propose to leverage data from both tasks and do transfer learning between MT, NLG, and MT with source-side metadata (MT+NLG). First, we train document-based MT systems with large amounts of parallel data. Then, we adapt these models to pure NLG and MT+NLG tasks by fine-tuning with smaller amounts of domain-specific data. This end-to-end NLG approach, without data selection and planning, outperforms the previous state of the art on the Rotowire NLG task. We participated in the “Document Generation and Translation” task at WNGT 2019, and ranked first in all tracks.

Naver Labs Europe’s Systems for the WMT19 Machine Translation Robustness Task
Alexandre Berard | Ioan Calapodescu | Claude Roux
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the systems that we submitted to the WMT19 Machine Translation robustness task. This task aims to improve MT’s robustness to noise found on social media, like informal language, spelling mistakes and other orthographic variations. The organizers provide parallel data extracted from a social media website in two language pairs: French-English and Japanese-English (one for each language direction). The goal is to obtain the best scores on unseen test sets from the same source, according to automatic metrics (BLEU) and human evaluation. We propose one single and one ensemble system for each translation direction. Our ensemble models ranked first in all language pairs, according to BLEU evaluation. We discuss the pre-processing choices that we made, and present our solutions for robustness to noise and domain adaptation.

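“Sentiment Aware Map” : exploration cartographique de points d’intérêt via l’analyse de sentiments au niveau des aspects (“Sentiment Aware Map”: map-based exploration of points of interest via aspect-level sentiment analysis)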
Ioan Calapodescu | Caroline Brun | Vassilina Nikoulina | Salah Aït-Mokhtar
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume IV : Démonstrations


SORT: An Interactive Source-Rewriting Tool for Improved Translation
Shachar Mirkin | Sriram Venkatapathy | Marc Dymetman | Ioan Calapodescu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations