Adrien Lardilleux

2020

The paper describes the submissions of the eTranslation team to the WMT 2020 news translation shared task. Leveraging the experience from the team’s participation last year, we developed systems for 5 language pairs with various strategies. Compared to last year, for some language pairs we dedicated considerably more resources to training, and tried to follow standard best practices to build competitive systems which can achieve good results in the rankings. By using deep and complex architectures, we sacrificed direct re-usability of our systems in production environments, but evaluation showed that this approach could result in better models that significantly outperform baseline architectures. We also submitted two systems to the zero-shot robustness task; these submissions are briefly described in this paper as well.

2019

eTranslation’s Submissions to the WMT 2019 News Translation Task
Csaba Oravecz | Katina Bontcheva | Adrien Lardilleux | László Tihanyi | Andreas Eisele
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the submissions of the eTranslation team to the WMT 2019 news translation shared task. The systems have been developed with the aim of identifying and following rather than establishing best practices, under the constraints imposed by the low-resource training and decoding environment normally used for our production systems. Thus most of the findings and results are transferable to systems used in the eTranslation service. Evaluations suggest that this approach is able to produce decent models with good performance and speed without the overhead of using prohibitively deep and complex architectures.

2017

CHARCUT: Human-Targeted Character-Based MT Evaluation with Loose Differences
Adrien Lardilleux | Yves Lepage
Proceedings of the 14th International Conference on Spoken Language Translation

We present CHARCUT, a character-based machine translation evaluation metric derived from a human-targeted segment difference visualisation algorithm. It combines an iterative search for longest common substrings between the candidate and the reference translation with a simple length-based threshold, enabling loose differences that limit noisy character matches. Its main advantage is to produce scores that directly reflect human-readable string differences, making it a useful support tool for the manual analysis of MT output and its display to end users. Experiments on WMT16 metrics task data show that it is on par with the best “un-trained” metrics in terms of correlation with human judgement, well above BLEU and TER baselines, on both system and segment tasks.

2012

Alignement sous-phrastique hiérarchique avec Anymalign (Hierarchical Sub-Sentential Alignment with Anymalign) [in French]
Adrien Lardilleux | François Yvon | Yves Lepage
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Hierarchical Sub-sentential Alignment with Anymalign
Adrien Lardilleux | François Yvon | Yves Lepage
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

2011

Improving Sampling-based Alignment by Investigating the Distribution of N-grams in Phrase Translation Tables
Juan Luo | Adrien Lardilleux | Yves Lepage
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Généralisation de l’alignement sous-phrastique par échantillonnage (Generalization of sub-sentential alignment by sampling)
Adrien Lardilleux | François Yvon | Yves Lepage
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Sub-sentential alignment consists in extracting translations of textual units smaller than the sentence from multilingual parallel texts aligned at the sentence level. Such an alignment is needed, for example, to train statistical machine translation systems. The standard approach to this task involves successively estimating several probabilistic models of increasing complexity and using heuristics that align isolated words and then, by extension, groups of words. In this paper, we consider an alternative approach, initially proposed in (Lardilleux & Lepage, 2008), which rests on a much simpler principle: comparing occurrence profiles in subcorpora obtained by sampling. After analysing the strengths and weaknesses of this approach, we show how to improve the detection of long translation units, and evaluate these improvements on machine translation tasks.

2010

Bilingual Lexicon Induction: Effortless Evaluation of Word Alignment Tools and Production of Resources for Improbable Language Pairs
Adrien Lardilleux | Julien Gosme | Yves Lepage
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present a simple protocol to evaluate word aligners on bilingual lexicon induction tasks from parallel corpora. Rather than resorting to gold standards, it relies on a comparison of the outputs of word aligners against a reference bilingual lexicon. The quality of this reference bilingual lexicon does not need to be particularly high, because evaluation quality is ensured by systematically filtering this reference lexicon with the parallel corpus the word aligners are trained on. We perform a comparison of three freely available word aligners on numerous language pairs from the Bible parallel corpus (Resnik et al., 1999): MGIZA++ (Gao and Vogel, 2008), BerkeleyAligner (Liang et al., 2006), and Anymalign (Lardilleux and Lepage, 2009). We then select the most appropriate one to produce bilingual lexicons for all language pairs of this corpus. These involve Cebuano, Chinese, Danish, English, Finnish, French, Greek, Indonesian, Latin, Spanish, Swedish, and Vietnamese. The 66 resulting lexicons are made freely available.

2009

The GREYC translation memory for the IWSLT 2009 evaluation campaign
Yves Lepage | Adrien Lardilleux | Julien Gosme
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This year’s GREYC translation system is an improved translation memory that was designed from scratch to experiment with an approach whose goal is just to improve over the output of a standard translation memory by making heavy use of sub-sentential alignments in a restricted case of translation by analogy. The tracks the system participated in are all BTEC tracks: Arabic to English, Chinese to English, and Turkish to English.

Sampling-based Multilingual Alignment
Adrien Lardilleux | Yves Lepage
Proceedings of the International Conference RANLP-2009

anymalign : un outil d’alignement sous-phrastique libre pour les êtres humains (anymalign: a free sub-sentential alignment tool for human beings) [in French]
Adrien Lardilleux | Yves Lepage
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

We present anymalign, a sub-sentential aligner for the general public. The quality of its results rivals that of the best tool in the field, GIZA++. It is fast and easy to use, and produces dictionaries and other translation tables with a single command. To our knowledge, it is the only tool in the world that can align any number of languages simultaneously. It is therefore the first truly multilingual sub-sentential aligner.

2008

The GREYC machine translation system for the IWSLT 2008 evaluation campaign.
Yves Lepage | Adrien Lardilleux | Julien Gosme | Jean-Luc Manguin
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This year's GREYC machine translation (MT) system presents three major changes relative to the system presented during the previous campaign, while, of course, remaining a pure example-based MT system that exploits proportional analogies. Firstly, the analogy solver has been replaced with a truly non-deterministic one. Secondly, the engine has been re-engineered and better control has been introduced. Thirdly, the data used for translation were the data provided by the organizers plus alignments obtained using a new alignment method. This year we chose to have the engine run with the word as the processing unit, in contrast to previous years, when the processing unit was the character. The tracks the system participated in are all classic BTEC tracks (Arabic-English, Chinese-English and Chinese-Spanish) plus the so-called PIVOT task, where the test set had to be translated from Chinese into Spanish by way of English.

A truly multilingual, high coverage, accurate, yet simple, subsentential alignment method
Adrien Lardilleux | Yves Lepage
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper describes a new alignment method that extracts high-quality multi-word alignments from sentence-aligned multilingual parallel corpora. The method can handle several languages at once. The phrase tables it produces have accuracy comparable to, and coverage higher than, those obtained by current methods. They are also obtained much faster.

Multilingual Alignments by Monolingual String Differences
Adrien Lardilleux | Yves Lepage
Coling 2008: Companion volume: Posters

2007

The GREYC machine translation system for the IWSLT 2007 evaluation campaign
Yves Lepage | Adrien Lardilleux
Proceedings of the Fourth International Workshop on Spoken Language Translation

The GREYC machine translation (MT) system is a slight evolution of the ALEPH machine translation system that participated in the IWSLT 2005 campaign. It is a pure example-based MT system that exploits proportional analogies. The training data used for this campaign were limited on purpose to the sole data provided by the organizers. However, the training data were expanded with the results of sub-sentential alignments. The system participated in the two classical tasks of translation of manually transcribed texts from Japanese to English and Arabic to English.
