Raphael Rubino

Also published as: Raphaël Rubino


2022

pdf bib
A Multilingual Multiway Evaluation Data Set for Structured Document Translation of Asian Languages
Bianka Buschbeck | Raj Dabre | Miriam Exel | Matthias Huck | Patrick Huy | Raphael Rubino | Hideki Tanaka
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

Translation of structured content is an important application of machine translation, but the scarcity of evaluation data sets, especially for Asian languages, limits progress. In this paper we present a novel multilingual multiway evaluation data set for the translation of structured documents of the Asian languages Japanese, Korean and Chinese. We describe the data set, its creation process and important characteristics, followed by establishing and evaluating baselines using the direct translation as well as detag-project approaches. Our data set is well suited for multilingual evaluation, and it contains richer annotation tag sets than existing data sets. Our results show that massively multilingual translation models like M2M-100 and mBART-50 perform surprisingly well despite not being explicitly trained to handle structured content. The data set described in this paper and used in our experiments is released publicly.

2021

pdf bib
NICT Kyoto Submission for the WMT’21 Quality Estimation Task: Multimetric Multilingual Pretraining for Critical Error Detection
Raphael Rubino | Atsushi Fujita | Benjamin Marie
Proceedings of the Sixth Conference on Machine Translation

This paper presents the NICT Kyoto submission for the WMT’21 Quality Estimation (QE) Critical Error Detection shared task (Task 3). Our approach relies mainly on QE model pretraining for which we used 11 language pairs, three sentence-level and three word-level translation quality metrics. Starting from an XLM-R checkpoint, we perform continued training by modifying the learning objective, switching from masked language modeling to QE-oriented signals, before finetuning and ensembling the models. Results obtained on the test set in terms of correlation coefficient and F-score show that automatic metrics and synthetic data perform well for pretraining, with our submissions ranked first for two out of four language pairs. A deeper look at the impact of each metric on the downstream task indicates higher performance for token-oriented metrics, while an ablation study emphasizes the usefulness of conducting both self-supervised and QE pretraining.

pdf bib
Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers
Benjamin Marie | Atsushi Fujita | Raphael Rubino
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper presents the first large-scale meta-evaluation of machine translation (MT). We annotated MT evaluations conducted in 769 research papers published from 2010 to 2020. Our study shows that practices for automatic MT evaluation have dramatically changed during the past decade and follow concerning trends. An increasing number of MT evaluations exclusively rely on differences between BLEU scores to draw conclusions, without performing any kind of statistical significance testing or human evaluation, while at least 108 metrics claiming to be better than BLEU have been proposed. MT evaluations in recent papers tend to copy and compare automatic metric scores from previous work to claim the superiority of a method or an algorithm without confirming either that exactly the same training, validation, and test data have been used or that the metric scores are comparable. Furthermore, tools for reporting standardized metric scores are still far from being widely adopted by the MT community. After showing how the accumulation of these pitfalls leads to dubious evaluation, we propose a guideline to encourage better automatic MT evaluation along with a simple meta-evaluation scoring method to assess its credibility.

pdf bib
Error Identification for Machine Translation with Metric Embedding and Attention
Raphael Rubino | Atsushi Fujita | Benjamin Marie
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, the lack of explanations along with decisions made by end-to-end neural models makes the results difficult to interpret. Furthermore, word-level annotated datasets are rare due to the prohibitive effort required to perform this task, while they could provide interpretable signals in addition to sentence-level QE outputs. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task and our submissions reach the first position in all language pairs. The extraction of metric-to-input attention weights shows that different metrics focus on different parts of the source and target text, providing strong rationales in the decision-making process of the QE model.

2020

pdf bib
Tagged Back-translation Revisited: Why Does It Really Work?
Benjamin Marie | Raphael Rubino | Atsushi Fujita
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we show that neural machine translation (NMT) systems trained on large back-translated data overfit some of the characteristics of machine-translated texts. Such NMT systems better translate human-produced translations, i.e., translationese, but may largely worsen the translation quality of original texts. Our analysis reveals that adding a simple tag to back-translations prevents this quality degradation and improves on average the overall translation quality by helping the NMT system to distinguish back-translated data from original parallel data during training. We also show that, in contrast to high-resource configurations, NMT systems trained in low-resource settings are much less vulnerable to overfitting back-translations. We conclude that the back-translations in the training data should always be tagged, especially when the origin of the text to be translated is unknown.
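
The tagging step described in this abstract can be illustrated with a minimal sketch. The tag string `<BT>`, the function names, and the toy sentence pairs are all hypothetical; the abstract only states that "a simple tag" is added to back-translations, so this is an illustration under those assumptions rather than the authors' implementation.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

BT_TAG = "<BT>"  # hypothetical tag token, assumed for illustration


def tag_back_translations(back_translated: List[Pair], tag: str = BT_TAG) -> List[Pair]:
    """Prepend the tag to the machine-generated source side of each pair."""
    return [(f"{tag} {src}", tgt) for src, tgt in back_translated]


def build_training_data(parallel: List[Pair], back_translated: List[Pair]) -> List[Pair]:
    """Keep original parallel data untagged and append tagged back-translations."""
    return parallel + tag_back_translations(back_translated)


# Toy data: one original pair and one back-translated pair.
parallel = [("Guten Morgen", "Good morning")]
back_translated = [("Wie geht es dir", "How are you")]
training_data = build_training_data(parallel, back_translated)
```

At test time, sentences to translate would be left untagged, so the model treats them like original parallel data.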

pdf bib
Balancing Cost and Benefit with Tied-Multi Transformers
Raj Dabre | Raphael Rubino | Atsushi Fujita
Proceedings of the Fourth Workshop on Neural Generation and Translation

We propose a novel procedure for training multiple Transformers with tied parameters, which compresses multiple models into one and enables the dynamic choice of the number of encoder and decoder layers during decoding. In training an encoder-decoder model, typically, the output of the last layer of the N-layer encoder is fed to the M-layer decoder, and the output of the last decoder layer is used to compute loss. Instead, our method computes a single loss consisting of NxM losses, where each loss is computed from the output of one of the M decoder layers connected to one of the N encoder layers. Such a model subsumes NxM models with different numbers of encoder and decoder layers, and can be used for decoding with fewer than the maximum number of encoder and decoder layers. Given our flexible tied model, we also address a priori selection of the number of encoder and decoder layers for faster decoding, and explore recurrent stacking of layers and knowledge distillation for model compression. We present a cost-benefit analysis of applying the proposed approaches for neural machine translation and show that they reduce decoding costs while preserving translation quality.
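
As a rough sketch of the combined loss in this abstract, the snippet below aggregates one loss per (encoder depth, decoder depth) pair. The abstract does not say how the NxM losses are combined, so the plain average used here is an assumption, and `pair_loss` is a hypothetical stand-in for the loss computed from decoder layer j attending to encoder layer i.

```python
from typing import Callable


def tied_multi_loss(pair_loss: Callable[[int, int], float], n_enc: int, m_dec: int) -> float:
    """Average the losses over all N x M (encoder layer, decoder layer) pairs.

    Averaging is an assumption; a plain sum differs only by a constant factor.
    """
    total = 0.0
    for i in range(1, n_enc + 1):        # encoder depth used as input
        for j in range(1, m_dec + 1):    # decoder depth used as output
            total += pair_loss(i, j)
    return total / (n_enc * m_dec)


# Toy stand-in loss, purely for illustration: grows with both depths.
def toy_pair_loss(i: int, j: int) -> float:
    return float(i + j)


loss = tied_multi_loss(toy_pair_loss, n_enc=2, m_dec=3)
```

Because every pair contributes to the same objective, a single set of tied parameters is trained to work at any of the NxM depth combinations.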

pdf bib
Intermediate Self-supervised Learning for Machine Translation Quality Estimation
Raphael Rubino | Eiichiro Sumita
Proceedings of the 28th International Conference on Computational Linguistics

Pre-training sentence encoders is effective in many natural language processing tasks including machine translation (MT) quality estimation (QE), due partly to the scarcity of annotated QE data required for supervised learning. In this paper, we investigate the use of an intermediate self-supervised learning task for sentence encoders, aiming at improving QE performance at the sentence and word levels. Our approach is motivated by a problem inherent to QE: mistakes in translation caused by wrongly inserted and deleted tokens. We modify the translation language model (TLM) training objective of the cross-lingual language model (XLM) to orientate the pre-trained model towards the target task. The proposed method does not rely on annotated data and is complementary to QE methods involving pre-trained sentence encoders and domain adaptation. Experiments on English-to-German and English-to-Russian translation directions show that intermediate learning improves over domain-adapted models. Additionally, our method reaches results on par with state-of-the-art QE models without requiring the combination of several approaches and outperforms similar methods based on pre-trained sentence encoders.

pdf bib
Combination of Neural Machine Translation Systems at WMT20
Benjamin Marie | Raphael Rubino | Atsushi Fujita
Proceedings of the Fifth Conference on Machine Translation

This paper presents neural machine translation systems and their combination built for the WMT20 English-Polish and Japanese->English translation tasks. We show that using a Transformer Big architecture, additional training data synthesized from monolingual data, and combining many NMT systems through n-best list reranking improve translation quality. However, while we observed such improvements on the validation data, we did not observe similar improvements on the test data. Our analysis reveals that the presence of translationese texts in the validation data led us to take decisions in building NMT systems that were not optimal to obtain the best results on the test data.

pdf bib
NICT Kyoto Submission for the WMT’20 Quality Estimation Task: Intermediate Training for Domain and Task Adaptation
Raphael Rubino
Proceedings of the Fifth Conference on Machine Translation

This paper describes the NICT Kyoto submission for the WMT’20 Quality Estimation (QE) shared task. We participated in Task 2: Word and Sentence-level Post-editing Effort, which involved Wikipedia data and two translation directions, namely English-to-German and English-to-Chinese. Our approach is based on multi-task fine-tuned cross-lingual language models (XLM), initially pre-trained and further domain-adapted through intermediate training using the translation language model (TLM) approach complemented with a novel self-supervised learning task whose aim is to model errors inherent to machine translation outputs. Results obtained on both word and sentence-level QE show that the proposed intermediate training method is complementary to language model domain adaptation and outperforms the fine-tuning only approach.

2018

pdf bib
Findings of the WMT 2018 Shared Task on Automatic Post-Editing
Rajen Chatterjee | Matteo Negri | Raphael Rubino | Marco Turchi
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present the results from the fourth round of the WMT shared task on MT Automatic Post-Editing. The task consists in automatically correcting the output of a “black-box” machine translation system by learning from human corrections. Keeping the same general evaluation setting of the three previous rounds, this year we focused on one language pair (English-German) and on domain-specific data (Information Technology), with MT outputs produced by two different paradigms: phrase-based (PBSMT) and neural (NMT). Five teams submitted respectively 11 runs for the PBSMT subtask and 10 runs for the NMT subtask. In the former subtask, characterized by original translations of lower quality, top results achieved impressive improvements, up to -6.24 TER and +9.53 BLEU points over the baseline “do-nothing” system. The NMT subtask proved to be more challenging due to the higher quality of the original translations and the availability of less training data. In this case, top results show smaller improvements up to -0.38 TER and +0.8 BLEU points.

pdf bib
DFKI-MLT System Description for the WMT18 Automatic Post-editing Task
Daria Pylypenko | Raphael Rubino
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper presents the Automatic Post-editing (APE) systems submitted by the DFKI-MLT group to the WMT’18 APE shared task. Three monolingual neural sequence-to-sequence APE systems were trained using target-language data only: one using an attentional recurrent neural network architecture and two using the attention-only (transformer) architecture. The training data was composed of machine translated (MT) output used as source to the APE model, aligned with its manually post-edited version or reference translation as target. We made use of the provided training sets only and trained APE models applicable to phrase-based and neural MT outputs. Results show that the attention-only model outperforms the recurrent one, with a significant improvement over the baseline when post-editing phrase-based MT output but a degradation when applied to neural MT output.

2017

pdf bib
Findings of the 2017 Conference on Machine Translation (WMT17)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Shujian Huang | Matthias Huck | Philipp Koehn | Qun Liu | Varvara Logacheva | Christof Monz | Matteo Negri | Matt Post | Raphael Rubino | Lucia Specia | Marco Turchi
Proceedings of the Second Conference on Machine Translation

pdf bib
Common Round: Application of Language Technologies to Large-Scale Web Debates
Hans Uszkoreit | Aleksandra Gabryszak | Leonhard Hennig | Jörg Steffen | Renlong Ai | Stephan Busemann | Jon Dehdari | Josef van Genabith | Georg Heigold | Nils Rethmeier | Raphael Rubino | Sven Schmeier | Philippe Thomas | He Wang | Feiyu Xu
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking. However, it is challenging to organize, structure, and navigate a vast number of diverse argumentations and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies for the extraction of semantic essence from textual input and the aggregation of the formulated opinions and arguments. The platform also provides cross-lingual access to debates using machine translation.

pdf bib
Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification
Wei Shi | Frances Yung | Raphael Rubino | Vera Demberg
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Implicit discourse relation recognition is an extremely challenging task due to the lack of indicative connectives. Various neural network architectures have been proposed for this task recently, but most of them suffer from the shortage of labeled data. In this paper, we address this problem by procuring additional training data from parallel corpora: when humans translate a text, they sometimes add connectives (a process known as explicitation). We automatically back-translate such an added connective into an English connective and use it to infer a label with high confidence. We show that a training set several times larger than the original training set can be generated this way. With the extra labeled instances, we show that even a simple bidirectional Long Short-Term Memory Network can outperform the current state-of-the-art.

2016

pdf bib
Findings of the 2016 Conference on Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Martin Popel | Matt Post | Raphael Rubino | Carolina Scarton | Lucia Specia | Marco Turchi | Karin Verspoor | Marcos Zampieri
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Re-assessing the Impact of SMT Techniques with Human Evaluation: a Case Study on English—Croatian
Antonio Toral | Raphael Rubino | Gema Ramírez-Sánchez
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib
Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification
Raphael Rubino | Ekaterina Lapshinova-Koltunski | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Modeling Diachronic Change in Scientific Writing with Information Density
Raphael Rubino | Stefania Degaetano-Ortlieb | Elke Teich | Josef van Genabith
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Previous linguistic research on scientific writing has shown that language use in the scientific domain varies considerably in register and style over time. In this paper we investigate the introduction of information-theory-inspired features to study long-term diachronic change on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing between sentences from 19th and 20th century scientific abstracts using supervised classification models. To the best of our knowledge, the introduction of information-theoretic features to this task is novel. We show that these features outperform more traditional features, such as token or character n-grams, while leading to more compact models. We present a detailed analysis of feature informativeness in order to gain a better understanding of diachronic change on different linguistic levels.

2015

pdf bib
Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling
Raphael Rubino | Tommi Pirinen | Miquel Esplà-Gomis | Nikola Ljubešić | Sergio Ortiz-Rojas | Vassilis Papavassiliou | Prokopis Prokopidis | Antonio Toral
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A. Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel L. Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

pdf bib
Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

pdf bib
Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain
Antonio Toral | Raphael Rubino | Miquel Esplà-Gomis | Tommi Pirinen | Andy Way | Gema Ramírez-Sánchez
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

pdf bib
Quality Estimation for Synthetic Parallel Data Generation
Raphael Rubino | Antonio Toral | Nikola Ljubešić | Gema Ramírez-Sánchez
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a novel approach for parallel data generation using machine translation and quality estimation. Our study focuses on pivot-based machine translation from English to Croatian through Slovene. We generate an English-Croatian version of the Europarl parallel corpus based on the English-Slovene Europarl corpus and the Apertium rule-based translation system for Slovene-Croatian. These experiments are to be considered as a first step towards the generation of reliable synthetic parallel data for under-resourced languages. We first collect small amounts of aligned parallel data for the Slovene-Croatian language pair in order to build a quality estimation system for sentence-level Translation Edit Rate (TER) estimation. We then infer TER scores on automatically translated Slovene to Croatian sentences and use the best translations to build an English-Croatian statistical MT system. We show significant improvement in terms of automatic metrics obtained on two test sets using our approach compared to a random selection of synthetic parallel data.
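
The selection step in this abstract (keeping the synthetic pairs whose predicted TER is best, i.e. lowest) might be sketched as follows. The function name, the `keep` parameter, and the toy scores are hypothetical; `predict_ter` stands in for a trained quality estimation model, which this sketch does not implement.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (English sentence, machine-translated Croatian sentence)


def select_lowest_ter(pairs: List[Pair],
                      predict_ter: Callable[[Pair], float],
                      keep: int) -> List[Pair]:
    """Return the `keep` pairs with the lowest predicted TER (lower is better)."""
    return sorted(pairs, key=predict_ter)[:keep]


# Toy candidates with made-up predicted TER scores, for illustration only.
candidates = [("en a", "hr a"), ("en b", "hr b"), ("en c", "hr c")]
toy_scores = {"hr a": 0.42, "hr b": 0.10, "hr c": 0.27}

selected = select_lowest_ter(candidates, lambda pair: toy_scores[pair[1]], keep=2)
```

The random-selection baseline mentioned in the abstract would correspond to sampling `keep` pairs uniformly instead of sorting by predicted TER.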

pdf bib
Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules
Raphael Rubino | Antonio Toral | Victor M. Sánchez-Cartagena | Jorge Ferrández-Tordera | Sergio Ortiz-Rojas | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Andy Way
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Quality Estimation of English-French Machine Translation: A Detailed Study of the Role of Syntax
Rasoul Kaljahi | Jennifer Foster | Johann Roturier | Raphael Rubino
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Quality Estimation-guided Data Selection for Domain Adaptation of SMT
Pratyush Banerjee | Raphael Rubino | Johann Roturier | Josef van Genabith
Proceedings of Machine Translation Summit XIV: Papers

pdf bib
Topic Models for Translation Quality Estimation for Gisting Purposes
Raphael Rubino | Jose Guilherme Camargo de Souza | Jennifer Foster | Lucia Specia
Proceedings of Machine Translation Summit XIV: Posters

pdf bib
Topic Models for Translation Quality Estimation for Gisting Purposes
Raphael Rubino | Jose Guilherme Camargo de Souza | Jennifer Foster | Lucia Specia
Proceedings of Machine Translation Summit XIV: Posters

pdf bib
Parser Accuracy in Quality Estimation of Machine Translation: A Tree Kernel Approach
Rasoul Samad Zadeh Kaljahi | Jennifer Foster | Raphael Rubino | Johann Roturier | Fred Hollowood
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Estimating the Quality of Translated User-Generated Content
Raphael Rubino | Jennifer Foster | Rasoul Samad Zadeh Kaljahi | Johann Roturier | Fred Hollowood
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
The CNGL-DCU-Prompsit Translation Systems for WMT13
Raphael Rubino | Antonio Toral | Santiago Cortés Vaíllo | Jun Xie | Xiaofeng Wu | Stephen Doherty | Qun Liu
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
DCU-Symantec at the WMT 2013 Quality Estimation Shared Task
Raphael Rubino | Joachim Wagner | Jennifer Foster | Johann Roturier | Rasoul Samad Zadeh Kaljahi | Fred Hollowood
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
An Approach Using Style Classification Features for Quality Estimation
Erwan Moreau | Raphael Rubino
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

pdf bib
An Evaluation of Statistical Post-Editing Systems Applied to RBMT and SMT Systems
Hanna Béchara | Raphaël Rubino | Yifan He | Yanjun Ma | Josef van Genabith
Proceedings of COLING 2012

pdf bib
A Detailed Analysis of Phrase-based and Syntax-based MT: The Search for Systematic Differences
Rasoul Samad Zadeh Kaljahi | Raphael Rubino | Johann Roturier | Jennifer Foster
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate output which is more grammatical than the output produced by the phrase-based models. Although the systems generate different output and can potentially be fruitfully combined, the lack of systematic difference between these models makes the combination task more challenging.

pdf bib
DCU-Symantec Submission for the WMT 2012 Quality Estimation Task
Raphael Rubino | Jennifer Foster | Joachim Wagner | Johann Roturier | Rasul Samad Zadeh Kaljahi | Fred Hollowood
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Sentence-Level Quality Estimation for MT System Combination
Tsuyoshi Okita | Raphaël Rubino | Josef van Genabith
Proceedings of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT

pdf bib
Statistical Post-Editing of Machine Translation for Domain Adaptation
Raphaël Rubino | Stéphane Huet | Fabrice Lefèvre | Georges Linarès
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

pdf bib
Post-édition statistique pour l’adaptation aux domaines de spécialité en traduction automatique (Statistical Post-Editing of Machine Translation for Domain Adaptation) [in French]
Raphaël Rubino | Stéphane Huet | Fabrice Lefèvre | Georges Linarès
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

2011

pdf bib
The LIGA (LIG/LIA) Machine Translation System for WMT 2011
Marion Potet | Raphaël Rubino | Benjamin Lecouteux | Stéphane Huet | Laurent Besacier | Hervé Blanchon | Fabrice Lefèvre
Proceedings of the Sixth Workshop on Statistical Machine Translation

2009

pdf bib
Exploring Context Variation and Lexicon Coverage in Projection-based Approach for Term Translation
Raphaël Rubino
Proceedings of the Student Research Workshop
