Raphael Rubino

Also published as: Raphaël Rubino

2025

pdf bib

pdf bib abs

Factors Affecting Translation Quality in In-context Learning for Multilingual Medical Domain
Jonathan Mutal | Raphael Rubino | Pierrette Bouillon
Proceedings of the Tenth Conference on Machine Translation

Multilingual machine translation in the medical domain presents critical challenges due to limited parallel data, domain-specific terminology, and the high stakes associated with translation accuracy. In this paper, we explore the potential of in-context learning (ICL) with general-purpose large language models (LLMs) as an alternative to fine-tuning. Focusing on the medical domain and low-resource languages, we evaluate an instruction-tuned LLM on a translation task across 16 languages. We address four research questions centered on prompt design, examining the impact of the number of examples, the domain and register of examples, and the example selection strategy. Our results show that prompting with one to three examples from the same register and domain as the test input leads to the largest improvements in translation quality, as measured by automatic metrics, while translation quality gains plateau with an increased number of examples. Furthermore, we find that example selection methods - lexical and embedding based - do not yield significant benefits over random selection if the register of selected examples does not match that of the test input.

pdf bib abs

Leveraging Large Language Models for Joint Linguistic and Technical Accessibility Improvement: A Case Study on University Webpages
Pierrette Bouillon | Johanna Gerlach | Raphael Rubino
Proceedings of the 1st Workshop on Artificial Intelligence and Easy and Plain Language in Institutional Contexts (AI & EL/PL)

The aim of the study presented in this paper is to investigate whether Large Language Models can be leveraged to translate French content from existing websites into their B1-level simplified versions and to integrate them into an accessible HTML structure. We design a CMS agnostic approach to webpage accessibility improvement based on prompt engineering and apply it to Geneva University webpages. We conduct several automatic and manual evaluations to measure the accessibility improvement reached by several LLMs with various prompts in a zero-shot setting. Results show that LLMs are not all suitable for the task, while a large disparity is observed among results reached by different prompts. Manual evaluation carried out by a dyslexic crowd shows that some LLMs could produce more accessible websites and improve access to information.

2024

pdf bib abs

Automatic Normalisation of Middle French and Its Impact on Productivity
Raphael Rubino | Sandra Coram-Mekkey | Johanna Gerlach | Jonathan Mutal | Pierrette Bouillon
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024

This paper presents a study on automatic normalisation of 16th century documents written in Middle French. These documents present a large variety of wordforms which require spelling normalisation to facilitate downstream linguistic and historical studies. We frame the normalisation process as a machine translation task starting with a strong baseline leveraging a pre-trained encoder–decoder model. We propose to improve this baseline by combining synthetic data generation methods and producing artificial training data, thus tackling the lack of parallel corpora relevant to our task. The evaluation of our approach is twofold, in addition to automatic metrics relying on gold references, we evaluate our models through post-editing of their outputs. This evaluation method directly measures the productivity gain brought by our models to experts conducting the normalisation task manually. Results show a 20+ token per minute increase in productivity when using automatic normalisation compared to normalising text from scratch. The manually post-edited dataset resulting from our study is the first parallel corpus of normalised 16th century Middle French to be publicly released, along with the synthetic data and the automatic normalisation models used and trained in the presented work.

pdf bib abs

The RCnum project is funded by the Swiss National Science Foundation and aims at producing a multilingual and semantically rich online edition of the Registers of Geneva Council from 1545 to 1550. Combining multilingual NLP, history and paleography, this collaborative project will clear hurdles inherent to texts manually written in 16th century Middle French while allowing for easy access and interactive consultation of these archives.

pdf bib abs

Normalizing without Modernizing: Keeping Historical Wordforms of Middle French while Reducing Spelling Variants
Raphael Rubino | Johanna Gerlach | Jonathan Mutal | Pierrette Bouillon
Findings of the Association for Computational Linguistics: NAACL 2024

Conservation of historical documents benefits from computational methods by alleviating the manual labor related to digitization and modernization of textual content. Languages usually evolve over time and keeping historical wordforms is crucial for diachronic studies and digital humanities. However, spelling conventions did not necessarily exist when texts were originally written and orthographic variations are commonly observed depending on scribes and time periods. In this study, we propose to automatically normalize orthographic wordforms found in historical archives written in Middle French during the 16th century without fully modernizing textual content. We leverage pre-trained models in a low resource setting based on a manually curated parallel corpus and produce additional resources with artificial data generation approaches. Results show that causal language models and knowledge distillation improve over a strong baseline, thus validating the proposed methods.

pdf bib abs

Improving Sign Language Production in the Healthcare Domain Using UMLS and Multi-task Learning
Jonathan Mutal | Raphael Rubino | Pierrette Bouillon | Bastien David | Johanna Gerlach | Irene Strasly
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

This paper presents a study on Swiss-French sign language production in the medical domain. In emergency care settings, a lack of clear communication can interfere with accurate delivery of health related services. For patients communicating with sign language, equal access to healthcare remains an issue. While previous work has explored producing sign language gloss from a source text, we propose to extend this approach to produce a multichannel sign language output given a written French input. Furthermore, we extend our approach with a multi-task framework allowing us to include the Unified Medical Language System (UMLS) in our model. Results show that the introduction of UMLS in the training data improves model accuracy by 13.64 points.

2022

pdf bib abs

Translation of structured content is an important application of machine translation, but the scarcity of evaluation data sets, especially for Asian languages, limits progress. In this paper we present a novel multilingual multiway evaluation data set for the translation of structured documents of the Asian languages Japanese, Korean and Chinese. We describe the data set, its creation process and important characteristics, followed by establishing and evaluating baselines using the direct translation as well as detag-project approaches. Our data set is well suited for multilingual evaluation, and it contains richer annotation tag sets than existing data sets. Our results show that massively multilingual translation models like M2M-100 and mBART-50 perform surprisingly well despite not being explicitly trained to handle structured content. The data set described in this paper and used in our experiments is released publicly.

2021

pdf bib abs

NICT Kyoto Submission for the WMT’21 Quality Estimation Task: Multimetric Multilingual Pretraining for Critical Error Detection
Raphael Rubino | Atsushi Fujita | Benjamin Marie
Proceedings of the Sixth Conference on Machine Translation

This paper presents the NICT Kyoto submission for the WMT’21 Quality Estimation (QE) Critical Error Detection shared task (Task 3). Our approach relies mainly on QE model pretraining for which we used 11 language pairs, three sentence-level and three word-level translation quality metrics. Starting from an XLM-R checkpoint, we perform continued training by modifying the learning objective, switching from masked language modeling to QE oriented signals, before finetuning and ensembling the models. Results obtained on the test set in terms of correlation coefficient and F-score show that automatic metrics and synthetic data perform well for pretraining, with our submissions ranked first for two out of four language pairs. A deeper look at the impact of each metric on the downstream task indicates higher performance for token oriented metrics, while an ablation study emphasizes the usefulness of conducting both self-supervised and QE pretraining.

pdf bib abs

Error Identification for Machine Translation with Metric Embedding and Attention
Raphael Rubino | Atsushi Fujita | Benjamin Marie
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, the lack of explanations along with decisions made by end-to-end neural models makes the results difficult to interpret. Furthermore, word-level annotated datasets are rare due to the prohibitive effort required to perform this task, while they could provide interpretable signals in addition to sentence-level QE outputs. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task and our submissions reach the first position in all language pairs. The extraction of metric-to-input attention weights show that different metrics focus on different parts of the source and target text, providing strong rationales in the decision-making process of the QE model.

pdf bib abs

Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers
Benjamin Marie | Atsushi Fujita | Raphael Rubino
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper presents the first large-scale meta-evaluation of machine translation (MT). We annotated MT evaluations conducted in 769 research papers published from 2010 to 2020. Our study shows that practices for automatic MT evaluation have dramatically changed during the past decade and follow concerning trends. An increasing number of MT evaluations exclusively rely on differences between BLEU scores to draw conclusions, without performing any kind of statistical significance testing nor human evaluation, while at least 108 metrics claiming to be better than BLEU have been proposed. MT evaluations in recent papers tend to copy and compare automatic metric scores from previous work to claim the superiority of a method or an algorithm without confirming neither exactly the same training, validating, and testing data have been used nor the metric scores are comparable. Furthermore, tools for reporting standardized metric scores are still far from being widely adopted by the MT community. After showing how the accumulation of these pitfalls leads to dubious evaluation, we propose a guideline to encourage better automatic MT evaluation along with a simple meta-evaluation scoring method to assess its credibility.

2020

pdf bib abs

Intermediate Self-supervised Learning for Machine Translation Quality Estimation
Raphael Rubino | Eiichiro Sumita
Proceedings of the 28th International Conference on Computational Linguistics

Pre-training sentence encoders is effective in many natural language processing tasks including machine translation (MT) quality estimation (QE), due partly to the scarcity of annotated QE data required for supervised learning. In this paper, we investigate the use of an intermediate self-supervised learning task for sentence encoder aiming at improving QE performances at the sentence and word levels. Our approach is motivated by a problem inherent to QE: mistakes in translation caused by wrongly inserted and deleted tokens. We modify the translation language model (TLM) training objective of the cross-lingual language model (XLM) to orientate the pre-trained model towards the target task. The proposed method does not rely on annotated data and is complementary to QE methods involving pre-trained sentence encoders and domain adaptation. Experiments on English-to-German and English-to-Russian translation directions show that intermediate learning improves over domain adaptated models. Additionally, our method reaches results in par with state-of-the-art QE models without requiring the combination of several approaches and outperforms similar methods based on pre-trained sentence encoders.

pdf bib abs

Balancing Cost and Benefit with Tied-Multi Transformers
Raj Dabre | Raphael Rubino | Atsushi Fujita
Proceedings of the Fourth Workshop on Neural Generation and Translation

We propose a novel procedure for training multiple Transformers with tied parameters which compresses multiple models into one enabling the dynamic choice of the number of encoder and decoder layers during decoding. In training an encoder-decoder model, typically, the output of the last layer of the N-layer encoder is fed to the M-layer decoder, and the output of the last decoder layer is used to compute loss. Instead, our method computes a single loss consisting of NxM losses, where each loss is computed from the output of one of the M decoder layers connected to one of the N encoder layers. Such a model subsumes NxM models with different number of encoder and decoder layers, and can be used for decoding with fewer than the maximum number of encoder and decoder layers. Given our flexible tied model, we also address to a-priori selection of the number of encoder and decoder layers for faster decoding, and explore recurrent stacking of layers and knowledge distillation for model compression. We present a cost-benefit analysis of applying the proposed approaches for neural machine translation and show that they reduce decoding costs while preserving translation quality.

pdf bib abs

NICT Kyoto Submission for the WMT’20 Quality Estimation Task: Intermediate Training for Domain and Task Adaptation
Raphael Rubino
Proceedings of the Fifth Conference on Machine Translation

This paper describes the NICT Kyoto submission for the WMT’20 Quality Estimation (QE) shared task. We participated in Task 2: Word and Sentence-level Post-editing Effort, which involved Wikipedia data and two translation directions, namely English-to-German and English-to-Chinese. Our approach is based on multi-task fine-tuned cross-lingual language models (XLM), initially pre-trained and further domain-adapted through intermediate training using the translation language model (TLM) approach complemented with a novel self-supervised learning task which aim is to model errors inherent to machine translation outputs. Results obtained on both word and sentence-level QE show that the proposed intermediate training method is complementary to language model domain adaptation and outperforms the fine-tuning only approach.

pdf bib abs

Tagged Back-translation Revisited: Why Does It Really Work?
Benjamin Marie | Raphael Rubino | Atsushi Fujita
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we show that neural machine translation (NMT) systems trained on large back-translated data overfit some of the characteristics of machine-translated texts. Such NMT systems better translate human-produced translations, i.e., translationese, but may largely worsen the translation quality of original texts. Our analysis reveals that adding a simple tag to back-translations prevents this quality degradation and improves on average the overall translation quality by helping the NMT system to distinguish back-translated data from original parallel data during training. We also show that, in contrast to high-resource configurations, NMT systems trained in low-resource settings are much less vulnerable to overfit back-translations. We conclude that the back-translations in the training data should always be tagged especially when the origin of the text to be translated is unknown.

pdf bib abs

Combination of Neural Machine Translation Systems at WMT20
Benjamin Marie | Raphael Rubino | Atsushi Fujita
Proceedings of the Fifth Conference on Machine Translation

This paper presents neural machine translation systems and their combination built for the WMT20 English-Polish and Japanese->English translation tasks. We show that using a Transformer Big architecture, additional training data synthesized from monolingual data, and combining many NMT systems through n-best list reranking improve translation quality. However, while we observed such improvements on the validation data, we did not observed similar improvements on the test data. Our analysis reveals that the presence of translationese texts in the validation data led us to take decisions in building NMT systems that were not optimal to obtain the best results on the test data.

2018

pdf bib abs

DFKI-MLT System Description for the WMT18 Automatic Post-editing Task
Daria Pylypenko | Raphael Rubino
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper presents the Automatic Post-editing (APE) systems submitted by the DFKI-MLT group to the WMT’18 APE shared task. Three monolingual neural sequence-to-sequence APE systems were trained using target-language data only: one using an attentional recurrent neural network architecture and two using the attention-only (transformer) architecture. The training data was composed of machine translated (MT) output used as source to the APE model aligned with their manually post-edited version or reference translation as target. We made use of the provided training sets only and trained APE models applicable to phrase-based and neural MT outputs. Results show better performances reached by the attention-only model over the recurrent one, significant improvement over the baseline when post-editing phrase-based MT output but degradation when applied to neural MT output.

pdf bib abs

Findings of the WMT 2018 Shared Task on Automatic Post-Editing
Rajen Chatterjee | Matteo Negri | Raphael Rubino | Marco Turchi
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present the results from the fourth round of the WMT shared task on MT Automatic Post-Editing. The task consists in automatically correcting the output of a “black-box” machine translation system by learning from human corrections. Keeping the same general evaluation setting of the three previous rounds, this year we focused on one language pair (English-German) and on domain-specific data (Information Technology), with MT outputs produced by two different paradigms: phrase-based (PBSMT) and neural (NMT). Five teams submitted respectively 11 runs for the PBSMT subtask and 10 runs for the NMT subtask. In the former subtask, characterized by original translations of lower quality, top results achieved impressive improvements, up to -6.24 TER and +9.53 BLEU points over the baseline “do-nothing” system. The NMT subtask proved to be more challenging due to the higher quality of the original translations and the availability of less training data. In this case, top results show smaller improvements up to -0.38 TER and +0.8 BLEU points.

2017

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking. However, it is challenging to organize, structure, and navigate a vast number of diverse argumentations and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies for the extraction of semantic essence from textual input, aggregation of the formulated opinions and arguments. The platform also provides a cross-lingual access to debates using machine translation.

pdf bib abs

Using Explicit Discourse Connectives in Translation for Implicit Discourse Relation Classification
Wei Shi | Frances Yung | Raphael Rubino | Vera Demberg
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Implicit discourse relation recognition is an extremely challenging task due to the lack of indicative connectives. Various neural network architectures have been proposed for this task recently, but most of them suffer from the shortage of labeled data. In this paper, we address this problem by procuring additional training data from parallel corpora: When humans translate a text, they sometimes add connectives (a process known as explicitation). We automatically back-translate it into an English connective and use it to infer a label with high confidence. We show that a training set several times larger than the original training set can be generated this way. With the extra labeled instances, we show that even a simple bidirectional Long Short-Term Memory Network can outperform the current state-of-the-art.

2016

pdf bib

Re-assessing the Impact of SMT Techniques with Human Evaluation: a Case Study on English—Croatian
Antonio Toral | Raphael Rubino | Gema Ramírez-Sánchez
Proceedings of the 19th Annual Conference of the European Association for Machine Translation

pdf bib

Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification
Raphael Rubino | Ekaterina Lapshinova-Koltunski | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib abs

Modeling Diachronic Change in Scientific Writing with Information Density
Raphael Rubino | Stefania Degaetano-Ortlieb | Elke Teich | Josef van Genabith
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Previous linguistic research on scientific writing has shown that language use in the scientific domain varies considerably in register and style over time. In this paper we investigate the introduction of information theory inspired features to study long term diachronic change on three levels: lexis, part-of-speech and syntax. Our approach is based on distinguishing between sentences from 19th and 20th century scientific abstracts using supervised classification models. To the best of our knowledge, the introduction of information theoretic features to this task is novel. We show that these features outperform more traditional features, such as token or character n-grams, while leading to more compact models. We present a detailed analysis of feature informativeness in order to gain a better understanding of diachronic change on different linguistic levels.

2015

pdf bib

pdf bib

pdf bib

2014

pdf bib

pdf bib

Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain
Antonio Toral | Raphael Rubino | Miquel Esplà-Gomis | Tommi Pirinen | Andy Way | Gema Ramírez-Sánchez
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

pdf bib

Quality Estimation of English-French Machine Translation: A Detailed Study of the Role of Syntax
Rasoul Kaljahi | Jennifer Foster | Johann Roturier | Raphael Rubino
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib abs

Quality Estimation for Synthetic Parallel Data Generation
Raphael Rubino | Antonio Toral | Nikola Ljubešić | Gema Ramírez-Sánchez
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a novel approach for parallel data generation using machine translation and quality estimation. Our study focuses on pivot-based machine translation from English to Croatian through Slovene. We generate an English―Croatian version of the Europarl parallel corpus based on the English―Slovene Europarl corpus and the Apertium rule-based translation system for Slovene―Croatian. These experiments are to be considered as a first step towards the generation of reliable synthetic parallel data for under-resourced languages. We first collect small amounts of aligned parallel data for the Slovene―Croatian language pair in order to build a quality estimation system for sentence-level Translation Edit Rate (TER) estimation. We then infer TER scores on automatically translated Slovene to Croatian sentences and use the best translations to build an English―Croatian statistical MT system. We show significant improvement in terms of automatic metrics obtained on two test sets using our approach compared to a random selection of synthetic parallel data.

2013

pdf bib

pdf bib

Quality Estimation-guided Data Selection for Domain Adaptation of SMT
Pratyush Banerjee | Raphael Rubino | Johann Roturier | Josef van Genabith
Proceedings of Machine Translation Summit XIV: Papers

pdf bib

Topic Models for Translation Quality Estimation for Gisting Purposes
Raphael Rubino | Jose Guilherme Camargo de Souza | Jennifer Foster | Lucia Specia
Proceedings of Machine Translation Summit XIV: Posters

pdf bib

Estimating the Quality of Translated User-Generated Content
Raphael Rubino | Jennifer Foster | Rasoul Samad Zadeh Kaljahi | Johann Roturier | Fred Hollowood
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib

Key Problems in Conversion from Simplified to Traditional Chinese Characters Topic Models for Translation Quality Estimation for Gisting Purposes
Raphael Rubino | Jose Guilherme Camargo de Souza | Jennifer Foster | Lucia Specia
Proceedings of Machine Translation Summit XIV: Posters

pdf bib

pdf bib

Parser Accuracy in Quality Estimation of Machine Translation: A Tree Kernel Approach
Rasoul Samad Zadeh Kaljahi | Jennifer Foster | Raphael Rubino | Johann Roturier | Fred Hollowood
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib

An Approach Using Style Classification Features for Quality Estimation
Erwan Moreau | Raphael Rubino
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

pdf bib

Post-édition statistique pour l’adaptation aux domaines de spécialité en traduction automatique (Statistical Post-Editing of Machine Translation for Domain Adaptation) [in French]
Raphaël Rubino | Stéphane Huet | Fabrice Lefèvre | Georges Linarès
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib

Statistical Post-Editing of Machine Translation for Domain Adaptation
Raphaël Rubino | Stéphane Huet | Fabrice Lefèvre | Georges Linarès
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

pdf bib

An Evaluation of Statistical Post-Editing Systems Applied to RBMT and SMT Systems
Hanna Béchara | Raphaël Rubino | Yifan He | Yanjun Ma | Josef van Genabith
Proceedings of COLING 2012

pdf bib

Sentence-Level Quality Estimation for MT System Combination
Tsuyoshi Okita | Raphaël Rubino | Josef van Genabith
Proceedings of the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT

pdf bib abs

A Detailed Analysis of Phrase-based and Syntax-based MT: The Search for Systematic Differences
Rasoul Samad Zadeh Kaljahi | Raphael Rubino | Johann Roturier | Jennifer Foster
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate output which is more grammatical than the output produced by the phrase-based models. Although the systems generate different output and can potentially be fruitfully combined, the lack of systematic difference between these models makes the combination task more challenging.

pdf bib

2011

pdf bib

2009

pdf bib

Exploring Context Variation and Lexicon Coverage in Projection-based Approach for Term Translation
Raphaël Rubino
Proceedings of the Student Research Workshop