Ana C Farinha


2021

IST-Unbabel 2021 Submission for the Quality Estimation Shared Task
Chrysoula Zerva | Daan van Stigt | Ricardo Rei | Ana C Farinha | Pedro Ramos | José G. C. de Souza | Taisiya Glushkova | Miguel Vera | Fabio Kepler | André F. T. Martins
Proceedings of the Sixth Conference on Machine Translation

We present the joint contribution of IST and Unbabel to the WMT 2021 Shared Task on Quality Estimation. Our team participated in two tasks: Direct Assessment and Post-Editing Effort, encompassing a total of 35 submissions. For all submissions, our efforts focused on training multilingual models on top of the OpenKiwi predictor-estimator architecture, using pre-trained multilingual encoders combined with adapters. We further experiment with uncertainty-related objectives and features, as well as training on out-of-domain direct assessment data.

Are References Really Needed? Unbabel-IST 2021 Submission for the Metrics Shared Task
Ricardo Rei | Ana C Farinha | Chrysoula Zerva | Daan van Stigt | Craig Stewart | Pedro Ramos | Taisiya Glushkova | André F. T. Martins | Alon Lavie
Proceedings of the Sixth Conference on Machine Translation

In this paper, we present the joint contribution of Unbabel and IST to the WMT 2021 Metrics Shared Task. With this year’s focus on Multidimensional Quality Metric (MQM) as the ground-truth human assessment, our aim was to steer COMET towards higher correlations with MQM. We do so by first pre-training on Direct Assessments and then fine-tuning on z-normalized MQM scores. In our experiments we also show that reference-free COMET models are becoming competitive with reference-based models, even outperforming the best COMET model from 2020 on this year’s development data. Additionally, we present COMETinho, a lightweight COMET model that is 19x faster on CPU than the original model, while also achieving state-of-the-art correlations with MQM. Finally, in the “QE as a metric” track, we also participated with a QE model trained using the OpenKiwi framework leveraging MQM scores and word-level annotations.
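
As an illustration of the z-normalization step mentioned in the abstract above, the following is a minimal sketch, not the authors' code. It assumes all scores are normalized as a single group; the abstract does not say whether normalization is done per annotator or over the whole dataset. The function name `z_normalize` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift scores to zero mean and scale them to unit standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation
    if sigma == 0:
        # All scores are identical, so every z-score is defined as 0 here.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw scores, invented purely for illustration.
raw_scores = [0.0, -1.0, -5.0, -2.0]
normalized = z_normalize(raw_scores)
```

After normalization the scores keep their ordering but have mean 0 and standard deviation 1, which puts scores from different sources on a comparable scale.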

MT-Telescope: An interactive platform for contrastive evaluation of MT systems
Ricardo Rei | Ana C Farinha | Craig Stewart | Luisa Coheur | Alon Lavie
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

We present MT-Telescope, a visualization platform designed to facilitate comparative analysis of the output quality of two Machine Translation (MT) systems. While automated MT evaluation metrics are commonly used to evaluate MT systems at the corpus level, our platform supports fine-grained segment-level analysis and interactive visualisations that expose the fundamental differences in the performance of the compared systems. MT-Telescope also supports dynamic corpus filtering to enable focused analysis on specific phenomena such as translation of named entities, handling of terminology, and the impact of input segment length on translation quality. Furthermore, the platform provides a bootstrapped t-test for statistical significance as a means of evaluating the rigor of the resulting system ranking. MT-Telescope is open source, written in Python, and built around a user-friendly and dynamic web interface. Complementing other existing tools, our platform is designed to facilitate and promote the broader adoption of more rigorous analysis practices in the evaluation of MT quality.
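
To give a rough idea of how resampling can support a comparison of two systems, here is a minimal sketch of paired bootstrap resampling over segment-level scores. This is a related, simpler technique, not the bootstrapped t-test the platform implements, and it does not use the MT-Telescope API. The function name `paired_bootstrap_win_rate`, its parameters, and the sample scores are all hypothetical.

```python
import random


def paired_bootstrap_win_rate(scores_a, scores_b, num_samples=1000, seed=0):
    """Return the fraction of resamples where system A's mean beats system B's."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same segments")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(scores_a)
    wins = 0
    for _ in range(num_samples):
        # Draw n segment indices with replacement and reuse them for both
        # systems, so each resample compares the systems on the same segments.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / num_samples


# Hypothetical segment-level metric scores, invented purely for illustration.
system_a = [0.5, 0.6, 0.7]
system_b = [0.1, 0.2, 0.3]
win_rate = paired_bootstrap_win_rate(system_a, system_b)
```

A win rate close to 1 (or close to 0) suggests the ranking of the two systems is stable under resampling, while a value near 0.5 suggests the observed difference may not be reliable.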

2020

COMET: A Neural Framework for MT Evaluation
Ricardo Rei | Craig Stewart | Ana C Farinha | Alon Lavie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metric. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.

Unbabel’s Participation in the WMT20 Metrics Shared Task
Ricardo Rei | Craig Stewart | Ana C Farinha | Alon Lavie
Proceedings of the Fifth Conference on Machine Translation

We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate in the segment-level, document-level and system-level tracks on all language pairs, as well as the “QE as a Metric” track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework: we train several estimator models to regress on different human-generated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.