Return to the Source: Assessing Machine Translation Suitability

Francesco Fernicola; Silvia Bernardini; Federico Garcea; Adriano Ferraresi; Alberto Barrón-Cedeño

Abstract

We approach the task of assessing the suitability of a source text for translation by transferring the knowledge from established MT evaluation metrics to a model able to predict MT quality a priori from the source text alone. To enable experiments in this direction, we start from reference English-German parallel corpora and build a corpus of 14,253 source text-quality score tuples. The scores come from four state-of-the-art metrics: cushLEPOR, BERTScore, COMET, and TransQuest. With this new resource at hand, we fine-tune XLM-RoBERTa, in both a single-task and a multi-task setting, to predict these evaluation scores from the source text alone. Results for this methodology are promising: the single-task model approximates well-established MT evaluation and quality estimation metrics without looking at the actual machine translations, achieving low RMSE values in the 0.1-0.2 range and Pearson correlation scores up to 0.688.
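The two figures reported above, RMSE and Pearson correlation, compare the scores a model predicts from the source text with the scores the reference metric assigned. The following is a minimal sketch of how such a comparison could be computed. It is not the authors' evaluation code, and the function names `rmse` and `pearson` and the toy score lists are hypothetical.

```python
import math


def rmse(gold, pred):
    """Root mean squared error between gold metric scores and predictions."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    squared_errors = ((g - p) ** 2 for g, p in zip(gold, pred))
    return math.sqrt(sum(squared_errors) / len(gold))


def pearson(gold, pred):
    """Pearson correlation coefficient between gold scores and predictions."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(gold)
    mean_gold = sum(gold) / n
    mean_pred = sum(pred) / n
    covariance = sum((g - mean_gold) * (p - mean_pred) for g, p in zip(gold, pred))
    var_gold = sum((g - mean_gold) ** 2 for g in gold)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    if var_gold == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / math.sqrt(var_gold * var_pred)


# Hypothetical toy data: metric scores for four source sentences (gold)
# and the scores a source-only model might predict for them (pred).
gold_scores = [0.82, 0.45, 0.67, 0.91]
pred_scores = [0.78, 0.52, 0.60, 0.85]

error = rmse(gold_scores, pred_scores)
correlation = pearson(gold_scores, pred_scores)
```

A low RMSE means the predicted scores are close to the metric scores in absolute terms, while a high Pearson correlation means the two rank and scale sentences similarly. The two can diverge, which is why both are reported.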

Anthology ID: 2023.eamt-1.9
Volume: Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Month: June
Year: 2023
Address: Tampere, Finland
Editors: Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
Venue: EAMT
Publisher: European Association for Machine Translation
Pages: 79–89
URL: https://aclanthology.org/2023.eamt-1.9/
Cite (ACL): Francesco Fernicola, Silvia Bernardini, Federico Garcea, Adriano Ferraresi, and Alberto Barrón-Cedeño. 2023. Return to the Source: Assessing Machine Translation Suitability. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 79–89, Tampere, Finland. European Association for Machine Translation.
Cite (Informal): Return to the Source: Assessing Machine Translation Suitability (Fernicola et al., EAMT 2023)
PDF: https://aclanthology.org/2023.eamt-1.9.pdf
