2023
Return to the Source: Assessing Machine Translation Suitability
Francesco Fernicola, Silvia Bernardini, Federico Garcea, Adriano Ferraresi, Alberto Barrón-Cedeño
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
We approach the task of assessing the suitability of a source text for translation by transferring the knowledge from established MT evaluation metrics to a model able to predict MT quality a priori, from the source text alone. To enable experiments in this direction, we start from reference English-German parallel corpora and build a corpus of 14,253 tuples, each pairing a source text with quality scores. The scores come from four state-of-the-art metrics: cushLEPOR, BERTScore, COMET, and TransQuest. With this new resource at hand, we fine-tune XLM-RoBERTa, in both a single-task and a multi-task setting, to predict these evaluation scores from the source text alone. Results are promising: without looking at the actual machine translations, the single-task model approximates well-established MT evaluation and quality estimation metrics, achieving low RMSE values (between 0.1 and 0.2) and Pearson correlation scores of up to 0.688.
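The two figures reported in this abstract, RMSE and Pearson correlation, compare the scores a model predicts from the source text with the scores an MT metric assigns to the actual translations. The minimal sketch below shows how both are computed in plain Python, assuming one predicted and one reference score per segment; the paper's own evaluation code is not reproduced here, and the function names and example values are hypothetical illustrations, not data from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and reference scores."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_p * var_r)


if __name__ == "__main__":
    # Hypothetical per-segment scores, made up for illustration only.
    predicted_scores = [0.62, 0.71, 0.55, 0.80]   # model output from source text
    metric_scores = [0.60, 0.75, 0.50, 0.78]      # e.g. a metric computed on the MT output
    print(f"RMSE: {rmse(predicted_scores, metric_scores):.3f}")
    print(f"Pearson r: {pearson(predicted_scores, metric_scores):.3f}")
```

Lower RMSE means the predicted scores are numerically close to the metric's scores, while Pearson's r measures whether they rank and vary together; reporting both, as the abstract does, covers both aspects.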
2019
MAGMATic: A Multi-domain Academic Gold Standard with Manual Annotation of Terminology for Machine Translation Evaluation
Randy Scansani, Luisa Bentivogli, Silvia Bernardini, Adriano Ferraresi
Proceedings of Machine Translation Summit XVII: Research Track
Do translator trainees trust machine translation? An experiment on post-editing and revision
Randy Scansani, Silvia Bernardini, Adriano Ferraresi, Luisa Bentivogli
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks
2017
Enhancing Machine Translation of Academic Course Catalogues with Terminological Resources
Randy Scansani, Silvia Bernardini, Adriano Ferraresi, Federico Gaspari, Marcello Soffritti
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology
This paper describes an approach to translating course unit descriptions from Italian and German into English using a phrase-based machine translation (MT) system. This genre is prominent among the texts that universities in European countries where English is not a native language need to have translated. For each language combination, an in-domain bilingual corpus of course unit and degree program descriptions is used to train an MT engine, whose output is then compared to that of a baseline engine trained on the Europarl corpus. In a subsequent experiment, a bilingual terminology database is added to the training sets of both engines, and its impact on output quality is evaluated using BLEU and a post-editing score. Results suggest that domain-specific corpora boost the engines' quality for both language combinations, especially for German-English, whereas adding terminological resources does not appear to bring notable benefits.