Quality Analysis of Multilingual Neural Machine Translation Systems and Reference Test Translations for the English-Romanian language pair in the Medical Domain

Miguel Angel Rios Gaona, Raluca-Maria Chereji, Alina Secara, Dragos Ciobanu


Abstract
Multilingual Neural Machine Translation (MNMT) models allow to translate across multiple languages based on only one system. We study the quality of a domain-adapted MNMT model in the medical domain for English-Romanian with automatic metrics and a human error typology annotation based on the Multidimensional Quality Metrics (MQM). We further expand the MQM typology to include terminology-specific error categories. We compare the out-of-domain MNMT with the in-domain adapted MNMT on a standard test dataset of abstracts from medical publications. The in-domain MNMT model outperforms the out-of-domain MNMT in all measured automatic metrics and produces fewer errors. In addition, we perform the manual annotation over the reference test dataset to study the quality of the reference translations. We identify a high number of omissions, additions, and mistranslations in the reference dataset, and comment on the assumed accuracy of existing datasets. Finally, we compare the correlation between the COMET, BERTScore, and chrF automatic metrics with the MQM annotated translations. COMET shows a better correlation with the MQM scores compared to the other metrics.
Anthology ID:
2023.eamt-1.35
Volume:
Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Month:
June
Year:
2023
Address:
Tampere, Finland
Editors:
Mary Nurminen, Judith Brenner, Maarit Koponen, Sirkku Latomaa, Mikhail Mikhailov, Frederike Schierl, Tharindu Ranasinghe, Eva Vanmassenhove, Sergi Alvarez Vidal, Nora Aranberri, Mara Nunziatini, Carla Parra Escartín, Mikel Forcada, Maja Popovic, Carolina Scarton, Helena Moniz
Venue:
EAMT
SIG:
Publisher:
European Association for Machine Translation
Note:
Pages:
355–364
Language:
URL:
https://aclanthology.org/2023.eamt-1.35
DOI:
Bibkey:
Cite (ACL):
Miguel Angel Rios Gaona, Raluca-Maria Chereji, Alina Secara, and Dragos Ciobanu. 2023. Quality Analysis of Multilingual Neural Machine Translation Systems and Reference Test Translations for the English-Romanian language pair in the Medical Domain. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 355–364, Tampere, Finland. European Association for Machine Translation.
Cite (Informal):
Quality Analysis of Multilingual Neural Machine Translation Systems and Reference Test Translations for the English-Romanian language pair in the Medical Domain (Gaona et al., EAMT 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.eamt-1.35.pdf