Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains

Vilém Zouhar; Shuoyang Ding; Anna Currey; Tatyana Badeka; Jenyuan Wang; Brian Thompson

doi:10.18653/v1/2024.acl-short.45

Abstract

We introduce a new, extensive multidimensional quality metrics (MQM) annotated dataset covering 11 language pairs in the biomedical domain. We use this dataset to investigate whether machine translation (MT) metrics that are fine-tuned on human-generated MT quality judgments are robust to domain shifts between training and inference. We find that fine-tuned metrics exhibit a substantial performance drop in the unseen-domain scenario relative to both metrics that rely on the surface form and pre-trained metrics that are not fine-tuned on MT quality judgments.

Anthology ID: 2024.acl-short.45
Original: 2024.acl-short.45v1
Version 2: 2024.acl-short.45v2
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 488–500
URL: https://aclanthology.org/2024.acl-short.45/
DOI: 10.18653/v1/2024.acl-short.45
Cite (ACL): Vilém Zouhar, Shuoyang Ding, Anna Currey, Tatyana Badeka, Jenyuan Wang, and Brian Thompson. 2024. Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 488–500, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Fine-Tuned Machine Translation Metrics Struggle in Unseen Domains (Zouhar et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-short.45.pdf
