Benchmarking the Performance of Machine Translation Evaluation Metrics with Chinese Multiword Expressions

Huacheng Song; Hongzhi Xu

Abstract

To investigate the impact of Multiword Expressions (MWEs) on the fine-grained performance of state-of-the-art metrics for Machine Translation Evaluation (MTE), we conduct experiments on the WMT22 Metrics Shared Task dataset, with a preliminary focus on the Chinese-to-English language pair. We annotate 28 types of Chinese MWEs in the source texts and then examine the performance of 31 MTE metrics on groups of sentences containing different MWEs. We report three findings: 1) Machine Translation (MT) systems tend to perform worse on most Chinese MWE categories, confirming the previous claim that MWEs are a bottleneck for MT; 2) automatic metrics tend to overrate the translations of sentences containing MWEs; 3) most neural-network-based metrics perform better than string-overlap-based metrics. We conclude that both MT systems and MTE metrics still struggle with MWEs, which suggests that richer data annotation is needed to facilitate MWE-aware automatic MTE and MT.

Anthology ID: 2024.lrec-main.198
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 2204–2216
URL: https://aclanthology.org/2024.lrec-main.198/
Cite (ACL): Huacheng Song and Hongzhi Xu. 2024. Benchmarking the Performance of Machine Translation Evaluation Metrics with Chinese Multiword Expressions. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2204–2216, Torino, Italia. ELRA and ICCL.
Cite (Informal): Benchmarking the Performance of Machine Translation Evaluation Metrics with Chinese Multiword Expressions (Song & Xu, LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.198.pdf
