Word-level Translation Quality Estimation Based on Optimal Transport

Yuto Kuroda; Atsushi Fujita; Tomoyuki Kajiwara

Abstract

Word-level translation quality estimation (TQE) is the task of identifying erroneous words in a translation with respect to the source. State-of-the-art methods for TQE exploit large quantities of synthetic training data generated from bilingual parallel corpora, where pseudo-quality labels are determined by comparing two independent translations of the same source text, i.e., an output from a machine translation (MT) system and the reference translation in the parallel corpus. However, this process relies solely on the surface forms of words, so acceptable synonyms and interchangeable word orders are regarded as erroneous. This may mislead the pre-training of models. In this paper, we describe a method that integrates a degree of uncertainty into the labeling of words in synthetic training data for TQE. To estimate the extent to which each word in the MT output is likely to be correct or erroneous with respect to the reference translation, we propose using the concept of optimal transport (OT), which exploits contextual word embeddings. Empirical experiments on a public benchmark dataset for word-level TQE demonstrate that pre-training TQE models with pseudo-quality labels determined by OT yields better predictions of the word-level quality labels derived from manual post-editing than pre-training with surface-based pseudo-quality labels.
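The abstract does not spell out the OT formulation, so the following Python sketch is only a rough illustration of the idea, not the authors' method. It assumes balanced entropic OT (log-domain Sinkhorn) with uniform word weights and cosine distance as the transport cost, and it scores each MT word by its expected transport cost to the reference (scaled to [0, 1], higher meaning more likely erroneous). All function names (`cosine_distance`, `sinkhorn_plan`, `soft_bad_scores`), the regularization value, and the toy 3-d vectors standing in for contextual embeddings are hypothetical. The exact-match labels are a crude stand-in for surface-based labeling, which in practice typically relies on an edit-distance alignment.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; lies in [0, 2]."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(x - peak) for x in values))


def sinkhorn_plan(cost, reg=0.05, n_iter=500):
    """Entropic OT plan between uniform word weights (log-domain Sinkhorn)."""
    n, m = len(cost), len(cost[0])
    log_a, log_b = math.log(1.0 / n), math.log(1.0 / m)
    f, g = [0.0] * n, [0.0] * m  # dual potentials for MT and reference words
    for _ in range(n_iter):
        # Fit the column (reference) marginals, then the row (MT) marginals.
        g = [reg * (log_b - logsumexp([(f[i] - cost[i][j]) / reg for i in range(n)]))
             for j in range(m)]
        f = [reg * (log_a - logsumexp([(g[j] - cost[i][j]) / reg for j in range(m)]))
             for i in range(n)]
    return [[math.exp((f[i] + g[j] - cost[i][j]) / reg) for j in range(m)]
            for i in range(n)]


def soft_bad_scores(mt_vecs, ref_vecs, reg=0.05):
    """Hypothetical soft pseudo-label per MT word: its expected transport
    cost to the reference, divided by 2 so that it lies in [0, 1]."""
    cost = [[cosine_distance(u, v) for v in ref_vecs] for u in mt_vecs]
    plan = sinkhorn_plan(cost, reg)
    scores = []
    for row, costs in zip(plan, cost):
        mass = sum(row)  # equals 1/n because the last update fixed the rows
        expected = sum(p * c for p, c in zip(row, costs)) / mass
        scores.append(expected / 2.0)
    return scores


# Toy 3-d vectors standing in for contextual embeddings (made up).
emb = {
    "the": [1.0, 0.0, 0.0],
    "big": [0.0, 1.0, 0.1],
    "large": [0.0, 1.0, 0.0],
    "cat": [0.0, 0.0, 1.0],
    "car": [-1.0, 0.0, 0.0],
}
mt = ["the", "big", "car"]
ref = ["the", "large", "cat"]

# Crude surface-based labels: BAD if the exact word is absent from the reference.
ref_set = set(ref)
surface_bad = [w not in ref_set for w in mt]
scores = soft_bad_scores([emb[w] for w in mt], [emb[w] for w in ref])
for word, bad, score in zip(mt, surface_bad, scores):
    print(f"{word}\tsurface_bad={bad}\tot_score={score:.3f}")
```

In this toy case, exact matching marks both "big" and "car" as BAD, whereas the OT score stays near zero for "big" (whose vector is close to that of "large") and is high only for "car". A real setup would draw embeddings from a pretrained encoder and might call for unbalanced or partial OT when the two translations differ in length; both are beyond this sketch.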

Anthology ID: 2024.amta-research.18
Volume: Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Month: September
Year: 2024
Address: Chicago, USA
Editors: Rebecca Knowles, Akiko Eriguchi, Shivali Goel
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 209–224
URL: https://aclanthology.org/2024.amta-research.18/
Cite (ACL): Yuto Kuroda, Atsushi Fujita, and Tomoyuki Kajiwara. 2024. Word-level Translation Quality Estimation Based on Optimal Transport. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 209–224, Chicago, USA. Association for Machine Translation in the Americas.
Cite (Informal): Word-level Translation Quality Estimation Based on Optimal Transport (Kuroda et al., AMTA 2024)
PDF: https://aclanthology.org/2024.amta-research.18.pdf