Rethinking the Word-level Quality Estimation for Machine Translation from Human Judgement

Zhen Yang, Fandong Meng, Yuanmeng Yan, Jie Zhou


Abstract
Word-level Quality Estimation (QE) of Machine Translation (MT) aims to detect potential translation errors in a translated sentence without a reference. Conventional work on word-level QE typically predicts the quality of translated words in terms of post-editing effort, where the word labels in the dataset, i.e., OK or BAD, are generated automatically by comparing the words of the MT sentences with those of the post-edited sentences using a Translation Error Rate (TER) toolkit. While post-editing effort can measure translation quality to some extent, we find that it often conflicts with human judgment on whether a word is well or poorly translated. To investigate this conflict, we first create a golden benchmark dataset, namely HJQE (Human Judgement on Quality Estimation), where the source and MT sentences are identical to those of the original TER-based dataset and expert translators directly annotate the poorly translated words based on their own judgment. Based on our analysis, we further propose two tag-correcting strategies that bring the TER-based artificial QE corpus closer to HJQE. We conduct substantial experiments on the publicly available WMT En-De and En-Zh corpora. The results not only show that our proposed dataset is more consistent with human judgment but also confirm the effectiveness of the proposed tag-correcting strategies. For reviewers, the corpora and code can be found in the attached files.
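As a rough illustration of the TER-based labelling described in the abstract, the sketch below tags each MT word OK if it aligns to an identical word in the post-edited sentence and BAD otherwise. It is not the authors' pipeline: Python's `difflib` stands in for a real TER toolkit (so word shifts are not modelled), and the function name `ter_like_tags` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def ter_like_tags(mt_tokens, pe_tokens):
    """Tag each MT token OK if it aligns to an identical post-edit token, else BAD.

    Assumption: difflib's alignment is used as a stand-in for a TER toolkit,
    so block shifts that TER would handle are not modelled here.
    """
    tags = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    # Every matching block is a run of MT tokens the post-editor left unchanged.
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


# Hypothetical MT output and its post-edited version.
mt = "the cat sat in the mat".split()
pe = "the cat sat on the mat".split()
tags = ter_like_tags(mt, pe)
for word, tag in zip(mt, tags):
    print(f"{word}\t{tag}")
```

Labels produced this way record what a post-editor changed, not whether a word was judged a poor translation, which is the gap the abstract says HJQE is meant to address.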
Anthology ID:
2023.findings-acl.126
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2012–2025
URL:
https://aclanthology.org/2023.findings-acl.126
DOI:
10.18653/v1/2023.findings-acl.126
Cite (ACL):
Zhen Yang, Fandong Meng, Yuanmeng Yan, and Jie Zhou. 2023. Rethinking the Word-level Quality Estimation for Machine Translation from Human Judgement. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2012–2025, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Rethinking the Word-level Quality Estimation for Machine Translation from Human Judgement (Yang et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.126.pdf