Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET

Chantal Amrhein; Rico Sennrich

doi:10.18653/v1/2022.aacl-main.83

Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET

Abstract

Neural metrics have achieved impressive correlation with human judgements in the evaluation of machine translation systems, but before we can safely optimise towards such metrics, we should be aware of (and ideally eliminate) biases toward bad translations that receive high scores. Our experiments show that sample-based Minimum Bayes Risk decoding can be used to explore and quantify such weaknesses. When applying this strategy to COMET for en-de and de-en, we find that COMET models are not sensitive enough to discrepancies in numbers and named entities. We further show that these biases are hard to fully remove by simply training on additional synthetic data and release our code and data for facilitating further experiments.

Anthology ID:: 2022.aacl-main.83
Volume:: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:: November
Year:: 2022
Address:: Online only
Editors:: Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venues:: AACL | IJCNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 1125–1141
Language:
URL:: https://aclanthology.org/2022.aacl-main.83/
DOI:: 10.18653/v1/2022.aacl-main.83
Bibkey:
Cite (ACL):: Chantal Amrhein and Rico Sennrich. 2022. Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1125–1141, Online only. Association for Computational Linguistics.
Cite (Informal):: Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET (Amrhein & Sennrich, AACL-IJCNLP 2022)
Copy Citation:
PDF:: https://aclanthology.org/2022.aacl-main.83.pdf

PDF Cite Search Fix data