Assessing Reference-Free Peer Evaluation for Machine Translation

Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry


Abstract
Reference-free evaluation has the potential to make machine translation evaluation substantially more scalable, allowing us to pivot easily to new languages or domains. It has recently been shown that the probabilities given by a large, multilingual model can achieve state-of-the-art results when used as a reference-free metric. We experiment with various modifications to this model, and demonstrate that by scaling it up we can match the performance of BLEU. We analyze various potential weaknesses of the approach, and find that it is surprisingly robust and likely to offer reasonable performance across a broad spectrum of domains and system quality levels.
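The idea of using model probabilities as a reference-free metric can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a hypothesis is scored by the average per-token log-probability a multilingual translation model assigns to it under forced decoding, conditioned on the source. The function names `reference_free_score` and `system_score` and the precomputed log-probability inputs are hypothetical.

```python
import math
from typing import Sequence


def reference_free_score(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-probability of one hypothesis given its source.

    Assumption (hypothetical input): token_logprobs[i] = log P(y_i | y_<i, source),
    obtained by force-decoding the hypothesis with some multilingual model.
    No reference translation is used anywhere.
    """
    if not token_logprobs:
        raise ValueError("hypothesis must contain at least one token")
    # Averaging over tokens keeps longer hypotheses from being penalized
    # merely because they contain more (negative) log-probability terms.
    return sum(token_logprobs) / len(token_logprobs)


def system_score(segments: Sequence[Sequence[float]]) -> float:
    """System-level score: mean of segment-level scores over a test set."""
    if not segments:
        raise ValueError("need at least one segment")
    return sum(reference_free_score(s) for s in segments) / len(segments)


if __name__ == "__main__":
    # Two toy segments for one system; higher (less negative) is better.
    print(system_score([[-1.0, -3.0], [-2.0]]))
```

Scores like these would then be compared against human judgments (for example, by correlation at the system or segment level) to assess the metric, the same way reference-based metrics such as BLEU are assessed.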
Anthology ID:
2021.naacl-main.91
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1158–1171
URL:
https://aclanthology.org/2021.naacl-main.91
DOI:
10.18653/v1/2021.naacl-main.91
Cite (ACL):
Sweta Agrawal, George Foster, Markus Freitag, and Colin Cherry. 2021. Assessing Reference-Free Peer Evaluation for Machine Translation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1158–1171, Online. Association for Computational Linguistics.
Cite (Informal):
Assessing Reference-Free Peer Evaluation for Machine Translation (Agrawal et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.91.pdf
Video:
https://aclanthology.org/2021.naacl-main.91.mp4
Data
WMT19 Metrics Task