Statistical Power and Translationese in Machine Translation Evaluation

Yvette Graham; Barry Haddow; Philipp Koehn

doi:10.18653/v1/2020.emnlp-main.6

Abstract

The term translationese has been used to describe features of translated text, and in this paper we provide a detailed analysis of the potential adverse effects of translationese on machine translation evaluation. Our analysis shows that conclusions drawn from evaluations that include translationese in test data differ from those of experiments that test only with text originally composed in the source language. For this reason we recommend that reverse-created test data be omitted from future machine translation test sets. In addition, we provide a re-evaluation of a past machine translation evaluation claiming human parity of MT. One important issue not previously considered is the statistical power of significance tests applied to the comparison of human and machine translation. Since the very aim of past evaluations was the investigation of ties between human and MT systems, power analysis is of particular importance, to avoid, for example, claims of human parity that simply correspond to Type II error resulting from the application of a low-powered test. We provide a detailed analysis of tests used in such evaluations to give an indication of a suitable minimum sample size for future studies.
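To illustrate why power matters when the outcome of interest is a tie, the following is a minimal sketch of a Monte Carlo power estimate. It is not the analysis carried out in the paper: the normally distributed scores, the large-sample z-test, the effect size, and the names `estimate_power` and `minimum_sample_size` are all assumptions made for illustration only.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means (large-sample normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(n, effect, sd=1.0, alpha=0.05, trials=1000, seed=0):
    """Fraction of simulated studies that detect a true difference of `effect`.

    Assumption: scores for both systems are normal with the same spread,
    which real human assessment scores need not be.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        human = [rng.gauss(effect, sd) for _ in range(n)]    # hypothetical human scores
        machine = [rng.gauss(0.0, sd) for _ in range(n)]     # hypothetical MT scores
        if two_sample_z_pvalue(human, machine) < alpha:
            rejections += 1
    return rejections / trials


def minimum_sample_size(effect, target_power=0.8, candidates=(50, 100, 200, 400, 800, 1600)):
    """Smallest candidate n whose estimated power reaches the target, or None."""
    for n in candidates:
        if estimate_power(n, effect) >= target_power:
            return n
    return None


if __name__ == "__main__":
    for n in (50, 200, 800):
        print(n, estimate_power(n, effect=0.2))
```

Under these assumptions, a small sample fails to reject the null hypothesis in most simulated studies even though a real difference exists, so a non-significant result at that sample size is weak evidence of parity. The estimated power rises as the sample size grows.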

Anthology ID: 2020.emnlp-main.6
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 72–81
URL: https://aclanthology.org/2020.emnlp-main.6/
DOI: 10.18653/v1/2020.emnlp-main.6
Cite (ACL): Yvette Graham, Barry Haddow, and Philipp Koehn. 2020. Statistical Power and Translationese in Machine Translation Evaluation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 72–81, Online. Association for Computational Linguistics.
Cite (Informal): Statistical Power and Translationese in Machine Translation Evaluation (Graham et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.6.pdf
Video: https://slideslive.com/38938740
