Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation

Samuel Läubli, Rico Sennrich, Martin Volk


Abstract
Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence level become decisive in discriminating the quality of different translation outputs.
Anthology ID:
D18-1512
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4791–4796
URL:
https://aclanthology.org/D18-1512
DOI:
10.18653/v1/D18-1512
Cite (ACL):
Samuel Läubli, Rico Sennrich, and Martin Volk. 2018. Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4791–4796, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation (Läubli et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1512.pdf
Attachment:
 D18-1512.Attachment.pdf
Video:
 https://aclanthology.org/D18-1512.mp4
Code:
 laeubli/parity