A Test Suite for Evaluating Discourse Phenomena in Document-level Neural Machine Translation

Xinyi Cai, Deyi Xiong


Abstract
Document-level NMT creates a need to evaluate how well context-aware neural machine translation (NMT) models handle specific discourse phenomena. However, test sets that satisfy this need are rare. In this paper, we propose a test suite to evaluate three common discourse phenomena in English-Chinese translation on which the two languages diverge: pronouns, discourse connectives, and ellipsis. The test suite contains 1,200 instances, 400 for each type of discourse phenomenon. We perform both automatic and human evaluation with three state-of-the-art context-aware NMT models on the proposed test suite. Results suggest that our test suite can be used as a challenging benchmark for evaluating document-level NMT. The test suite will be publicly available soon.
Anthology ID:
2020.iwdp-1.3
Volume:
Proceedings of the Second International Workshop of Discourse Processing
Month:
December
Year:
2020
Address:
Suzhou, China
Editors:
Qun Liu, Deyi Xiong, Shili Ge, Xiaojun Zhang
Venue:
iwdp
Publisher:
Association for Computational Linguistics
Pages:
13–17
URL:
https://aclanthology.org/2020.iwdp-1.3
Cite (ACL):
Xinyi Cai and Deyi Xiong. 2020. A Test Suite for Evaluating Discourse Phenomena in Document-level Neural Machine Translation. In Proceedings of the Second International Workshop of Discourse Processing, pages 13–17, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
A Test Suite for Evaluating Discourse Phenomena in Document-level Neural Machine Translation (Cai & Xiong, iwdp 2020)
PDF:
https://aclanthology.org/2020.iwdp-1.3.pdf
Data:
OpenSubtitles