Learning to Copy for Automatic Post-Editing

Xuancheng Huang, Yang Liu, Huanbo Luan, Jingfang Xu, Maosong Sun


Abstract
Automatic post-editing (APE), which aims to correct errors in the output of machine translation systems in a post-processing step, is an important task in natural language processing. While recent work has achieved considerable performance gains by using neural networks, how to model the copying mechanism for APE remains a challenge. In this work, we propose a new method for modeling copying for APE. To better identify translation errors, our method learns the representations of source sentences and system outputs in an interactive way. These representations are used to explicitly indicate which words in the system outputs should be copied. Finally, CopyNet (Gu et al., 2016) can be combined with our method to place the copied words in correct positions in post-edited translations. Experiments on the datasets of the WMT 2016-2017 APE shared tasks show that our approach outperforms the best published results.
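To make the idea of copying words from the system output concrete, the following is a minimal sketch of a gated copy/generate mixture at a single decoding step. It is not the authors' model and is simpler than CopyNet's own scoring; the function name `mix_copy_and_generate`, the scalar `copy_gate`, and the toy probabilities are all assumptions chosen for illustration.

```python
from collections import defaultdict


def mix_copy_and_generate(gen_probs, copy_attention, mt_tokens, copy_gate):
    """Blend a vocabulary distribution with a copy distribution over MT tokens.

    gen_probs      -- dict mapping vocabulary words to generation probabilities
    copy_attention -- one attention weight per MT token (assumed to sum to 1)
    mt_tokens      -- tokens of the machine translation output
    copy_gate      -- hypothetical scalar in [0, 1]: probability of copying
    """
    if len(copy_attention) != len(mt_tokens):
        raise ValueError("one attention weight per MT token is required")

    final = defaultdict(float)
    # Generation path: scale every vocabulary probability by (1 - copy_gate).
    for word, prob in gen_probs.items():
        final[word] += (1.0 - copy_gate) * prob
    # Copy path: attention mass on each MT position goes to the token there,
    # so words outside the vocabulary can still receive probability.
    for token, weight in zip(mt_tokens, copy_attention):
        final[token] += copy_gate * weight
    return dict(final)


# Toy decoding step with invented numbers (illustrative only).
mt_tokens = ["das", "ist", "ein", "Haus"]
gen_probs = {"das": 0.1, "ist": 0.1, "eine": 0.6, "Haus": 0.2}
copy_attention = [0.7, 0.1, 0.1, 0.1]
dist = mix_copy_and_generate(gen_probs, copy_attention, mt_tokens, copy_gate=0.5)
```

In this toy step, "ein" is absent from the generation vocabulary but still receives probability through the copy path, while "eine" can only be produced by generation. In an APE setting, that split loosely corresponds to keeping a correct word from the system output versus generating a correction.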
Anthology ID: D19-1634
Volume: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month: November
Year: 2019
Address: Hong Kong, China
Venues: EMNLP | IJCNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 6122–6132
URL: https://aclanthology.org/D19-1634
DOI: 10.18653/v1/D19-1634
Cite (ACL): Xuancheng Huang, Yang Liu, Huanbo Luan, Jingfang Xu, and Maosong Sun. 2019. Learning to Copy for Automatic Post-Editing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6122–6132, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Learning to Copy for Automatic Post-Editing (Huang et al., EMNLP 2019)
PDF: https://aclanthology.org/D19-1634.pdf
Code: THUNLP-MT/THUMT
Data: eSCAPE