Cross-Lingual Transformers for Neural Automatic Post-Editing

Dongjun Lee


Abstract
In this paper, we describe the Bering Lab’s submission to the WMT 2020 Shared Task on Automatic Post-Editing (APE). First, we propose a cross-lingual Transformer architecture that takes the concatenation of a source sentence and a machine-translated (MT) sentence as input to generate the post-edited (PE) output. For further improvement, we mask incorrect or missing words in the PE output based on word-level quality estimation and then predict the actual word for each mask with a fine-tuned cross-lingual language model (XLM-RoBERTa). Finally, to address the over-correction problem, we select the final output from among the PE outputs and the original MT sentence based on sentence-level quality estimation. When evaluated on the WMT 2020 English-German APE test dataset, our system improves the NMT output by -3.95 TER and +4.50 BLEU.
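The mask-and-fill step described in the abstract can be illustrated with a small sketch (not the authors' code): tokens flagged as incorrect by word-level quality estimation are replaced with mask tokens, and a cross-lingual masked language model predicts each mask conditioned on the concatenated source and MT sentence. The model name, separator handling, and masked position below are illustrative assumptions; the paper uses a fine-tuned XLM-RoBERTa rather than the off-the-shelf checkpoint shown here.

```python
# Minimal sketch of mask prediction over a source+MT concatenation,
# assuming the Hugging Face `transformers` fill-mask pipeline and the
# base XLM-RoBERTa checkpoint (the paper fine-tunes its own model).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

source = "The cat sat on the mat."          # English source sentence
# Word-level QE (not shown) has flagged one MT token as wrong;
# it is replaced with the model's mask token.
mt_masked = "Die Katze <mask> auf der Matte."

# Concatenate source and masked MT output as one cross-lingual input,
# mirroring the source+MT concatenation described in the paper.
candidates = fill_mask(f"{source} {mt_masked}", top_k=3)
for cand in candidates:
    print(cand["token_str"], round(cand["score"], 3))
```

In the full system, the filled-in candidates would then compete with the original MT sentence under a sentence-level quality estimation model, which picks the final output to avoid over-correction.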
Anthology ID:
2020.wmt-1.81
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
772–776
URL:
https://aclanthology.org/2020.wmt-1.81
Cite (ACL):
Dongjun Lee. 2020. Cross-Lingual Transformers for Neural Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation, pages 772–776, Online. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Transformers for Neural Automatic Post-Editing (Lee, WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.81.pdf
Video:
https://slideslive.com/38939547