On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

Xuebo Liu; Longyue Wang; Derek F. Wong; Liang Ding; Lidia S. Chao; Shuming Shi; Zhaopeng Tu

doi:10.18653/v1/2021.findings-emnlp.247

Abstract

Pre-training (PT) and back-translation (BT) are two simple and powerful methods for using monolingual data to improve neural machine translation (NMT). This paper takes a first step toward investigating the complementarity between PT and BT. We introduce two probing tasks, one for PT and one for BT, and find that PT mainly contributes to the encoder module while BT brings more benefits to the decoder. Experimental results show that PT and BT complement each other well, establishing state-of-the-art performance on the WMT16 English-Romanian and English-Russian benchmarks. Through extensive analyses on sentence originality and word frequency, we also demonstrate that combining Tagged BT with PT makes the two methods more complementary, leading to better translation quality. Source code is freely available at https://github.com/SunbowLiu/PTvsBT.
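
For readers unfamiliar with Tagged BT, the idea can be sketched as follows: target-side monolingual sentences are translated back into the source language by a backward model, and each synthetic source sentence is marked with a reserved tag so the forward model can tell synthetic input from genuine input. This is a minimal illustration, not the authors' implementation. The tag string `<BT>`, the function name `build_tagged_bt_corpus`, and the placeholder `toy_backward_model` are all assumptions made for this example; a real setup would call a trained target-to-source NMT model instead.

```python
from typing import Callable, List, Tuple

# Assumed tag token for this sketch; the actual token is a design choice.
BT_TAG = "<BT>"


def build_tagged_bt_corpus(
    parallel: List[Tuple[str, str]],
    target_monolingual: List[str],
    back_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Combine genuine (source, target) pairs with tagged synthetic pairs.

    Each monolingual target sentence is back-translated into the source
    language, and the synthetic source is prefixed with BT_TAG.
    """
    synthetic = [
        (f"{BT_TAG} {back_translate(target)}", target)
        for target in target_monolingual
    ]
    return list(parallel) + synthetic


def toy_backward_model(target_sentence: str) -> str:
    # Hypothetical stand-in for a trained target-to-source model:
    # it only lowercases the input so the example runs end to end.
    return target_sentence.lower()


corpus = build_tagged_bt_corpus(
    parallel=[("Hello .", "Salut .")],
    target_monolingual=["La revedere ."],
    back_translate=toy_backward_model,
)
```

In this sketch the genuine pair is left untouched and only the synthetic pair carries the tag, which is the property that distinguishes Tagged BT from plain BT.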

Anthology ID: 2021.findings-emnlp.247
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 2900–2907
URL: https://aclanthology.org/2021.findings-emnlp.247
DOI: 10.18653/v1/2021.findings-emnlp.247
Cite (ACL): Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, and Zhaopeng Tu. 2021. On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2900–2907, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Liu et al., Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.247.pdf
Video: https://aclanthology.org/2021.findings-emnlp.247.mp4
Code: sunbowliu/ptvsbt
