Iterative Constrained Back-Translation for Unsupervised Domain Adaptation of Machine Translation

Hongxiao Zhang, Hui Huang, Jiale Gao, Yufeng Chen, Jinan Xu, Jian Liu


Abstract
Back-translation (BT) has been shown to be effective for unsupervised domain adaptation of neural machine translation (NMT). However, existing back-translation methods mainly improve domain adaptability by generating in-domain pseudo-parallel data that carries sentence-structural knowledge. They pay less attention to in-domain lexical knowledge, which may lead to poor translation of unseen in-domain words. In this paper, we propose an Iterative Constrained Back-Translation (ICBT) method that incorporates in-domain lexical knowledge on top of BT for unsupervised domain adaptation of NMT. Specifically, we apply lexical constraints to back-translation to generate pseudo-parallel data containing in-domain lexical knowledge, and then perform round-trip iterations to incorporate more lexical knowledge. Building on this, we further explore strategies for sampling the constrained words in ICBT, based on domain specificity and confidence estimation, to introduce more targeted lexical knowledge. Experimental results on four domains show that our approach achieves state-of-the-art results, improving BLEU by up to 3.08 points over the strongest baseline, which demonstrates its effectiveness.
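The abstract names domain specificity as one criterion for sampling constrained words but does not define it on this page. As a rough illustration only, the Python sketch below assumes a simple frequency-ratio definition (a word's relative frequency in in-domain monolingual text divided by its smoothed relative frequency in general-domain text) and picks the top-k words of a sentence as lexical constraints. The function names `domain_specificity` and `sample_constraints`, the add-k smoothing, and the whitespace tokenization are hypothetical choices, not the ICBT authors' formulation.

```python
from collections import Counter
from typing import Dict, Iterable, List


def domain_specificity(
    in_domain: Iterable[str], general: Iterable[str], smoothing: float = 1.0
) -> Dict[str, float]:
    """Hypothetical score: in-domain relative frequency / smoothed general-domain relative frequency."""
    # Whitespace tokenization is a simplifying assumption (real NMT data is usually subword-segmented).
    in_counts = Counter(w for sent in in_domain for w in sent.split())
    gen_counts = Counter(w for sent in general for w in sent.split())
    in_total = sum(in_counts.values())
    gen_total = sum(gen_counts.values())
    vocab_size = len(set(in_counts) | set(gen_counts))

    scores: Dict[str, float] = {}
    for word, count in in_counts.items():
        p_in = count / in_total
        # Add-k smoothing keeps words unseen in the general corpus from dividing by zero.
        p_gen = (gen_counts[word] + smoothing) / (gen_total + smoothing * vocab_size)
        scores[word] = p_in / p_gen
    return scores


def sample_constraints(sentence: str, scores: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k most domain-specific words of a sentence (ties broken alphabetically)."""
    candidates = set(sentence.split())
    ranked = sorted(candidates, key=lambda w: (-scores.get(w, 0.0), w))
    return ranked[:k]


if __name__ == "__main__":
    medical = ["the patient received insulin", "insulin dose for the patient"]
    general = ["the cat sat on the mat", "the dog received a ball"]
    scores = domain_specificity(medical, general)
    print(sample_constraints("the patient received insulin", scores))  # ['insulin', 'patient']
```

Words that are frequent in the in-domain text but rare in general text, such as "insulin" in this toy medical data, score highest and would be selected as constraints, while function words like "the" score low. The abstract's second criterion, confidence estimation, is not modeled in this sketch.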
Anthology ID:
2022.coling-1.448
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5054–5065
URL:
https://aclanthology.org/2022.coling-1.448
Cite (ACL):
Hongxiao Zhang, Hui Huang, Jiale Gao, Yufeng Chen, Jinan Xu, and Jian Liu. 2022. Iterative Constrained Back-Translation for Unsupervised Domain Adaptation of Machine Translation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5054–5065, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Iterative Constrained Back-Translation for Unsupervised Domain Adaptation of Machine Translation (Zhang et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.448.pdf
Code:
zzzxiaohong/icbt