Recycling a Pre-trained BERT Encoder for Neural Machine Translation

Kenji Imamura, Eiichiro Sumita


Abstract
In this paper, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model is applied to Transformer-based neural machine translation (NMT). In contrast to monolingual tasks, the number of unlearned model parameters in an NMT decoder is as large as the number of learned parameters in the BERT model. To train all the sub-models appropriately, we employ two-stage optimization, which first trains only the unlearned parameters while freezing the BERT model, and then fine-tunes all the sub-models. In our experiments, two-stage optimization was stable, whereas direct fine-tuning yielded extremely low BLEU scores. Consequently, the BLEU scores of the proposed method were better than those of the Transformer base model and of the same model without pre-training. Additionally, we confirmed that NMT with the BERT encoder is more effective in low-resource settings.
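
As a rough illustration of the two-stage optimization described in the abstract, the sketch below freezes a pre-trained BERT encoder while a Transformer decoder is trained, then unfreezes it for joint fine-tuning. This is a minimal PyTorch sketch under assumed settings; the checkpoint name, layer sizes, learning rates, and the omitted training loop are illustrative assumptions, not the authors' implementation.

import torch
from transformers import BertModel  # Hugging Face BERT; assumed stand-in for the paper's encoder

# Pre-trained encoder (learned parameters) and randomly initialized decoder
# (unlearned parameters). Model sizes and hyperparameters are illustrative only.
encoder = BertModel.from_pretrained("bert-base-multilingual-cased")
decoder = torch.nn.TransformerDecoder(
    torch.nn.TransformerDecoderLayer(d_model=768, nhead=12), num_layers=6)

# Stage 1: freeze the BERT encoder and train only the decoder parameters.
for p in encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)
# ... run the usual NMT training loop with this optimizer until convergence ...

# Stage 2: unfreeze the encoder and fine-tune all sub-models jointly,
# typically with a smaller learning rate.
for p in encoder.parameters():
    p.requires_grad = True
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=2e-5)
# ... continue training (fine-tuning) with the new optimizer ...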
Anthology ID:
D19-5603
Volume:
Proceedings of the 3rd Workshop on Neural Generation and Translation
Month:
November
Year:
2019
Address:
Hong Kong
Editors:
Alexandra Birch, Andrew Finch, Hiroaki Hayashi, Ioannis Konstas, Thang Luong, Graham Neubig, Yusuke Oda, Katsuhito Sudoh
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
23–31
URL:
https://aclanthology.org/D19-5603
DOI:
10.18653/v1/D19-5603
Cite (ACL):
Kenji Imamura and Eiichiro Sumita. 2019. Recycling a Pre-trained BERT Encoder for Neural Machine Translation. In Proceedings of the 3rd Workshop on Neural Generation and Translation, pages 23–31, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
Recycling a Pre-trained BERT Encoder for Neural Machine Translation (Imamura & Sumita, NGT 2019)
PDF:
https://aclanthology.org/D19-5603.pdf