Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation

Kenji Imamura, Atsushi Fujita, Eiichiro Sumita


Abstract
A large-scale parallel corpus is required to train encoder-decoder neural machine translation. The method of using synthetic parallel texts, in which target monolingual corpora are automatically translated into source sentences, is effective in improving the decoder, but is unreliable for enhancing the encoder. In this paper, we propose a method that enhances the encoder and attention using target monolingual corpora by generating multiple source sentences via sampling. By using multiple source sentences, diversity close to that of humans is achieved. Our experimental results show that the translation quality is improved by increasing the number of synthetic source sentences for each given target sentence, and quality close to that using a manually created parallel corpus was achieved.
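
The idea of generating multiple source sentences per target sentence via sampling can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_LEXICON`, `sample_source`, and `build_synthetic_corpus` are hypothetical names, and the word-by-word toy sampler merely stands in for a trained target-to-source translation model that would be sampled from in practice.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "back-translator": for each target word, a distribution
# over candidate source words. A real system would instead sample from a
# trained target-to-source NMT model.
TOY_LEXICON: Dict[str, List[Tuple[str, float]]] = {
    "the": [("le", 0.6), ("la", 0.4)],
    "cat": [("chat", 0.8), ("minou", 0.2)],
    "sleeps": [("dort", 0.7), ("sommeille", 0.3)],
}


def sample_source(target: str, rng: random.Random) -> str:
    """Draw one synthetic source sentence by sampling word by word."""
    words = []
    for tok in target.split():
        # Unknown words are copied through unchanged (an assumption of this toy).
        candidates = TOY_LEXICON.get(tok, [(tok, 1.0)])
        choices, weights = zip(*candidates)
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


def build_synthetic_corpus(
    targets: List[str],
    n_samples: int,
    sampler: Callable[[str, random.Random], str] = sample_source,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pair each target sentence with n_samples sampled source sentences."""
    rng = random.Random(seed)
    pairs = []
    for tgt in targets:
        for _ in range(n_samples):
            pairs.append((sampler(tgt, rng), tgt))
    return pairs


if __name__ == "__main__":
    for src, tgt in build_synthetic_corpus(["the cat sleeps"], n_samples=4):
        print(f"{src}\t{tgt}")
```

Because each source sentence is sampled rather than taken as the single best output, the same target sentence can be paired with several different source sentences, which is the source-side diversity the abstract refers to.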
Anthology ID:
W18-2707
Volume:
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Alexandra Birch, Andrew Finch, Thang Luong, Graham Neubig, Yusuke Oda
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
55–63
URL:
https://aclanthology.org/W18-2707/
DOI:
10.18653/v1/W18-2707
Cite (ACL):
Kenji Imamura, Atsushi Fujita, and Eiichiro Sumita. 2018. Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, pages 55–63, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation (Imamura et al., NGT 2018)
PDF:
https://aclanthology.org/W18-2707.pdf