Neural Machine Translation with Decoding History Enhanced Attention

Mingxuan Wang, Jun Xie, Zhixing Tan, Jinsong Su, Deyi Xiong, Chao Bian


Abstract
Neural machine translation with source-side attention has achieved remarkable performance. However, there has been little work exploring attention to the target side, which can potentially enhance the memory capability of NMT. We reformulate a Decoding History Enhanced Attention mechanism (DHEA) to make NMT models better at selecting both source-side and target-side information. DHEA enables dynamic control of the ratios at which source and target contexts contribute to the generation of target words, offering a way to weakly induce structural relations among both source and target tokens. It also allows training errors to be directly back-propagated through short-cut connections and effectively alleviates the gradient vanishing problem. An empirical study on Chinese-English translation shows that our model, with proper configuration, improves by 0.9 BLEU over Transformer and over the best reported results on the dataset. On the WMT14 English-German task and the larger WMT14 English-French task, our model achieves results comparable with the state of the art.
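The idea of dynamically weighting source and target contexts can be illustrated with a minimal sketch. This is not the paper's formulation: the scalar sigmoid gate, the dot-product attention, and the names `attend`, `gated_context`, `gate_weights`, and `gate_bias` are all assumptions chosen only to show how a decoder query might mix a source-side context with a context computed over previously decoded states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, memory):
    """Dot-product attention: return the weighted average of memory vectors."""
    weights = softmax([dot(query, m) for m in memory])
    dim = len(memory[0])
    context = [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(dim)]
    return context, weights


def gated_context(query, source_states, history_states, gate_weights, gate_bias=0.0):
    """Mix a source context and a decoding-history context with a scalar gate.

    The gate parameterization is hypothetical, not taken from the paper.
    """
    src_ctx, _ = attend(query, source_states)
    tgt_ctx, _ = attend(query, history_states)
    # Gate in (0, 1): the share of the mixed context taken from the source side.
    gate = 1.0 / (1.0 + math.exp(-(dot(gate_weights, query) + gate_bias)))
    mixed = [gate * s + (1.0 - gate) * t for s, t in zip(src_ctx, tgt_ctx)]
    return mixed, gate


# Toy usage: one source state, one previously decoded state, neutral gate.
mixed, gate = gated_context(
    query=[1.0, 1.0],
    source_states=[[1.0, 0.0]],
    history_states=[[0.0, 1.0]],
    gate_weights=[0.0, 0.0],
)
```

With zero gate weights and zero bias the gate sits at 0.5, so both contexts contribute equally; a learned gate would shift this ratio per target word.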
Anthology ID:
C18-1124
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
1464–1473
URL:
https://aclanthology.org/C18-1124
Cite (ACL):
Mingxuan Wang, Jun Xie, Zhixing Tan, Jinsong Su, Deyi Xiong, and Chao Bian. 2018. Neural Machine Translation with Decoding History Enhanced Attention. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1464–1473, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Neural Machine Translation with Decoding History Enhanced Attention (Wang et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1124.pdf