Recurrent Attention for Neural Machine Translation

Jiali Zeng, Shuangzhi Wu, Yongjing Yin, Yufan Jiang, Mu Li


Abstract
Recent research questions the importance of the dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further in this research line and propose a novel substitute mechanism for self-attention: Recurrent AtteNtion (RAN) . RAN directly learns attention weights without any token-to-token interaction and further improves their capacity by layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterpart in certain scenarios, with fewer parameters and inference time. Particularly, when apply RAN to the decoder of Transformer, there brings consistent improvements by about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on Turkish-English translation task. In addition, we conduct extensive analysis on the attention weights of RAN to confirm their reasonableness. Our RAN is a promising alternative to build more effective and efficient NMT models.
Anthology ID:
2021.emnlp-main.258
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3216–3225
Language:
URL:
https://aclanthology.org/2021.emnlp-main.258
DOI:
10.18653/v1/2021.emnlp-main.258
Bibkey:
Cite (ACL):
Jiali Zeng, Shuangzhi Wu, Yongjing Yin, Yufan Jiang, and Mu Li. 2021. Recurrent Attention for Neural Machine Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3216–3225, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Recurrent Attention for Neural Machine Translation (Zeng et al., EMNLP 2021)
Copy Citation:
PDF:
https://aclanthology.org/2021.emnlp-main.258.pdf
Code
 lemon0830/ran