Key-value Attention Mechanism for Neural Machine Translation

Hideya Mino, Masao Utiyama, Eiichiro Sumita, Takenobu Tokunaga


Abstract
In this paper, we propose a neural machine translation (NMT) model with a key-value attention mechanism on the source-side encoder. The key-value attention mechanism separates the source-side content vector into two types of memory, known as the key and the value. The key is used to calculate the attention distribution, and the value is used to encode the context representation. Experiments on three different tasks indicate that our model outperforms an NMT model with a conventional attention mechanism. Furthermore, we perform experiments with a conventional NMT framework in which part of a weight matrix is initialized to zero, so that the matrix starts in the same state as in the key-value attention mechanism. As a result, we obtain results comparable to those of the key-value attention mechanism without changing the network structure.
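The abstract describes the mechanism only in prose, so here is a minimal NumPy sketch of the idea. It is not the authors' implementation, and it rests on stated assumptions: each encoder state of size 2d is split into a key (first d dimensions) and a value (last d dimensions), and the attention score is an additive, Bahdanau-style function. The names `key_value_attention`, `conventional_attention`, `W_a`, `U_k`, `U_a`, and `v_a` are hypothetical. The second half illustrates the zero-initialization idea: a conventional score matrix `U_a` whose value-half columns start at zero yields the same attention distribution as the key-value model.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def key_value_attention(H, s, W_a, U_k, v_a):
    """Key-value attention over encoder states H of shape (T, 2*d).

    Assumption: the first d dims of each state are the key, the last d the
    value, and the score is additive (Bahdanau-style).
    """
    d = H.shape[1] // 2
    K, V = H[:, :d], H[:, d:]                       # split each state into key and value
    scores = np.tanh(s @ W_a.T + K @ U_k.T) @ v_a   # (T,) scores computed from keys only
    alpha = softmax(scores)                         # attention distribution over source positions
    context = alpha @ V                             # (d,) context built from values only
    return alpha, context

def conventional_attention(H, s, W_a, U_a, v_a):
    """Conventional attention: the full state is used for both scoring and context."""
    scores = np.tanh(s @ W_a.T + H @ U_a.T) @ v_a
    alpha = softmax(scores)
    return alpha, alpha @ H

rng = np.random.default_rng(0)
T, d, n, a = 5, 4, 6, 8   # source length, key/value size, decoder size, attention size
H = rng.standard_normal((T, 2 * d))   # encoder states
s = rng.standard_normal(n)            # current decoder state
W_a = rng.standard_normal((a, n))
v_a = rng.standard_normal(a)
U_k = rng.standard_normal((a, d))

# Hypothetical zero-initialized conventional matrix: the columns that would
# act on the value half of each encoder state start at zero.
U_a = np.concatenate([U_k, np.zeros((a, d))], axis=1)

alpha_kv, ctx_kv = key_value_attention(H, s, W_a, U_k, v_a)
alpha_conv, ctx_conv = conventional_attention(H, s, W_a, U_a, v_a)
print(np.allclose(alpha_kv, alpha_conv))  # True: identical attention at initialization
```

Only the value half of the conventional context (`ctx_conv[d:]`) matches `ctx_kv`. For the decoder to ignore the key half as well, the part of whichever matrix consumes the context that multiplies `ctx_conv[:d]` would presumably also need to start at zero; the abstract does not say which matrices the authors actually zero out.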
Anthology ID:
I17-2049
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
290–295
URL:
https://aclanthology.org/I17-2049
Cite (ACL):
Hideya Mino, Masao Utiyama, Eiichiro Sumita, and Takenobu Tokunaga. 2017. Key-value Attention Mechanism for Neural Machine Translation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 290–295, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Key-value Attention Mechanism for Neural Machine Translation (Mino et al., IJCNLP 2017)
PDF:
https://aclanthology.org/I17-2049.pdf
Data
ASPEC