Dependency-Based Self-Attention for Transformer NMT

Hiroyuki Deguchi, Akihiro Tamura, Takashi Ninomiya


Abstract
In this paper, we propose a new Transformer neural machine translation (NMT) model that incorporates dependency relations into self-attention on both the source and target sides, which we call dependency-based self-attention. Inspired by Linguistically-Informed Self-Attention (LISA), the dependency-based self-attention is trained to attend to the modifiee of each token under constraints derived from the dependency relations. While LISA was originally proposed for a Transformer encoder for semantic role labeling, this paper extends LISA to Transformer NMT by masking information about future words in the decoder-side dependency-based self-attention. In addition, our dependency-based self-attention operates on sub-word units created by byte pair encoding. Experiments show that our model improves translation performance by 1.0 BLEU point over the baseline model on the WAT’18 Asian Scientific Paper Excerpt Corpus Japanese-to-English translation task.
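
To make the mechanism concrete, here is a minimal PyTorch-style sketch of a single LISA-style attention head supervised to attend to each token's dependency head (modifiee), with an optional causal mask for the decoder side as described in the abstract. This is an illustrative assumption, not the authors' released implementation; the class name, arguments (head_index, causal), and hyper-parameters are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DependencyBasedSelfAttention(nn.Module):
    """One attention head trained, via an auxiliary loss, to put its
    attention mass on each token's dependency head (modifiee).
    causal=True masks future positions, as needed on the decoder side.
    Sketch only; names and details are assumptions."""

    def __init__(self, d_model: int, d_head: int, causal: bool = False):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_head)
        self.k_proj = nn.Linear(d_model, d_head)
        self.v_proj = nn.Linear(d_model, d_head)
        self.scale = d_head ** -0.5
        self.causal = causal

    def forward(self, x, head_index=None):
        # x: (batch, seq_len, d_model)
        # head_index: (batch, seq_len) long tensor of gold head positions
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale

        if self.causal:
            seq_len = x.size(1)
            # Upper-triangular mask blocks attention to future positions.
            future = torch.ones(seq_len, seq_len, device=x.device).triu(1).bool()
            scores = scores.masked_fill(future, float('-inf'))

        attn = F.softmax(scores, dim=-1)
        out = torch.matmul(attn, v)

        # Auxiliary dependency loss: treat the attention distribution as a
        # prediction of each token's modifiee and score it against the gold
        # head index.  Tokens whose gold head falls in a masked future
        # position would need to be excluded (e.g., via ignore_index);
        # omitted here for brevity.
        aux_loss = None
        if head_index is not None:
            log_attn = F.log_softmax(scores, dim=-1)
            aux_loss = F.nll_loss(log_attn.reshape(-1, log_attn.size(-1)),
                                  head_index.reshape(-1))
        return out, aux_loss

In a full model, one head of the standard multi-head self-attention would presumably be replaced by such a dependency-supervised head on both the encoder (causal=False) and decoder (causal=True) sides, with the auxiliary loss added to the translation objective.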
Anthology ID:
R19-1028
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
239–246
URL:
https://aclanthology.org/R19-1028
DOI:
10.26615/978-954-452-056-4_028
Cite (ACL):
Hiroyuki Deguchi, Akihiro Tamura, and Takashi Ninomiya. 2019. Dependency-Based Self-Attention for Transformer NMT. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 239–246, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Dependency-Based Self-Attention for Transformer NMT (Deguchi et al., RANLP 2019)
PDF:
https://aclanthology.org/R19-1028.pdf