Recurrent Attention for the Transformer

Jan Rosendahl, Christian Herold, Frithjof Petrick, Hermann Ney


Abstract
In this work, we conduct a comprehensive investigation of one of the centerpieces of modern machine translation systems: the encoder-decoder attention mechanism. Motivated by the concept of first-order alignments, we extend the (cross-)attention mechanism with a recurrent connection, allowing direct access to previous attention/alignment decisions. We propose several ways to incorporate such a recurrency into the attention mechanism. Evaluating their performance across different translation tasks, we conclude that these extensions and dependencies are not beneficial for the translation performance of the Transformer architecture.
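
The following PyTorch sketch illustrates one possible realization of such a recurrent cross-attention layer: the attention distribution chosen at the previous decoder step is fed back as an additive bias on the current step's attention scores. The module name, the scalar recurrence gate, and the single-head formulation are illustrative assumptions for exposition, not the authors' exact variants (the paper proposes several ways to include the recurrency).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentCrossAttention(nn.Module):
    """Single-head encoder-decoder attention with a first-order recurrent
    connection: the attention weights of the previous target step bias the
    score computation of the current step. Illustrative sketch only."""

    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Learned scalar gate scaling the contribution of the previous
        # attention distribution (a hypothetical design choice).
        self.recurrence_weight = nn.Parameter(torch.tensor(1.0))
        self.scale = d_model ** -0.5

    def forward(self, decoder_states, encoder_states):
        # decoder_states: (batch, tgt_len, d_model)
        # encoder_states: (batch, src_len, d_model)
        q = self.q_proj(decoder_states)
        k = self.k_proj(encoder_states)
        v = self.v_proj(encoder_states)

        outputs, prev_attn = [], None
        # Step through target positions so that step i can access the
        # attention distribution of step i-1 (the recurrent connection).
        for i in range(q.size(1)):
            scores = torch.matmul(q[:, i:i + 1], k.transpose(1, 2)) * self.scale
            if prev_attn is not None:
                # First-order dependency: bias the current scores with the
                # previous step's attention weights.
                scores = scores + self.recurrence_weight * prev_attn
            attn = F.softmax(scores, dim=-1)           # (batch, 1, src_len)
            outputs.append(torch.matmul(attn, v))      # (batch, 1, d_model)
            prev_attn = attn
        return torch.cat(outputs, dim=1)               # (batch, tgt_len, d_model)
```

Note that the explicit loop over target positions breaks the fully parallel decoder training of the standard Transformer; this added sequential dependency is one of the costs of feeding back previous attention decisions.
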
Anthology ID:
2021.insights-1.10
Volume:
Proceedings of the Second Workshop on Insights from Negative Results in NLP
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
João Sedoc, Anna Rogers, Anna Rumshisky, Shabnam Tafreshi
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
62–66
URL:
https://aclanthology.org/2021.insights-1.10
DOI:
10.18653/v1/2021.insights-1.10
Cite (ACL):
Jan Rosendahl, Christian Herold, Frithjof Petrick, and Hermann Ney. 2021. Recurrent Attention for the Transformer. In Proceedings of the Second Workshop on Insights from Negative Results in NLP, pages 62–66, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Recurrent Attention for the Transformer (Rosendahl et al., insights 2021)
PDF:
https://aclanthology.org/2021.insights-1.10.pdf
Video:
https://aclanthology.org/2021.insights-1.10.mp4