A Closer Look at Transformer Attention for Multilingual Translation

Jingyi Zhang, Gerard de Melo, Hongfei Xu, Kehai Chen


Abstract
Transformers are the predominant model for machine translation. Recent work has also shown that a single Transformer model can be trained to translate multiple language pairs, achieving promising results. In this work, we investigate how a multilingual Transformer model allocates attention when translating different language pairs. We first perform automatic pruning to eliminate a large number of noisy heads and then analyze the functions and behaviors of the remaining heads in both self-attention and cross-attention. We find that different language pairs, despite having different syntax and word orders, tend to share the same heads for the same functions, such as syntax heads and reordering heads. However, the differing characteristics of the language pairs clearly cause interference among function heads and affect head accuracies. Additionally, we reveal an interesting behavior of Transformer cross-attention: the deep-layer cross-attention heads cooperate in a clear way to learn different options for word reordering, which may stem from the fact that a translation task admits multiple valid target-language translations for the same source sentence.
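The analysis above rests on inspecting per-head cross-attention in a multilingual translation model. Below is a minimal sketch (not the authors' pruning or analysis code) of how such per-head cross-attention weights can be extracted with the Hugging Face transformers library; the model name and the simple head-confidence proxy are assumptions made for illustration.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: any multilingual seq2seq model with exposed attentions works;
# facebook/m2m100_418M is used here purely as an example.
model_name = "facebook/m2m100_418M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

tokenizer.src_lang = "de"
tokenizer.tgt_lang = "en"
src = tokenizer("Das ist ein Test.", return_tensors="pt")
tgt = tokenizer(text_target="This is a test.", return_tensors="pt")

with torch.no_grad():
    out = model(
        input_ids=src["input_ids"],
        attention_mask=src["attention_mask"],
        labels=tgt["input_ids"],
        output_attentions=True,
    )

# out.cross_attentions is a tuple with one tensor per decoder layer,
# each of shape (batch, num_heads, tgt_len, src_len).
for layer, attn in enumerate(out.cross_attentions):
    # Illustrative per-head proxy: mean (over target tokens) of the maximum
    # attention mass each target token places on any single source token.
    head_scores = attn[0].max(dim=-1).values.mean(dim=-1)
    print(f"decoder layer {layer}: head scores = {head_scores.tolist()}")

Heads with consistently diffuse or uninformative attention under such a measure are the kind of candidates that automatic pruning would remove before analyzing the remaining heads' functions.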
Anthology ID:
2023.wmt-1.45
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
496–506
URL:
https://aclanthology.org/2023.wmt-1.45
DOI:
10.18653/v1/2023.wmt-1.45
Cite (ACL):
Jingyi Zhang, Gerard de Melo, Hongfei Xu, and Kehai Chen. 2023. A Closer Look at Transformer Attention for Multilingual Translation. In Proceedings of the Eighth Conference on Machine Translation, pages 496–506, Singapore. Association for Computational Linguistics.
Cite (Informal):
A Closer Look at Transformer Attention for Multilingual Translation (Zhang et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.45.pdf