Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers

Hongfei Xu, Josef van Genabith, Qiuhui Liu, Deyi Xiong


Abstract
Due to its effectiveness and performance, the Transformer translation model has attracted wide attention, most recently in terms of probing-based approaches. Previous work focuses on using or probing source linguistic features in the encoder. To date, the way word translation evolves in Transformer layers has not yet been investigated. Naively, one might assume that encoder layers capture source information while decoder layers translate. In this work, we show that this is not quite the case: translation already happens progressively in encoder layers and even in the input embeddings. More surprisingly, we find that some of the lower decoder layers do not actually do that much decoding. We show all of this using a probing approach in which we project representations of the layer analyzed to the final trained and frozen classifier level of the Transformer decoder to measure word translation accuracy. Our findings motivate and explain a Transformer configuration change: if translation already happens in the encoder layers, perhaps we can increase the number of encoder layers while decreasing the number of decoder layers, boosting decoding speed without loss in translation quality? Our experiments show that this is indeed the case: we can increase speed by up to a factor of 2.3 with small gains in translation quality, while an 18-4 deep encoder configuration boosts translation quality by +1.42 BLEU (En-De) at a speed-up of 1.4.
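The probing idea in the abstract (project a layer's representation into the space of the frozen output classifier, then count how often the top-scoring token matches the reference) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dimensions, and the hand-written matrices are all assumptions made for illustration, and the sketch omits how the projection is trained and how source positions are related to target tokens.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def probe_predict(hidden: Sequence[float], probe: Matrix, classifier: Matrix) -> int:
    """Project one layer representation and return the classifier's argmax token id.

    `probe` stands in for a learned linear projection (hypothetical here);
    `classifier` stands in for the trained, frozen output layer.
    """
    projected = matvec(probe, hidden)
    logits = matvec(classifier, projected)
    return max(range(len(logits)), key=logits.__getitem__)


def word_translation_accuracy(
    hiddens: Sequence[Sequence[float]],
    probe: Matrix,
    classifier: Matrix,
    references: Sequence[int],
) -> float:
    """Fraction of positions whose predicted token id equals the reference id."""
    if not references or len(hiddens) != len(references):
        raise ValueError("need one non-empty reference id per representation")
    correct = sum(
        probe_predict(h, probe, classifier) == ref
        for h, ref in zip(hiddens, references)
    )
    return correct / len(references)


# Toy data (assumed values): 3-token vocabulary, 2-dimensional representations.
toy_classifier: Matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_probe: Matrix = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for simplicity
toy_hiddens = [[2.0, 0.5], [0.1, 3.0], [-1.0, -2.0], [1.0, 0.9]]
toy_references = [0, 1, 2, 1]

accuracy = word_translation_accuracy(
    toy_hiddens, toy_probe, toy_classifier, toy_references
)
print(accuracy)
```

In this toy setting three of the four positions match their reference ids. Running the same measurement on the representations of each layer in turn would give a per-layer accuracy curve of the kind the abstract describes.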
Anthology ID:
2021.naacl-main.7
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
74–85
URL:
https://aclanthology.org/2021.naacl-main.7
DOI:
10.18653/v1/2021.naacl-main.7
Cite (ACL):
Hongfei Xu, Josef van Genabith, Qiuhui Liu, and Deyi Xiong. 2021. Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 74–85, Online. Association for Computational Linguistics.
Cite (Informal):
Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers (Xu et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.7.pdf
Video:
https://aclanthology.org/2021.naacl-main.7.mp4