On the Sub-layer Functionalities of Transformer Decoder

Yilin Yang; Longyue Wang; Shuming Shi; Prasad Tadepalli; Stefan Lee; Zhaopeng Tu

doi:10.18653/v1/2020.findings-emnlp.432

Abstract

There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During translation, the decoder must predict output tokens by considering both the source-language text from the encoder and the target-language prefix produced in previous steps. In this work, we study how Transformer-based decoders leverage information from the source and target languages, developing a universal probe task to assess how information is propagated through each module of each decoder layer. We perform extensive experiments on three major translation datasets (WMT En-De, En-Fr, and En-Zh). Our analysis provides insight into when and where decoders leverage different sources. Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance, which significantly reduces computation and the number of parameters and consequently gives a significant boost to both training and inference speed.
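To give a rough sense of why dropping the feed-forward module reduces the parameter count, the sketch below counts the learnable parameters of a single decoder layer with and without that module. It assumes the standard Transformer decoder layout (self-attention, encoder-decoder attention, feed-forward, each followed by layer normalization) and the commonly used base sizes `d_model=512` and `d_ff=2048`; these sizes and the function names are illustrative assumptions, not values taken from the paper.

```python
# Back-of-the-envelope parameter count for one Transformer decoder layer.
# Assumptions (not from the paper): standard layer layout, biases on every
# linear projection, d_model=512 and d_ff=2048 as in the usual "base" setup.

def attention_params(d_model: int) -> int:
    # Query, key, value, and output projections: four d_model x d_model
    # weight matrices, each with a bias vector.
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model: int, d_ff: int) -> int:
    # Two linear maps: d_model -> d_ff and d_ff -> d_model, with biases.
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layer_norm_params(d_model: int) -> int:
    # Gain and bias vectors.
    return 2 * d_model


def decoder_layer_params(d_model: int, d_ff: int, with_ffn: bool = True) -> int:
    # Self-attention and encoder-decoder attention, each with a layer norm.
    total = 2 * (attention_params(d_model) + layer_norm_params(d_model))
    if with_ffn:
        # The feed-forward sub-layer and its own layer norm.
        total += ffn_params(d_model, d_ff) + layer_norm_params(d_model)
    return total


if __name__ == "__main__":
    full = decoder_layer_params(512, 2048, with_ffn=True)
    reduced = decoder_layer_params(512, 2048, with_ffn=False)
    print(f"with feed-forward:    {full:,}")
    print(f"without feed-forward: {reduced:,}")
    print(f"fraction removed:     {(full - reduced) / full:.1%}")
```

Under these assumptions the feed-forward sub-layer accounts for roughly half of a decoder layer's parameters. The actual savings reported in the paper depend on its model configurations and may differ from this estimate.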

Anthology ID: 2020.findings-emnlp.432
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4799–4811
URL: https://aclanthology.org/2020.findings-emnlp.432/
DOI: 10.18653/v1/2020.findings-emnlp.432
Cite (ACL): Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, and Zhaopeng Tu. 2020. On the Sub-layer Functionalities of Transformer Decoder. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4799–4811, Online. Association for Computational Linguistics.
Cite (Informal): On the Sub-layer Functionalities of Transformer Decoder (Yang et al., Findings 2020)
PDF: https://aclanthology.org/2020.findings-emnlp.432.pdf
Video: https://slideslive.com/38940141
