Not all parameters are born equal: Attention is mostly what you need

Nikolay Bogoychev

doi:10.18653/v1/2021.blackboxnlp-1.28

Not all parameters are born equal: Attention is mostly what you need

Abstract

Transformers are widely used in state-of-the-art machine translation, but the key to their success is still unknown. To gain insight into this, we consider three groups of parameters: embeddings, attention, and Feed-Forward Neural network (FFN) layers. We examine the relative importance of each by performing an ablation study where we initialise them at random and freeze them, so that their weights do not change over the course of the training. Through this, we show that the attention and FFN are equally important and fulfil the same functionality in a model. We show that the decision about whether a component is frozen or allowed to train is at least as important for the final model performance as its number of parameters. At the same time, the number of parameters alone is not indicative of a component’s importance. Finally, while the embedding layer is the least essential for machine translation tasks, it is the most important component for language modelling tasks.

Anthology ID:: 2021.blackboxnlp-1.28
Volume:: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Month:: November
Year:: 2021
Address:: Punta Cana, Dominican Republic
Editors:: Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad
Venue:: BlackboxNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 363–374
Language:
URL:: https://aclanthology.org/2021.blackboxnlp-1.28
DOI:: 10.18653/v1/2021.blackboxnlp-1.28
Bibkey:
Cite (ACL):: Nikolay Bogoychev. 2021. Not all parameters are born equal: Attention is mostly what you need. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 363–374, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):: Not all parameters are born equal: Attention is mostly what you need (Bogoychev, BlackboxNLP 2021)
Copy Citation:
PDF:: https://aclanthology.org/2021.blackboxnlp-1.28.pdf
Software:: 2021.blackboxnlp-1.28.Software.tgz
Code: XapaJIaMnu/marian-dev

PDF Cite Search Code Software