@article{mickus-etal-2022-dissect,
title = "How to Dissect a {M}uppet: The Structure of Transformer Embedding Spaces",
author = "Mickus, Timothee and
Paperno, Denis and
Constant, Mathieu",
editor = "Roark, Brian and
Nenkova, Ani",
journal = "Transactions of the Association for Computational Linguistics",
volume = "10",
year = "2022",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/2022.tacl-1.57",
doi = "10.1162/tacl_a_00501",
pages = "981--996",
abstract = "Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attentions and feed-forwards are not equally useful in all downstream applications, as well as a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.",
}
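Note on the abstract's central claim: the "sum of vector factors" reframing can be made concrete as an additive decomposition of each output embedding. The equation below is a minimal LaTeX sketch of one reading of that claim; the grouping into input, multi-head attention, feed-forward, and correction factors follows the components the abstract names, but the notation (e, i, h, f, c and the layer/head indices) is assumed here, not quoted from the paper.

% Assumed notation: e_t = final embedding of token t after an L-layer,
% H-head Transformer; i_t = input (token + positional) factor;
% h_{t,l,h} = contribution of attention head h in layer l;
% f_{t,l} = feed-forward contribution of layer l;
% c_t = residual correction factor (biases, layer normalization).
\mathbf{e}_t \;=\; \mathbf{i}_t
    \;+\; \sum_{l=1}^{L} \sum_{h=1}^{H} \mathbf{h}_{t,l,h}
    \;+\; \sum_{l=1}^{L} \mathbf{f}_{t,l}
    \;+\; \mathbf{c}_t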
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="mickus-etal-2022-dissect">
<titleInfo>
<title>How to Dissect a Muppet: The Structure of Transformer Embedding Spaces</title>
</titleInfo>
<name type="personal">
<namePart type="given">Timothee</namePart>
<namePart type="family">Mickus</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Denis</namePart>
<namePart type="family">Paperno</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mathieu</namePart>
<namePart type="family">Constant</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2022</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<genre authority="bibutilsgt">journal article</genre>
<relatedItem type="host">
<titleInfo>
<title>Transactions of the Association for Computational Linguistics</title>
</titleInfo>
<originInfo>
<issuance>continuing</issuance>
<publisher>MIT Press</publisher>
<place>
<placeTerm type="text">Cambridge, MA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">periodical</genre>
<genre authority="bibutilsgt">academic journal</genre>
</relatedItem>
<abstract>Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attentions and feed-forwards are not equally useful in all downstream applications, as well as a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.</abstract>
<identifier type="citekey">mickus-etal-2022-dissect</identifier>
<identifier type="doi">10.1162/tacl_a_00501</identifier>
<location>
<url>https://aclanthology.org/2022.tacl-1.57</url>
</location>
<part>
<date>2022</date>
<detail type="volume"><number>10</number></detail>
<extent unit="page">
<start>981</start>
<end>996</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Journal Article
%T How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
%A Mickus, Timothee
%A Paperno, Denis
%A Constant, Mathieu
%J Transactions of the Association for Computational Linguistics
%D 2022
%V 10
%I MIT Press
%C Cambridge, MA
%F mickus-etal-2022-dissect
%X Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attentions and feed-forwards are not equally useful in all downstream applications, as well as a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.
%R 10.1162/tacl_a_00501
%U https://aclanthology.org/2022.tacl-1.57
%U https://doi.org/10.1162/tacl_a_00501
%P 981-996
Markdown (Informal)
[How to Dissect a Muppet: The Structure of Transformer Embedding Spaces](https://aclanthology.org/2022.tacl-1.57) (Mickus et al., TACL 2022)
ACL
Timothee Mickus, Denis Paperno, and Mathieu Constant. 2022. How to Dissect a Muppet: The Structure of Transformer Embedding Spaces. Transactions of the Association for Computational Linguistics, 10:981–996.