The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov


Abstract
In this study, we investigate the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders follows a distinct bell-shaped curve, with the highest anisotropy in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we find that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This expansion is followed by a compression phase towards the end of training, in which dimensionality decreases, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.
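For readers unfamiliar with the term, the sketch below illustrates one common way to quantify anisotropy: the mean pairwise cosine similarity of a set of embedding vectors. This is an assumption made for illustration only; the abstract does not state which anisotropy measure the paper uses, and the function name `mean_pairwise_cosine` and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of vectors.

    Values near or below 0 suggest the directions are spread out;
    values near 1 suggest the vectors share a dominant direction,
    i.e. the set is highly anisotropic under this (assumed) measure.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D vectors (hypothetical, not data from the paper).
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
narrow = [[1.0, 0.1], [1.0, -0.1], [1.0, 0.05], [1.0, 0.0]]

spread_score = mean_pairwise_cosine(spread)
narrow_score = mean_pairwise_cosine(narrow)
```

Applied per layer to the embeddings a model produces, a score of this kind would give a layer-wise profile like the one the abstract describes, though the paper's own measure may differ.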
Anthology ID:
2024.findings-eacl.58
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
868–874
URL:
https://aclanthology.org/2024.findings-eacl.58
Cite (ACL):
Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, and Andrey Kuznetsov. 2024. The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models. In Findings of the Association for Computational Linguistics: EACL 2024, pages 868–874, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models (Razzhigaev et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-eacl.58.pdf