Learned Transformer Position Embeddings Have a Low-Dimensional Structure

Ulme Wennberg, Gustav Henter


Abstract
Position embeddings have long been essential for sequence-order encoding in transformer models, yet their structure is underexplored. This study uses principal component analysis (PCA) to quantitatively compare the dimensionality of absolute position and word embeddings in BERT and ALBERT. We find that, unlike word embeddings, position embeddings occupy a low-dimensional subspace, typically utilizing under 10% of the dimensions available. Additionally, the principal vectors are dominated by a few low-frequency rotational components, a structure arising independently across models.
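To illustrate the kind of analysis the abstract describes, the following is a minimal sketch (not the authors' code) of fitting PCA to BERT's learned absolute position embeddings and word embeddings and counting how many principal components are needed to capture a chosen fraction of the variance. The checkpoint name, the 90% variance threshold, and the helper function are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: compare the PCA dimensionality of BERT's learned absolute
# position embeddings against its word embeddings.
# Assumes the Hugging Face `transformers` and `scikit-learn` packages.
import numpy as np
from sklearn.decomposition import PCA
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")  # illustrative checkpoint
pos = model.embeddings.position_embeddings.weight.detach().numpy()    # (512, 768)
words = model.embeddings.word_embeddings.weight.detach().numpy()      # (30522, 768)

def dims_for_variance(matrix, threshold=0.9):
    """Number of principal components needed to explain `threshold` of the variance.

    The 0.9 threshold is an assumption for illustration, not the paper's criterion.
    """
    pca = PCA().fit(matrix)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cumulative, threshold) + 1)

print("position embeddings:", dims_for_variance(pos), "of", pos.shape[1], "dims")
print("word embeddings:    ", dims_for_variance(words), "of", words.shape[1], "dims")
```

Under the paper's finding, the position-embedding count would be expected to come out far smaller (well under 10% of the available dimensions) than the word-embedding count.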
Anthology ID:
2024.repl4nlp-1.17
Volume:
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Chen Zhao, Marius Mosbach, Pepa Atanasova, Seraphina Goldfarb-Tarrant, Peter Hase, Arian Hosseini, Maha Elbayad, Sandro Pezzelle, Maximilian Mozes
Venues:
RepL4NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
237–244
URL:
https://aclanthology.org/2024.repl4nlp-1.17
Cite (ACL):
Ulme Wennberg and Gustav Henter. 2024. Learned Transformer Position Embeddings Have a Low-Dimensional Structure. In Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024), pages 237–244, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Learned Transformer Position Embeddings Have a Low-Dimensional Structure (Wennberg & Henter, RepL4NLP-WS 2024)
PDF:
https://aclanthology.org/2024.repl4nlp-1.17.pdf