Gustav Henter


2024

Learned Transformer Position Embeddings Have a Low-Dimensional Structure
Ulme Wennberg | Gustav Henter
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)

Position embeddings have long been essential for sequence-order encoding in transformer models, yet their structure is underexplored. This study uses principal component analysis (PCA) to quantitatively compare the dimensionality of absolute position and word embeddings in BERT and ALBERT. We find that, unlike word embeddings, position embeddings occupy a low-dimensional subspace, typically utilizing under 10% of the dimensions available. Additionally, the principal vectors are dominated by a few low-frequency rotational components, a structure arising independently across models.
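
The kind of PCA-based dimensionality comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `effective_dimensionality`, the 90% variance threshold, and the synthetic "position-like" and "word-like" matrices are all assumptions made for the example, and no real BERT or ALBERT weights are loaded.

```python
import numpy as np


def effective_dimensionality(embeddings, variance_threshold=0.9):
    """Count the principal components needed to reach variance_threshold.

    Hypothetical helper: the 0.9 threshold is an illustrative assumption,
    not a value taken from the paper.
    """
    # PCA operates on mean-centered data.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the PCA variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    # First index whose cumulative ratio reaches the threshold, as a count.
    return int(np.searchsorted(cumulative, variance_threshold) + 1)


rng = np.random.default_rng(0)
num_positions, dim = 512, 64  # assumed sizes for the synthetic example

# Synthetic stand-in for position embeddings: four low-frequency
# sine/cosine pairs (8 latent dimensions) mixed into `dim` dimensions,
# plus a small amount of noise.
positions = np.arange(num_positions)
freqs = np.array([1, 2, 3, 4]) * 2 * np.pi / num_positions
angles = np.outer(positions, freqs)
latent = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
position_like = latent @ rng.normal(size=(8, dim))
position_like += 0.01 * rng.normal(size=(num_positions, dim))

# Synthetic stand-in for word embeddings: isotropic Gaussian vectors.
word_like = rng.normal(size=(num_positions, dim))

pos_dim = effective_dimensionality(position_like)
word_dim = effective_dimensionality(word_like)
print(f"position-like: {pos_dim} of {dim} components")
print(f"word-like:     {word_dim} of {dim} components")
```

On this synthetic data the position-like matrix needs only a handful of components while the word-like matrix needs most of them. To apply the same measurement to a real model, one would pass its learned position and word embedding matrices to the helper in place of the synthetic arrays.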