Norm of Mean Contextualized Embeddings Determines their Variance

Hiroaki Yamagiwa, Hidetoshi Shimodaira


Abstract
Contextualized embeddings vary by context, even for the same token, and form a distribution in the embedding space. To analyze this distribution, we focus on the norm of the mean embedding and the variance of the embeddings. In this study, we first demonstrate that these two quantities follow the well-known formula for variance in statistics, and we provide an efficient sequential method for computing them. Then, by observing embeddings from intermediate layers of several Transformer models, we find a strong trade-off between the norm and the variance: as the mean embedding moves closer to the origin, the variance increases. Furthermore, when the sets of token embeddings are treated as clusters, we show that the variance of the entire embedding set can theoretically be decomposed into the within-cluster variance and the between-cluster variance. Experimentally, as the layers of Transformer models deepen, the embeddings move farther from the origin, the between-cluster variance decreases in relative terms, and the within-cluster variance increases in relative terms. These results are consistent with existing studies on the anisotropy of embedding spaces across layers.
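The two theoretical claims above can be checked numerically. For embeddings x_1, ..., x_n with mean x̄, write Q for the average squared norm, (1/n) Σ ||x_i||^2, and V for the variance, (1/n) Σ ||x_i − x̄||^2. The statistical identity then reads Q = ||x̄||^2 + V, so for a fixed Q, a mean closer to the origin leaves more room for variance. The Python below is a minimal sketch under our own assumptions: toy Gaussian vectors stand in for contextualized embeddings, and `RunningNormStats` and `decompose` are hypothetical helpers. The one-pass accumulator (count, running vector sum, running sum of squared norms) is one plausible sequential scheme and not necessarily the paper's exact method. The cluster decomposition weights each cluster by its share of the points.

```python
import math
import random

def sq_norm(v):
    """Squared Euclidean norm ||v||^2."""
    return sum(x * x for x in v)

def mean_vec(vs):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vs)
    return [sum(col) / n for col in zip(*vs)]

def variance(vs):
    """Total variance: mean squared distance to the mean vector (divides by n)."""
    m = mean_vec(vs)
    return sum(sq_norm([x - y for x, y in zip(v, m)]) for v in vs) / len(vs)

class RunningNormStats:
    """Hypothetical one-pass accumulator: stores only the count, the vector sum,
    and the sum of squared norms, then applies V = Q - ||mean||^2."""

    def __init__(self, dim):
        self.n = 0
        self.total = [0.0] * dim
        self.sq_total = 0.0

    def update(self, v):
        self.n += 1
        self.total = [t + x for t, x in zip(self.total, v)]
        self.sq_total += sq_norm(v)

    def mean_sq_norm(self):   # Q: average of ||x||^2
        return self.sq_total / self.n

    def mean_norm_sq(self):   # ||mean||^2
        return sq_norm([t / self.n for t in self.total])

    def variance(self):       # V = Q - ||mean||^2
        return self.mean_sq_norm() - self.mean_norm_sq()

def decompose(clusters):
    """Split total variance into within- and between-cluster parts,
    weighting each cluster by its share of all points (assumed weighting)."""
    all_vs = [v for c in clusters for v in c]
    n, m = len(all_vs), mean_vec(all_vs)
    within = sum(len(c) / n * variance(c) for c in clusters)
    between = sum(len(c) / n * sq_norm([a - b for a, b in zip(mean_vec(c), m)])
                  for c in clusters)
    return within, between

# Toy "token clusters": random vectors standing in for contextualized embeddings.
random.seed(0)
clusters = [[[random.gauss(mu, 1.0) for _ in range(4)] for _ in range(n)]
            for mu, n in [(0.0, 30), (2.0, 50), (-1.0, 20)]]
all_vs = [v for c in clusters for v in c]

stats = RunningNormStats(dim=4)
for v in all_vs:
    stats.update(v)

within, between = decompose(clusters)
print(f"Q = {stats.mean_sq_norm():.4f}, ||mean||^2 = {stats.mean_norm_sq():.4f}, "
      f"V = {stats.variance():.4f}")
print(f"within = {within:.4f}, between = {between:.4f}, sum = {within + between:.4f}")
```

The printed V from the one-pass accumulator should agree with the batch `variance(all_vs)`, and `within + between` should reproduce the same total. Because the one-pass formula subtracts two accumulated quantities, it can lose precision when V is tiny relative to Q; a Welford-style update is a common, more numerically stable alternative.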
Anthology ID: 2025.coling-main.521
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 7778–7808
URL: https://aclanthology.org/2025.coling-main.521/
Cite (ACL): Hiroaki Yamagiwa and Hidetoshi Shimodaira. 2025. Norm of Mean Contextualized Embeddings Determines their Variance. In Proceedings of the 31st International Conference on Computational Linguistics, pages 7778–7808, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Norm of Mean Contextualized Embeddings Determines their Variance (Yamagiwa & Shimodaira, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.521.pdf