@inproceedings{chi-etal-2023-latent,
    title = "Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings",
    author = "Chi, Ta-Chung and Fan, Ting-Han and Chen, Li-Wei and Rudnicky, Alexander and Ramadge, Peter",
    editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-short.102/",
    doi = "10.18653/v1/2023.acl-short.102",
    pages = "1183--1193",
}