Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective

Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, Min Zhang


Abstract
Enabling LLMs to handle lengthy contexts is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method, so a prominent path is to extrapolate RoPE trained on comparatively short texts to far longer texts. Many efforts have been devoted to boosting this extrapolation by extending the RoPE formulation, yet few have attempted to explain its inner workings comprehensively. In this paper, we offer a straightforward yet in-depth understanding of RoPE extensions from an attention perspective, evaluated on two benchmarking tasks. A broad array of experiments reveals several valuable findings: 1) keeping attention patterns close to those at the pretrained length improves extrapolation; 2) large attention uncertainty leads to retrieval errors; 3) using longer continual pretraining lengths for RoPE extensions can reduce attention uncertainty and significantly enhance extrapolation.
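For readers unfamiliar with the mechanism the abstract refers to, the sketch below illustrates the standard rotary formulation and a position-interpolation-style extension (compressing positions by a scale factor before rotation). It is a generic, minimal illustration of RoPE, not the authors' code or the specific extensions studied in the paper; the function name and the `scale` parameter are our own labels.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0, scale=1.0):
    """Apply rotary position embedding (RoPE) to a batch of vectors.

    x:         (seq_len, d) query/key vectors, d must be even
    positions: (seq_len,) token positions
    base:      RoPE frequency base (10000 in the original formulation)
    scale:     >1 compresses positions, as in position-interpolation-style
               RoPE extensions (a common extension, shown only for illustration)
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "RoPE rotates dimensions in pairs, so d must be even"

    # Per-pair rotation frequencies: theta_i = base^(-2i/d)
    inv_freq = base ** (-np.arange(0, d, 2) / d)          # (d/2,)
    angles = np.outer(positions / scale, inv_freq)        # (seq_len, d/2)

    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                       # even / odd dimensions

    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Attention scores between rotated queries and keys depend only on the
# relative distance between positions, which is the property RoPE extensions
# try to preserve when extrapolating beyond the pretrained length.
q = rope_rotate(np.random.randn(8, 64), np.arange(8))
k = rope_rotate(np.random.randn(8, 64), np.arange(8))
scores = q @ k.T / np.sqrt(64)
```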
Anthology ID: 2025.coling-main.600
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 8955–8962
URL: https://aclanthology.org/2025.coling-main.600/
Cite (ACL):
Meizhi Zhong, Chen Zhang, Yikun Lei, Xikai Liu, Yan Gao, Yao Hu, Kehai Chen, and Min Zhang. 2025. Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8955–8962, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective (Zhong et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.600.pdf