Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models

Haoran Lian; Junmin Chen; Wei Huang; Yizhe Xiong; Wenping Hu; Guiguang Ding; Hui Chen; Jianwei Niu; Zijia Lin; Fuzheng Zhang; Di Zhang

Abstract

Recently, large language models (LLMs) have revolutionized Natural Language Processing (NLP). Because their training context size is limited, pretrained LLMs struggle to handle long token sequences, which limits their performance on various downstream tasks. Current solutions for long context modeling often employ multi-stage continual pretraining, which progressively increases the effective context length over several continual pretraining stages. However, these approaches require extensive manual tuning and human expertise. In this paper, we introduce a novel single-stage continual pretraining method, Head-Adaptive Rotary Position Embedding (HARPE), to equip LLMs with long context modeling capabilities while simplifying the training process. HARPE assigns different Rotary Position Embedding (RoPE) base frequency values to different attention heads and trains LLMs directly on the target context length. Extensive experiments on 4 language modeling benchmarks, including the recent RULER benchmark, demonstrate that HARPE excels at long-context understanding and integration tasks with single-stage training, matching and even outperforming existing multi-stage methods. Our results show that HARPE successfully breaks the stage barrier for training LLMs with long context modeling capabilities.
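The abstract states only that HARPE gives different attention heads different RoPE base frequency values; it does not say which values are used. The plain-Python sketch below illustrates the general mechanism under stated assumptions: standard RoPE frequencies theta_i = base^(-2i/d), the interleaved-pair rotation convention, and a hypothetical geometric spread of per-head bases from 10,000 to 1,000,000. The helper names (`rope_inv_freqs`, `apply_rope`, `hypothetical_head_bases`, `head_adaptive_rope`) are illustrative and are not the authors' code or schedule.

```python
import math
from typing import List


def rope_inv_freqs(head_dim: int, base: float) -> List[float]:
    """Standard RoPE frequencies theta_i = base^(-2i/d) for i = 0 .. d/2 - 1."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even for RoPE")
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(x: List[float], pos: int, base: float) -> List[float]:
    """Rotate each consecutive pair (x[2i], x[2i+1]) by the angle pos * theta_i.

    Uses the interleaved-pair convention; some implementations instead pair
    the first half of the vector with the second half, a permuted but
    equivalent layout.
    """
    out = [0.0] * len(x)
    for i, theta in enumerate(rope_inv_freqs(len(x), base)):
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def hypothetical_head_bases(num_heads: int, lo: float = 1e4, hi: float = 1e6) -> List[float]:
    """HYPOTHETICAL per-head base schedule: geometric spread from lo to hi.

    The abstract does not specify how HARPE chooses its bases; this spread
    is an illustrative assumption only.
    """
    if num_heads == 1:
        return [lo]
    return [lo * (hi / lo) ** (h / (num_heads - 1)) for h in range(num_heads)]


def head_adaptive_rope(heads: List[List[float]], pos: int, bases: List[float]) -> List[List[float]]:
    """Apply RoPE to each head's vector using that head's own base value."""
    if len(heads) != len(bases):
        raise ValueError("need exactly one base per attention head")
    return [apply_rope(vec, pos, base) for vec, base in zip(heads, bases)]


if __name__ == "__main__":
    num_heads, head_dim = 4, 8
    bases = hypothetical_head_bases(num_heads)
    queries = [[1.0] * head_dim for _ in range(num_heads)]
    rotated = head_adaptive_rope(queries, pos=100, bases=bases)
    # The last pair uses the lowest frequency, which depends most on the base.
    for base, vec in zip(bases, rotated):
        print(f"base={base:>12.1f}  last pair={vec[-2]:+.4f}, {vec[-1]:+.4f}")
```

Each head keeps the usual RoPE property that the query-key dot product depends only on the relative offset between positions, but heads with a small base rotate quickly while heads with a large base rotate slowly. The sketch shows only that per-head variation, not the paper's actual base values or training recipe.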

Anthology ID: 2025.coling-main.326
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 4897–4909
URL: https://aclanthology.org/2025.coling-main.326/
Cite (ACL): Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, and Di Zhang. 2025. Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4897–4909, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models (Lian et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.326.pdf
