@inproceedings{song-etal-2025-fastcurl,
title = "{F}ast{C}u{RL}: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models",
author = "Song, Mingyang and
Zheng, Mao and
Li, Zheng and
Yang, Wenjie and
Luo, Xuan",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.470/",
pages = "8856--8866",
ISBN = "979-8-89176-335-7",
abstract = "Improving training efficiency continues to be one of the primary challenges in large-scale Reinforcement Learning (RL). In this paper, we investigate how \textbf{ \textit{context length}} and \textbf{ \textit{the complexity of training data}} influence \textit{the RL scaling training process of R1-distilled reasoning models, e.g., DeepSeek-R1-Distill-Qwen-1.5B}.\textit{Our experimental results reveal that:} text-green\textit{(1) simply controlling the context length and selecting the training data based on the input prompt length can effectively improve the training efficiency of RL scaling, achieving better performance with more concise CoT;} text-blue\textit{(2) properly scaling the context length helps mitigate entropy collapse;} text-red\textit{and (3) carefully choosing the context length facilitates achieving efficient LLM training and reasoning}. Inspired by these insights, we propose \textbf{FastCuRL}, a curriculum RL framework with stage-wise context scaling to achieve efficient LLM training and reasoning. Extensive experimental results demonstrate that \textbf{FastCuRL-1.5B-V3} significantly outperforms state-of-the-art reasoning models on five competition-level benchmarks and achieves 49.6{\%} accuracy on AIME 2024. Furthermore, \textbf{FastCuRL-1.5B-Preview} surpasses DeepScaleR-1.5B-Preview on five benchmarks while only using a single node with 8 GPUs and a total of 50{\%} of training steps."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="song-etal-2025-fastcurl">
<titleInfo>
<title>FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mingyang</namePart>
<namePart type="family">Song</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mao</namePart>
<namePart type="family">Zheng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zheng</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wenjie</namePart>
<namePart type="family">Yang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xuan</namePart>
<namePart type="family">Luo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-335-7</identifier>
</relatedItem>
<abstract>Improving training efficiency continues to be one of the primary challenges in large-scale Reinforcement Learning (RL). In this paper, we investigate how context length and the complexity of training data influence the RL scaling training process of R1-distilled reasoning models, e.g., DeepSeek-R1-Distill-Qwen-1.5B. Our experimental results reveal that: (1) simply controlling the context length and selecting the training data based on the input prompt length can effectively improve the training efficiency of RL scaling, achieving better performance with more concise CoT; (2) properly scaling the context length helps mitigate entropy collapse; and (3) carefully choosing the context length facilitates achieving efficient LLM training and reasoning. Inspired by these insights, we propose FastCuRL, a curriculum RL framework with stage-wise context scaling to achieve efficient LLM training and reasoning. Extensive experimental results demonstrate that FastCuRL-1.5B-V3 significantly outperforms state-of-the-art reasoning models on five competition-level benchmarks and achieves 49.6% accuracy on AIME 2024. Furthermore, FastCuRL-1.5B-Preview surpasses DeepScaleR-1.5B-Preview on five benchmarks while only using a single node with 8 GPUs and a total of 50% of training steps.</abstract>
<identifier type="citekey">song-etal-2025-fastcurl</identifier>
<location>
<url>https://aclanthology.org/2025.findings-emnlp.470/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>8856</start>
<end>8866</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models
%A Song, Mingyang
%A Zheng, Mao
%A Li, Zheng
%A Yang, Wenjie
%A Luo, Xuan
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Findings of the Association for Computational Linguistics: EMNLP 2025
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-335-7
%F song-etal-2025-fastcurl
%X Improving training efficiency continues to be one of the primary challenges in large-scale Reinforcement Learning (RL). In this paper, we investigate how context length and the complexity of training data influence the RL scaling training process of R1-distilled reasoning models, e.g., DeepSeek-R1-Distill-Qwen-1.5B. Our experimental results reveal that: (1) simply controlling the context length and selecting the training data based on the input prompt length can effectively improve the training efficiency of RL scaling, achieving better performance with more concise CoT; (2) properly scaling the context length helps mitigate entropy collapse; and (3) carefully choosing the context length facilitates achieving efficient LLM training and reasoning. Inspired by these insights, we propose FastCuRL, a curriculum RL framework with stage-wise context scaling to achieve efficient LLM training and reasoning. Extensive experimental results demonstrate that FastCuRL-1.5B-V3 significantly outperforms state-of-the-art reasoning models on five competition-level benchmarks and achieves 49.6% accuracy on AIME 2024. Furthermore, FastCuRL-1.5B-Preview surpasses DeepScaleR-1.5B-Preview on five benchmarks while only using a single node with 8 GPUs and a total of 50% of training steps.
%U https://aclanthology.org/2025.findings-emnlp.470/
%P 8856-8866
Markdown (Informal)
[FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models](https://aclanthology.org/2025.findings-emnlp.470/) (Song et al., Findings 2025)
ACL
Mingyang Song, Mao Zheng, Zheng Li, Wenjie Yang, and Xuan Luo. 2025. FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 8856–8866, Suzhou, China. Association for Computational Linguistics.