Wenjie Yang
2025
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models
Mingyang Song | Mao Zheng | Zheng Li | Wenjie Yang | Xuan Luo
Findings of the Association for Computational Linguistics: EMNLP 2025
Improving training efficiency continues to be one of the primary challenges in large-scale Reinforcement Learning (RL). In this paper, we investigate how context length and the complexity of training data influence the RL training process of R1-distilled reasoning models, e.g., DeepSeek-R1-Distill-Qwen-1.5B. Our experimental results reveal that: (1) simply controlling the context length and selecting the training data based on the input prompt length can effectively improve the training efficiency of RL scaling, achieving better performance with more concise CoT; (2) properly scaling the context length helps mitigate entropy collapse; and (3) carefully choosing the context length facilitates efficient LLM training and reasoning. Inspired by these insights, we propose FastCuRL, a curriculum RL framework with stage-wise context scaling to achieve efficient LLM training and reasoning. Extensive experimental results demonstrate that FastCuRL-1.5B-V3 significantly outperforms state-of-the-art reasoning models on five competition-level benchmarks and achieves 49.6% accuracy on AIME 2024. Furthermore, FastCuRL-1.5B-Preview surpasses DeepScaleR-1.5B-Preview on five benchmarks while only using a single node with 8 GPUs and a total of 50% of training steps.
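To make the stage-wise idea concrete, here is a minimal Python sketch of a context-scaling curriculum as described in the abstract. The stage boundaries, token budgets, step counts, and the `train_rl` callable are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a stage-wise context-scaling curriculum.
# All numbers and the train_rl() interface are assumptions for illustration.

def filter_by_prompt_length(dataset, max_prompt_tokens):
    """Keep only examples whose input prompt fits the current stage's budget."""
    return [ex for ex in dataset if len(ex["prompt_tokens"]) <= max_prompt_tokens]

# Each stage widens the response context window and admits longer prompts,
# so early training is cheap and later stages handle long chains of thought.
STAGES = [
    {"max_prompt_tokens": 512,   "max_response_tokens": 8_192,  "steps": 1_000},
    {"max_prompt_tokens": 1_024, "max_response_tokens": 16_384, "steps": 1_000},
    {"max_prompt_tokens": 2_048, "max_response_tokens": 24_576, "steps": 500},
]

def run_curriculum(model, dataset, train_rl):
    """Run RL training stage by stage, scaling context length per stage."""
    for stage in STAGES:
        subset = filter_by_prompt_length(dataset, stage["max_prompt_tokens"])
        model = train_rl(
            model,
            subset,
            max_context=stage["max_response_tokens"],
            num_steps=stage["steps"],
        )
    return model
```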
Retrieval-Augmented Language Models are Mimetic Theorem Provers
Wenjie Yang | Ruiyuan Huang | Jiaxing Guo | Zicheng Lyu | Tongshan Xu | Shengzhong Zhang | Lun Du | Da Zheng | Zengfeng Huang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models have demonstrated considerable capabilities in various mathematical tasks, yet they often fall short in the rigorous, proof-based reasoning essential for research-level mathematics. Retrieval-augmented generation (RAG) presents a promising direction for enhancing these capabilities. This paper systematically explores RAG for natural language theorem proving, revealing that LLMs, when augmented with retrieved proofs rather than just theorems, can function as potent mimetic theorem provers: these models can effectively generalize proof techniques found in unstructured retrieved contexts to construct correct proofs for novel theorems. Building upon this finding, we introduce Dual RAG, a simple yet effective RAG framework. Dual RAG employs LLMs to identify underlying reasoning challenges within theorems, augmenting both queries and document contexts to improve retrieval performance. Our experiments show that Dual RAG achieves substantial improvements in retrieval performance, with gains of up to 34.19%. Expert evaluations further confirm that these retrieval enhancements directly translate into higher quality proof generation. Notably, when integrated with the arXiv API, Dual RAG demonstrates the ability to prove research-level theorems in theoretical machine learning, highlighting its strong potential as a foundational element for a practical mathematical copilot.
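The dual augmentation idea can be sketched in a few lines of Python. The function names below (`augment_query`, `augment_document`, the `llm`, `embed`, and `index` parameters) are placeholders for an LLM call, an embedding model, and a vector store; they are assumptions for illustration, not the paper's actual API.

```python
# Minimal sketch of dual query/document augmentation for proof retrieval.
# llm, embed, and index are caller-supplied placeholders, not a real API.

def augment_query(llm, theorem_statement):
    """Ask the LLM what reasoning challenges the theorem poses, and append
    them to the retrieval query so it matches proofs by technique, not just
    by surface wording."""
    challenges = llm(f"List the key reasoning challenges in proving:\n{theorem_statement}")
    return theorem_statement + "\n" + challenges

def augment_document(llm, theorem, proof):
    """Index each (theorem, proof) pair together with an LLM summary of the
    proof techniques it uses, making documents retrievable by technique."""
    techniques = llm(f"Summarize the proof techniques used here:\n{proof}")
    return "\n".join([theorem, proof, techniques])

def dual_rag_prove(llm, embed, index, theorem_statement, k=5):
    """Retrieve augmented proof documents, then prove with them as context."""
    query = augment_query(llm, theorem_statement)
    # Retrieve full proofs, not just theorem statements: the abstract's key
    # finding is that retrieved proofs let the model mimic techniques.
    retrieved = index.search(embed(query), top_k=k)
    context = "\n\n".join(doc.text for doc in retrieved)
    return llm(
        f"Using the following proofs as guidance:\n{context}\n\n"
        f"Prove:\n{theorem_statement}"
    )
```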