Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR

Haobo Xu; Sirui Chen; Ruizhong Qiu; Yuchen Yan; Chen Luo; Monica Xiao Cheng; Jingrui He; Hanghang Tong

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, methods such as GRPO and DAPO are computationally expensive because they sample many rollouts for each prompt. Moreover, the relative advantage in RLVR is often sparse: for many prompts the sampled rollouts are nearly all correct or nearly all incorrect, which yields low within-group reward variance and thus weak learning signals. In this paper, we introduce ARRoL (**A**ccelerating **R**LV**R** via **o**nline Ro**L**lout Pruning), an online rollout pruning method that prunes rollouts during generation while explicitly steering the surviving rollouts toward a more balanced mix of correct and incorrect samples, which strengthens the learning signal. Specifically, ARRoL trains a lightweight quality head on the fly to predict the success probability of partial rollouts and uses these predictions to make early pruning decisions. The learned quality head can also weight candidates to improve inference accuracy during test-time voting. To improve efficiency, we present a system design that prunes rollouts inside the inference engine and re-batches the remaining ones for log-probability computation and policy updates. Across GRPO and DAPO on Qwen-3 and LLaMA-3.2 models (1B-8B), ARRoL improves average accuracy by +2.30 to +2.99 while achieving up to 1.7× training speedup, and it yields up to +8.33 additional gains in average accuracy with test-time voting.
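
The sparsity claim in the abstract can be illustrated with a minimal sketch of a GRPO-style group-relative advantage, in which each reward is standardized within the group of rollouts sampled for one prompt. This is an illustration under stated assumptions, not the paper's implementation: the function name `group_relative_advantages`, the `eps` stabilizer, and the binary rewards are all chosen here for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's rollout group (GRPO-style).

    Hypothetical helper for illustration; `eps` only guards against
    division by zero when every rollout in the group has the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# A correctness-balanced group: correct rollouts get positive advantages,
# incorrect ones get negative advantages.
balanced = group_relative_advantages([1, 0, 1, 0])

# An all-correct group: zero within-group variance, so every advantage is
# zero and the group contributes no gradient signal.
all_correct = group_relative_advantages([1, 1, 1, 1])
```

Under these assumptions, a group whose rollouts are all correct (or all incorrect) produces only zero advantages, which is why keeping the surviving rollouts balanced between correct and incorrect samples preserves a usable learning signal.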

Anthology ID: 2026.acl-long.632
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 13876–13893
URL: https://aclanthology.org/2026.acl-long.632/
Cite (ACL): Haobo Xu, Sirui Chen, Ruizhong Qiu, Yuchen Yan, Chen Luo, Monica Xiao Cheng, Jingrui He, and Hanghang Tong. 2026. Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13876–13893, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR (Xu et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.632.pdf
Checklist: 2026.acl-long.632.checklist.pdf
