Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling

Seyyed Saeid Cheshmi; Azal Ahmad Khan; Xinran Wang; Zirui Liu; Ali Anwar

Abstract

Large Language Models (LLMs) are increasingly relied upon for solving complex reasoning tasks in domains such as mathematics, logic, and multi-step question answering. A growing line of work seeks to improve reasoning quality by scaling inference-time compute, particularly through Process Reward Models (PRMs), which are used to reward the reasoning at intermediate steps. While effective, these methods introduce substantial computational overhead, especially when generating large numbers of solutions in parallel. In this paper, we investigate whether PRMs can be used mid-generation to provide early signals that enable the rejection of suboptimal candidates before a full reasoning step has been generated. We introduce the hypothesis that PRMs are also Partial Reward Models, meaning that the scores they assign to partially completed reasoning steps are predictive of final output quality. This allows for principled early rejection based on intermediate token-level signals. We support this hypothesis both theoretically, by proving that the risk of discarding optimal beams decreases exponentially with generation length, and empirically, by demonstrating a strong correlation between partial and final rewards across multiple reward models. On math reasoning benchmarks, our method achieves up to 1.4×–9× reduction in inference FLOPs without degrading final performance. These results suggest that early rejection is a powerful mechanism for improving the compute efficiency of reasoning in LLMs.
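
The early-rejection idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `step_with_early_rejection`, the callables `extend` and `score`, and the parameters `prefix_tokens`, `remaining_tokens`, and `keep` are all hypothetical stand-ins, assuming that a reward model can score a partially generated step and that only the best-scoring candidates are then completed.

```python
from typing import Callable, List


def step_with_early_rejection(
    beams: List[str],
    extend: Callable[[str, int], str],  # hypothetical: append up to n tokens to a beam
    score: Callable[[str], float],      # hypothetical: reward-model score of a partial step
    prefix_tokens: int,
    remaining_tokens: int,
    keep: int,
) -> List[str]:
    """Generate one reasoning step, rejecting weak candidates mid-step."""
    # 1. Generate only a short prefix of the next step for every candidate.
    partial = [extend(beam, prefix_tokens) for beam in beams]
    # 2. Score the partial steps and keep the `keep` highest-scoring ones.
    survivors = sorted(partial, key=score, reverse=True)[:keep]
    # 3. Spend the remaining token budget only on the survivors.
    return [extend(beam, remaining_tokens) for beam in survivors]


if __name__ == "__main__":
    # Toy stand-ins: "generation" appends dots, the "reward" counts the letter 'a'.
    toy_extend = lambda text, n: text + "." * n
    toy_score = lambda text: float(text.count("a"))
    print(step_with_early_rejection(["aa", "ab", "bb", "a"], toy_extend, toy_score, 2, 3, 2))
```

In this toy setting, the rejected candidates never receive the remaining tokens, which is where the compute saving would come from under the stated assumptions.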

Anthology ID: 2025.findings-emnlp.551
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10433–10447
URL: https://aclanthology.org/2025.findings-emnlp.551/
Cite (ACL): Seyyed Saeid Cheshmi, Azal Ahmad Khan, Xinran Wang, Zirui Liu, and Ali Anwar. 2025. Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10433–10447, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling (Cheshmi et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.551.pdf
Checklist: 2025.findings-emnlp.551.checklist.pdf
