Xinran Wang
2025
Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling
Seyyed Saeid Cheshmi | Azal Ahmad Khan | Xinran Wang | Zirui Liu | Ali Anwar
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) are increasingly relied upon for solving complex reasoning tasks in domains such as mathematics, logic, and multi-step question answering. A growing line of work seeks to improve reasoning quality by scaling inference-time compute, particularly through Process Reward Models (PRMs), which reward reasoning at intermediate steps. While effective, these methods introduce substantial computational overhead, especially when generating large numbers of solutions in parallel. In this paper, we investigate whether PRMs can be used mid-generation to provide early signals that enable the rejection of suboptimal candidates before the generation of a step is complete. We introduce the hypothesis that PRMs are also Partial Reward Models, meaning that the scores they assign to partially completed reasoning steps are predictive of final output quality. This allows for principled early rejection based on intermediate, token-level signals. We support this hypothesis both theoretically, by proving that the risk of discarding optimal beams decreases exponentially with generation length, and empirically, by demonstrating a strong correlation between partial and final rewards across multiple reward models. On math reasoning benchmarks, our method achieves up to a 1.4×–9× reduction in inference FLOPs without degrading final performance. These results suggest that early rejection is a powerful mechanism for improving the compute efficiency of reasoning in LLMs.
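The abstract suggests a simple control loop: score partially generated candidates with the PRM and prune those unlikely to finish well. Below is a minimal sketch of that idea, assuming hypothetical `extend` (one decoding chunk) and `partial_reward` (PRM scoring of a partial solution) callables; the paper's actual thresholding and pruning schedule may differ.

```python
from typing import Callable, List

def best_of_n_with_early_rejection(
    prompt: str,
    extend: Callable[[str], str],            # hypothetical: appends one decoded chunk
    partial_reward: Callable[[str], float],  # hypothetical: PRM score of a partial solution
    n: int = 16,
    max_chunks: int = 8,
    threshold: float = 0.3,
) -> str:
    """Generate n candidates in parallel, pruning low-reward partials early."""
    candidates: List[str] = [prompt] * n
    for _ in range(max_chunks):
        # Extend every surviving candidate by one chunk of tokens.
        candidates = [extend(c) for c in candidates]
        # Score the partially completed reasoning with the PRM; the paper's
        # hypothesis is that these partial scores predict final quality.
        scores = [partial_reward(c) for c in candidates]
        # Reject candidates below the threshold, saving the FLOPs that
        # would otherwise be spent completing them.
        survivors = [c for c, s in zip(candidates, scores) if s >= threshold]
        # Keep at least the best candidate so the pool never empties.
        candidates = survivors or [max(candidates, key=partial_reward)]
        if len(candidates) == 1:
            break
    return max(candidates, key=partial_reward)
```

A fixed threshold is the simplest policy; the paper's exponential-risk bound concerns discarding the optimal beam, so in practice the threshold can be tightened as generation length grows.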
AID: Adaptive Integration of Detectors for Safe AI with Language Models
Xinran Wang | Enmao Diao | Qi Le | Jie Ding | Ali Anwar
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
As Large Language Models (LLMs) increasingly influence content generation across diverse platforms, there is a heightened urgency to regulate their outputs to ensure safe usage. However, defining safety is complex, given that entities across domains may interpret it through varied lenses and develop safety detectors—models trained to identify specific unsafe content based on predefined criteria. To address this complexity, we introduce the approach of Adaptive Integration of Detectors (AID) to orchestrate the strengths of multiple pretrained detectors to ensure comprehensive effectiveness in diverse scenarios. AID employs a Mixture-of-Experts (MoE) framework, wherein it dynamically assigns and learns data-adaptive weights for each detector using domain-specific annotated data and LLM-extracted features. We provide theoretical insights into why MoE can be effective by showing its optimality in a Neyman-Pearson setting. Our experimental studies using various detection tasks curated from benchmark datasets demonstrate AID’s ability to synergistically combine the unique capabilities of individual detectors. For example, it is observed that AID can improve the area under the curve (AUC) by an absolute value of 0.07 to 0.21, with a median of 0.12, compared to the best individual detectors developed for specific safety aspects. The improvement is particularly significant for complex detection tasks that mix different unsafe data sources.
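A minimal sketch of the MoE-style aggregation the abstract describes, assuming hypothetical gate parameters and precomputed detector scores; in AID the gate is learned from domain-specific annotated data and LLM-extracted features, which this toy version takes as given.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def aid_score(features: np.ndarray,
              detector_scores: np.ndarray,
              gate_weights: np.ndarray) -> float:
    """Combine per-detector unsafe-content scores with data-adaptive weights.

    features:        LLM-extracted features for one input, shape (d,)
    detector_scores: scores from k pretrained safety detectors, shape (k,)
    gate_weights:    gate parameters (learned in AID), shape (k, d)
    """
    # The gate maps input features to one weight per detector, so
    # different inputs can lean on different experts.
    mixture = softmax(gate_weights @ features)
    return float(mixture @ detector_scores)

# Toy usage with random values standing in for real detectors and a trained gate.
rng = np.random.default_rng(0)
d, k = 8, 3
print(aid_score(rng.normal(size=d), rng.uniform(size=k), rng.normal(size=(k, d))))
```

Because the mixture weights depend on the input's features, the combination can defer to whichever detector is strongest for that input's domain, which is the source of the AUC gains the abstract reports.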
Co-authors
- Ali Anwar 2
- Seyyed Saeid Cheshmi 1
- Enmao Diao 1
- Jie Ding 1
- Azal Ahmad Khan 1