@inproceedings{zhu-etal-2025-retrieval,
title = "Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning",
author = "Zhu, Jiachen and
Zheng, Congmin and
Lin, Jianghao and
Du, Kounianhua and
Wen, Ying and
Yu, Yong and
Wang, Jun and
Zhang, Weinan",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.444/",
doi = "10.18653/v1/2025.findings-acl.444",
pages = "8453--8468",
ISBN = "979-8-89176-256-5",
abstract = "While large language models (LLMs) have significantly advanced mathematical reasoning, Process Reward Models (PRMs) have been developed to evaluate the logical validity of reasoning steps. However, PRMs still struggle with out-of-distribution (OOD) challenges. This paper identifies the OOD issues including step OOD, arising from differences in reasoning patterns across model types and sizes, and question OOD, due to dataset shifts between training and real-world problems. To address these issues, we introduce Retrieval-Augmented Process Reward Model (RetrievalPRM), a novel framework designed to tackle these OOD issues. By utilizing a two-stage retrieval-enhanced mechanism, RetrievalPRM retrieves semantically similar questions and steps for PRM as a warmup to stimulate its potential to judge target steps, improving generalization and reasoning consistency across different models and problem types. Our extensive experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetreivalPRM model, establishing a new standard for PRM performance."
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="zhu-etal-2025-retrieval">
<titleInfo>
<title>Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning</title>
</titleInfo>
<name type="personal">
<namePart type="given">Jiachen</namePart>
<namePart type="family">Zhu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Congmin</namePart>
<namePart type="family">Zheng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jianghao</namePart>
<namePart type="family">Lin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kounianhua</namePart>
<namePart type="family">Du</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ying</namePart>
<namePart type="family">Wen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yong</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jun</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Weinan</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: ACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-256-5</identifier>
</relatedItem>
<abstract>While large language models (LLMs) have significantly advanced mathematical reasoning, Process Reward Models (PRMs) have been developed to evaluate the logical validity of reasoning steps. However, PRMs still struggle with out-of-distribution (OOD) challenges. This paper identifies two OOD issues: step OOD, arising from differences in reasoning patterns across model types and sizes, and question OOD, arising from dataset shifts between training data and real-world problems. To address these issues, we introduce the Retrieval-Augmented Process Reward Model (RetrievalPRM), a novel framework that uses a two-stage retrieval-enhanced mechanism to retrieve semantically similar questions and steps as a warmup for the PRM, stimulating its ability to judge target steps and improving generalization and reasoning consistency across different models and problem types. Our extensive experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetrievalPRM model, establishing a new standard for PRM performance.</abstract>
<identifier type="citekey">zhu-etal-2025-retrieval</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.444</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.444/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>8453</start>
<end>8468</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
%A Zhu, Jiachen
%A Zheng, Congmin
%A Lin, Jianghao
%A Du, Kounianhua
%A Wen, Ying
%A Yu, Yong
%A Wang, Jun
%A Zhang, Weinan
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F zhu-etal-2025-retrieval
%X While large language models (LLMs) have significantly advanced mathematical reasoning, Process Reward Models (PRMs) have been developed to evaluate the logical validity of reasoning steps. However, PRMs still struggle with out-of-distribution (OOD) challenges. This paper identifies two OOD issues: step OOD, arising from differences in reasoning patterns across model types and sizes, and question OOD, arising from dataset shifts between training data and real-world problems. To address these issues, we introduce the Retrieval-Augmented Process Reward Model (RetrievalPRM), a novel framework that uses a two-stage retrieval-enhanced mechanism to retrieve semantically similar questions and steps as a warmup for the PRM, stimulating its ability to judge target steps and improving generalization and reasoning consistency across different models and problem types. Our extensive experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetrievalPRM model, establishing a new standard for PRM performance.
%R 10.18653/v1/2025.findings-acl.444
%U https://aclanthology.org/2025.findings-acl.444/
%U https://doi.org/10.18653/v1/2025.findings-acl.444
%P 8453-8468
Markdown (Informal)
[Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning](https://aclanthology.org/2025.findings-acl.444/) (Zhu et al., Findings 2025)
ACL
- Jiachen Zhu, Congmin Zheng, Jianghao Lin, Kounianhua Du, Ying Wen, Yong Yu, Jun Wang, and Weinan Zhang. 2025. Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8453–8468, Vienna, Austria. Association for Computational Linguistics.
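The abstract describes the core mechanism at a high level: retrieve semantically similar (question, step) exemplars and prepend them to the PRM's input as a warmup before the target step is judged. The Python sketch below illustrates that idea only and is not the authors' released code; the embedder, the exemplar bank, the two retrieval passes, and the prompt layout are all hypothetical stand-ins chosen for readability.

# Minimal, illustrative sketch (not the authors' code) of the retrieval-as-warmup
# idea from the abstract: fetch similar (question, step, judgement) exemplars and
# prepend them to the target step before a PRM scores it. The embedder, the
# exemplar bank, and the prompt layout below are hypothetical stand-ins.
from dataclasses import dataclass
import numpy as np

@dataclass
class Exemplar:
    question: str
    step: str
    label: int  # 1 = step judged correct, 0 = incorrect

def embed(text: str) -> np.ndarray:
    # Stand-in sentence embedder; any off-the-shelf encoder could be used here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def retrieve(query: str, bank: list[Exemplar], k: int) -> list[Exemplar]:
    # Rank exemplars by similarity to the query and keep the top k.
    q = embed(query)
    sims = np.array([q @ embed(e.question + " " + e.step) for e in bank])
    return [bank[i] for i in np.argsort(sims)[::-1][:k]]

def build_prm_input(question: str, step: str, bank: list[Exemplar]) -> str:
    # Two retrieval passes (question-level, then step-level) supply the warmup
    # context; the target step is appended for the PRM to judge.
    retrieved = retrieve(question, bank, k=2) + retrieve(step, bank, k=1)
    ctx = "\n".join(
        f"[Example] Q: {e.question}\nStep: {e.step}\n"
        f"Judgement: {'correct' if e.label else 'incorrect'}"
        for e in retrieved
    )
    return f"{ctx}\n[Target] Q: {question}\nStep: {step}\nJudgement:"

# Usage: the returned string would be fed to a PRM scorer (not shown here).
bank = [
    Exemplar("Solve 2x + 3 = 7.", "Subtract 3 from both sides: 2x = 4.", 1),
    Exemplar("Solve 2x + 3 = 7.", "Divide both sides by 3: x = 7/3.", 0),
    Exemplar("Compute 15% of 80.", "0.15 * 80 = 12.", 1),
]
print(build_prm_input("Solve 3x - 5 = 10.", "Add 5 to both sides: 3x = 15.", bank))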