LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

Bofei Gao; Zefan Cai; Runxin Xu; Peiyi Wang (王培懿); Ce Zheng; Runji Lin; Keming Lu; Dayiheng Liu; Chang Zhou; Wen Xiao; Tianyu Liu; Baobao Chang (常宝宝)

doi:10.18653/v1/2025.findings-acl.753

Abstract

Mathematical verifiers have recently achieved success in mathematical reasoning tasks by validating the correctness of solutions generated by policy models. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to assess solutions accurately. To mitigate this insufficiency, we introduce step-wise natural language feedback as rationale labels, that is, the correctness of each step together with a detailed explanation. In this paper, we propose Math-Minos, a natural language feedback-enhanced verifier built on automatically generated training data and a two-stage training paradigm for effective training and efficient inference. Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier in both verification and reinforcement learning, and can also significantly alleviate the data-demanding problem of the reward model, with an over 700% improvement in data efficiency.

Anthology ID: 2025.findings-acl.753
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14588–14604
URL: https://aclanthology.org/2025.findings-acl.753/
DOI: 10.18653/v1/2025.findings-acl.753
Cite (ACL): Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Tianyu Liu, and Baobao Chang. 2025. LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14588–14604, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback (Gao et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.753.pdf
