Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision

Zihan Wang; Yunxuan Li; Yuexin Wu; Liangchen Luo; Le Hou; Hongkun Yu; Jingbo Shang

Abstract

Process supervision, which uses a trained verifier to evaluate the intermediate steps generated by a reasoner, has demonstrated significant improvements in multi-step problem solving. To avoid the expensive human annotation of verifier training data, we introduce Model-induced Process Supervision (MiPS), a novel method for automating data curation. MiPS annotates an intermediate step by sampling completions of the solution from that step with the reasoning model and computing an accuracy defined as the proportion of correct completions. Because inaccuracies of the reasoner cause MiPS to underestimate the accuracy of intermediate steps, we suggest and empirically show that verification focusing on high predicted verifier scores should be preferred over verification focusing on low predicted scores, contrary to prior observations on human-curated data. Our approach significantly improves the performance of PaLM 2 on math and coding tasks (accuracy +0.67% on GSM8K, +4.16% on MATH, and +0.92% on MBPP compared with a verifier trained with output supervision). Additionally, our study demonstrates that the verifier generalizes well across different reasoning models.
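The annotation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mips_step_accuracy`, `annotate_solution`, `make_toy_reasoner`), the callback signatures, and the toy reasoner are all hypothetical, chosen only to show the idea of scoring a step prefix by the proportion of sampled completions that reach a correct final answer.

```python
import itertools
from typing import Callable, List, Sequence

# Hypothetical callback types (assumptions, not from the paper):
# a sampler maps (question, prefix of steps) to a final answer string,
# and a checker decides whether that final answer is correct.
Sampler = Callable[[str, Sequence[str]], str]
Checker = Callable[[str], bool]


def mips_step_accuracy(question: str, prefix_steps: Sequence[str],
                       sample_completion: Sampler, is_correct: Checker,
                       num_samples: int = 8) -> float:
    """Score a step prefix as the proportion of correct sampled completions."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    correct = 0
    for _ in range(num_samples):
        final_answer = sample_completion(question, prefix_steps)
        if is_correct(final_answer):
            correct += 1
    return correct / num_samples


def annotate_solution(question: str, steps: Sequence[str],
                      sample_completion: Sampler, is_correct: Checker,
                      num_samples: int = 8) -> List[float]:
    """Return one score per intermediate step, using the prefix ending at it."""
    return [
        mips_step_accuracy(question, steps[: i + 1], sample_completion,
                           is_correct, num_samples)
        for i in range(len(steps))
    ]


def make_toy_reasoner() -> Sampler:
    """Deterministic stand-in for a reasoning model (illustration only)."""
    answers = itertools.cycle(["5", "5", "5", "4"])  # errs once in four samples

    def sample(question: str, prefix_steps: Sequence[str]) -> str:
        # A prefix containing the wrong step always leads to a wrong answer.
        if any("= 6" in step for step in prefix_steps):
            return "6"
        return next(answers)

    return sample


if __name__ == "__main__":
    scores = annotate_solution(
        "What is 2 + 3?",
        ["Add the numbers.", "2 + 3 = 6"],
        make_toy_reasoner(),
        lambda answer: answer == "5",
        num_samples=4,
    )
    print(scores)
```

In this toy run the first step is correct yet receives a score below 1, because the stand-in reasoner itself sometimes fails after a correct prefix. This is the underestimation effect the abstract attributes to reasoner inaccuracies.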

Anthology ID: 2024.findings-emnlp.429
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7309–7319
URL: https://aclanthology.org/2024.findings-emnlp.429
Cite (ACL): Zihan Wang, Yunxuan Li, Yuexin Wu, Liangchen Luo, Le Hou, Hongkun Yu, and Jingbo Shang. 2024. Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7309–7319, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision (Wang et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.429.pdf