MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification

xu Zhao Pan; Pengfei Zhou; Jiaxin Ai; Wangbo Zhao; Kai Wang; Xiaojiang Peng; Wenqi Shao; Hongxun Yao; Kaipeng Zhang

doi:10.18653/v1/2025.findings-acl.1112

Abstract

Reasoning is an essential capacity for large language models (LLMs) to address complex tasks, and identifying process errors is vital for improving this ability. Recently, process-level reward models (PRMs) were proposed to provide step-wise rewards that facilitate reinforcement learning and data production during training and guide LLMs toward correct steps during inference, thereby improving reasoning accuracy. However, existing PRM benchmarks are text-based and focus on error detection, neglecting other scenarios such as reasoning search. To address this gap, we introduce MPBench, a comprehensive, multi-task, multimodal benchmark designed to systematically assess the effectiveness of PRMs in diverse scenarios. MPBench employs three evaluation paradigms, each targeting a specific role of PRMs in the reasoning process: (1) Step Correctness, which assesses the correctness of each intermediate reasoning step; (2) Answers Aggregation, which aggregates multiple solutions and selects the best one; and (3) Reasoning Process Search, which guides the search for optimal reasoning steps during inference. Through these paradigms, MPBench enables comprehensive evaluation and provides insights into the development of multimodal PRMs.
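To make the first two paradigms concrete, the following is a minimal sketch of how step-wise PRM scores might be used. It is not the MPBench protocol: the function names, the 0.5 threshold, and the choice to rank a solution by its weakest step are all assumptions made for illustration, and the scores are hand-written numbers rather than outputs of any real model.

```python
from typing import List, Sequence


def first_error_step(step_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Return the index of the first step scored below `threshold`, or -1 if none.

    Hypothetical stand-in for a step-correctness check; the threshold is an
    assumed value, not one taken from the paper.
    """
    for i, score in enumerate(step_scores):
        if score < threshold:
            return i
    return -1


def aggregate_answers(candidates: List[Sequence[float]]) -> int:
    """Return the index of the candidate solution whose weakest step scores highest.

    Min-over-steps is one common aggregation choice, assumed here for
    illustration; a product or mean of step scores would also be plausible.
    """
    return max(range(len(candidates)), key=lambda i: min(candidates[i]))


# Hand-written example scores (not real PRM outputs).
solution_scores = [
    [0.9, 0.1],   # strong first step, weak second step
    [0.6, 0.7],   # moderate throughout
    [0.95, 0.5],  # strong first step, borderline second step
]
flagged = first_error_step([0.9, 0.8, 0.2, 0.7])
best = aggregate_answers(solution_scores)
```

Under these assumptions, a solution is only as trustworthy as its least reliable step, so a single low-scoring step is enough to rank it below a uniformly moderate one.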

Anthology ID: 2025.findings-acl.1112
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 21586–21606
URL: https://aclanthology.org/2025.findings-acl.1112/
DOI: 10.18653/v1/2025.findings-acl.1112
Cite (ACL): xu Zhao Pan, Pengfei Zhou, Jiaxin Ai, Wangbo Zhao, Kai Wang, Xiaojiang Peng, Wenqi Shao, Hongxun Yao, and Kaipeng Zhang. 2025. MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification. In Findings of the Association for Computational Linguistics: ACL 2025, pages 21586–21606, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification (Pan et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1112.pdf
