MR. Judge: Multimodal Reasoner as a Judge

Renjie Pi; Haoping Bai; Qibin Chen; Xiaoming Simon Wang; Jiulong Shan; Xiaojiang Liu; Meng Cao

doi:10.18653/v1/2025.emnlp-main.1021

Abstract

The paradigm of using Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) as evaluative judges has emerged as an effective approach in RLHF and inference-time scaling. In this work, we propose Multimodal Reasoner as a Judge (MR. Judge), a paradigm for empowering general-purpose MLLM judges with strong reasoning capabilities. Instead of directly assigning a score to each response, we formulate the judgement process as a reasoning-inspired multiple-choice problem. Specifically, the judge model first conducts deliberate reasoning covering different aspects of the responses and then selects the best response among them. This reasoning process not only improves the interpretability of the judgement, but also greatly enhances the performance of MLLM judges. To cope with the lack of questions with scored responses, we propose the following strategy to achieve automatic annotation: 1) Reverse Response Candidates Synthesis: starting from a supervised fine-tuning (SFT) dataset, we treat the original response as the best candidate and prompt the MLLM to generate plausible but flawed negative candidates. 2) Text-based Reasoning Distillation: we carefully design a data synthesis pipeline for distilling the reasoning capability from a text-based reasoning model, which is adopted to enable the MLLM judges to regain complex reasoning ability via warm-up supervised fine-tuning. Experiments demonstrate that MR. Judge is effective across a wide range of tasks. Specifically, MR. Judge-7B surpasses GPT-4o by 9.9% on VL-RewardBench, and improves performance on MM-Vet during inference-time scaling by up to 7.7%.
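The multiple-choice formulation and the reverse candidate synthesis described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the prompt wording, the `Answer: (X)` verdict convention, and the names `JudgeItem`, `build_judge_item`, and `parse_choice` are hypothetical. The sketch only shows how an SFT response and synthesized flawed candidates might be shuffled into labelled options, and how a judge's final choice might be read back from its reasoning text. It omits the image input and any model call.

```python
import random
import re
import string
from dataclasses import dataclass


@dataclass
class JudgeItem:
    prompt: str  # text that would be shown to the judge model (hypothetical format)
    answer: str  # option letter that holds the original SFT response


def build_judge_item(question, reference, negatives, seed=0):
    """Shuffle the reference response among flawed candidates and label them A, B, C, ..."""
    candidates = [reference] + list(negatives)
    if len(candidates) > len(string.ascii_uppercase):
        raise ValueError("too many candidates for single-letter labels")
    # Shuffle positions rather than texts, so the reference stays identifiable
    # even if a negative candidate happens to repeat its wording.
    order = list(range(len(candidates)))
    random.Random(seed).shuffle(order)
    letters = string.ascii_uppercase[: len(candidates)]
    options = "\n".join(
        f"({letter}) {candidates[idx]}" for letter, idx in zip(letters, order)
    )
    # Assumed prompt wording: reason first, then commit to one option.
    prompt = (
        f"Question: {question}\n"
        f"Candidate responses:\n{options}\n"
        "Reason about each candidate, then finish with 'Answer: (X)'."
    )
    return JudgeItem(prompt=prompt, answer=letters[order.index(0)])


def parse_choice(judge_output):
    """Return the letter of the last 'Answer: (X)' verdict, or None if absent."""
    matches = re.findall(r"Answer:\s*\(?([A-Z])\)?(?![A-Za-z])", judge_output)
    return matches[-1] if matches else None
```

Under these assumptions, a synthesized item is counted as correctly judged when `parse_choice(output) == item.answer`. Shuffling the option order is a design choice of this sketch, intended to keep the reference response from always occupying the same position.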

Anthology ID: 2025.emnlp-main.1021
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 20181–20205
URL: https://aclanthology.org/2025.emnlp-main.1021/
DOI: 10.18653/v1/2025.emnlp-main.1021
Cite (ACL): Renjie Pi, Haoping Bai, Qibin Chen, Xiaoming Simon Wang, Jiulong Shan, Xiaojiang Liu, and Meng Cao. 2025. MR. Judge: Multimodal Reasoner as a Judge. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20181–20205, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): MR. Judge: Multimodal Reasoner as a Judge (Pi et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1021.pdf
Checklist: 2025.emnlp-main.1021.checklist.pdf
