ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework

Kai Qin; Liangxin Liu; Yu Liang; Longzheng Wang; Wangyan; Zhang Yueyang; Long Xia; Zhiyuan Sun; Houde Liu; Daiting Shi

Abstract

Reward Models (RMs) are critical components in the Reinforcement Learning from Human Feedback (RLHF) pipeline, directly determining the alignment quality of Large Language Models (LLMs). Recently, Generative Reward Models (GRMs) have emerged as a superior paradigm, offering higher interpretability and stronger generalization than traditional scalar RMs. However, existing methods for GRMs focus primarily on outcome-level supervision, neglecting analytical process quality, which constrains their potential. To address this, we propose ReflectRM, a novel GRM that leverages self-reflection to assess analytical quality and enhance preference modeling. ReflectRM is trained under a unified generative framework for joint modeling of response preference and analysis preference. During inference, we use its self-reflection capability to identify the most reliable analysis, from which the final preference prediction is derived. Experiments across four benchmarks show that ReflectRM consistently improves performance, achieving an average accuracy gain of +3.7 on Qwen3-4B. Further experiments confirm that response preference and analysis preference are mutually reinforcing. Notably, ReflectRM substantially mitigates positional bias, yielding a +10.2 improvement compared with leading GRMs and establishing itself as a more stable evaluator. Our code is available at https://github.com/yuliangCarmelo/ReflectRM.
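
The inference step described in the abstract (pick the most reliable analysis, then read the preference off it) can be pictured as a select-then-decide loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how candidate analyses are sampled or how reflection is scored, so `Analysis`, `select_by_reflection`, and the toy scorer are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Analysis:
    """One candidate judgment: free-text reasoning plus the response it prefers."""
    text: str
    preferred: str  # "A" or "B"


def select_by_reflection(
    candidates: Sequence[Analysis],
    reflect_score: Callable[[Analysis], float],
) -> Analysis:
    """Return the candidate analysis with the highest reflection score.

    `reflect_score` stands in for the model's self-reflection signal;
    how that signal is actually computed is an assumption left open here.
    """
    if not candidates:
        raise ValueError("need at least one candidate analysis")
    return max(candidates, key=reflect_score)


# Toy usage: fixed scores replace a real model call (assumption).
candidates = [
    Analysis("Response A is longer, so it is better.", "A"),
    Analysis("Response B answers the question and uses the correct formula.", "B"),
    Analysis("Both are similar; A is slightly more polite.", "A"),
]
toy_scores = {candidates[0].text: 0.2, candidates[1].text: 0.9, candidates[2].text: 0.4}

best = select_by_reflection(candidates, lambda a: toy_scores[a.text])
final_preference = best.preferred  # the prediction is derived from the chosen analysis
```

Note that selecting a single analysis differs from majority voting: in the toy data two of three candidates prefer A, yet the highest-scored analysis decides the outcome.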

Anthology ID: 2026.acl-long.1676
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 36207–36223
URL: https://aclanthology.org/2026.acl-long.1676/
Cite (ACL): Kai Qin, Liangxin Liu, Yu Liang, Longzheng Wang, Wangyan, Zhang Yueyang, Long Xia, Zhiyuan Sun, Houde Liu, and Daiting Shi. 2026. ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 36207–36223, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework (Qin et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1676.pdf
Checklist: 2026.acl-long.1676.checklist.pdf
