@inproceedings{ahrabian-etal-2025-systematic,
title = "A Systematic Analysis of Base Model Choice for Reward Modeling",
author = "Ahrabian, Kian and
Jandaghi, Pegah and
Mokhberian, Negar and
Karimireddy, Sai Praneeth and
Pujara, Jay",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.8/",
pages = "146--164",
ISBN = "979-8-89176-332-6",
abstract = "Reinforcement learning from human feedback (RLHF) and, at its core, reward modeling have become a crucial part of training powerful large language models (LLMs). One commonly overlooked factor in training high-quality reward models (RMs) is the effect of the base model, which is becoming more challenging to choose given the rapidly growing pool of LLMs. In this work, we present a systematic analysis of the effect of base model selection on reward modeling performance. Our results show that the performance can be improved by up to 14{\%} compared to the most common (i.e., default) choice. Moreover, we showcase the strong statistical relation between some existing benchmarks and downstream performances. We also demonstrate that the results from a small set of benchmarks could be combined to boost the model selection (+18{\%} on average in the top 5-10). Lastly, we illustrate the impact of different post-training steps on the final performance and explore using estimated data distributions to reduce performance prediction error."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="ahrabian-etal-2025-systematic">
<titleInfo>
<title>A Systematic Analysis of Base Model Choice for Reward Modeling</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kian</namePart>
<namePart type="family">Ahrabian</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Pegah</namePart>
<namePart type="family">Jandaghi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Negar</namePart>
<namePart type="family">Mokhberian</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sai</namePart>
<namePart type="given">Praneeth</namePart>
<namePart type="family">Karimireddy</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jay</namePart>
<namePart type="family">Pujara</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Reinforcement learning from human feedback (RLHF) and, at its core, reward modeling have become a crucial part of training powerful large language models (LLMs). One commonly overlooked factor in training high-quality reward models (RMs) is the effect of the base model, which is becoming more challenging to choose given the rapidly growing pool of LLMs. In this work, we present a systematic analysis of the effect of base model selection on reward modeling performance. Our results show that the performance can be improved by up to 14% compared to the most common (i.e., default) choice. Moreover, we showcase the strong statistical relation between some existing benchmarks and downstream performances. We also demonstrate that the results from a small set of benchmarks could be combined to boost the model selection (+18% on average in the top 5-10). Lastly, we illustrate the impact of different post-training steps on the final performance and explore using estimated data distributions to reduce performance prediction error.</abstract>
<identifier type="citekey">ahrabian-etal-2025-systematic</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.8/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>146</start>
<end>164</end>
</extent>
</part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T A Systematic Analysis of Base Model Choice for Reward Modeling
%A Ahrabian, Kian
%A Jandaghi, Pegah
%A Mokhberian, Negar
%A Karimireddy, Sai Praneeth
%A Pujara, Jay
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F ahrabian-etal-2025-systematic
%X Reinforcement learning from human feedback (RLHF) and, at its core, reward modeling have become a crucial part of training powerful large language models (LLMs). One commonly overlooked factor in training high-quality reward models (RMs) is the effect of the base model, which is becoming more challenging to choose given the rapidly growing pool of LLMs. In this work, we present a systematic analysis of the effect of base model selection on reward modeling performance. Our results show that the performance can be improved by up to 14% compared to the most common (i.e., default) choice. Moreover, we showcase the strong statistical relation between some existing benchmarks and downstream performances. We also demonstrate that the results from a small set of benchmarks could be combined to boost the model selection (+18% on average in the top 5-10). Lastly, we illustrate the impact of different post-training steps on the final performance and explore using estimated data distributions to reduce performance prediction error.
%U https://aclanthology.org/2025.emnlp-main.8/
%P 146-164

Markdown (Informal)
[A Systematic Analysis of Base Model Choice for Reward Modeling](https://aclanthology.org/2025.emnlp-main.8/) (Ahrabian et al., EMNLP 2025)

ACL
Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, and Jay Pujara. 2025. A Systematic Analysis of Base Model Choice for Reward Modeling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 146–164, Suzhou, China. Association for Computational Linguistics.