ReasonerRank: Redefining Language Model Evaluation with Ground-Truth-Free Ranking Frameworks

Jiamu Zhang; Jiayi Yuan; Andrew Wen; Hoang Anh Duy Le; Yu-Neng Chuang; Soo-Hyun Choi; Rui Chen; Xia Hu

doi:10.18653/v1/2025.findings-acl.700

Abstract

Large Language Models (LLMs) are increasingly adopted across real-world applications, yet traditional evaluations rely on expensive, domain-specific ground-truth labels that are often unavailable or infeasible. We introduce a ground-truth-free evaluation framework focused on reasoning consistency and instruction following, shifting the emphasis from correctness—which is elusive without labels—to transparent, coherent, evidence-based reasoning. Each model response must include a direct answer, a structured multi-step explanation, and supporting evidence, all assessed via semantic similarity and output adherence checks. We further propose TopK-ReRank, which refines rankings by constructing a consensus answer from the most reliable models, reducing ambiguity across diverse reasoning styles. Experiments show that our framework outperforms existing label-free methods, including majority voting, triplet ranking, and peer-review approaches, providing a more interpretable and efficient alternative for evaluating LLMs in the absence of ground-truth labels.
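The abstract describes ranking without labels in two stages: score responses by semantic similarity, then re-rank against a consensus built from the most reliable models. The following is a minimal sketch of that general idea only, not the authors' implementation. The abstract does not specify the similarity measure, how the consensus answer is constructed, or the value of k, so everything here is an assumption for illustration: a bag-of-words cosine stands in for semantic similarity, the consensus is a plain concatenation of the top-k responses, and the names `bow_cosine`, `initial_scores`, and `topk_rerank` are hypothetical.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Toy stand-in for semantic similarity: cosine over word counts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def initial_scores(responses: dict[str, str]) -> dict[str, float]:
    """Score each model by its mean similarity to the other models' responses."""
    scores = {}
    for name, text in responses.items():
        sims = [bow_cosine(text, other) for n, other in responses.items() if n != name]
        scores[name] = sum(sims) / len(sims) if sims else 0.0
    return scores


def topk_rerank(responses: dict[str, str], k: int = 2) -> list[str]:
    """Re-rank all models by similarity to a consensus of the top-k models."""
    scores = initial_scores(responses)
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    # Assumption: concatenating the top-k responses is a crude consensus answer.
    consensus = " ".join(responses[name] for name in top)
    final = {name: bow_cosine(text, consensus) for name, text in responses.items()}
    return sorted(final, key=final.get, reverse=True)


if __name__ == "__main__":
    demo = {
        "model_a": "paris is the capital of france",
        "model_b": "the capital of france is paris",
        "model_c": "bananas are yellow fruit",
    }
    print(topk_rerank(demo, k=2))
```

In this toy setting the two agreeing responses form the consensus, and the unrelated response falls to the bottom of the ranking. A realistic version would replace `bow_cosine` with an embedding-based similarity and would also check output format adherence, which this sketch omits.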

Anthology ID: 2025.findings-acl.700
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13623–13639
URL: https://aclanthology.org/2025.findings-acl.700/
DOI: 10.18653/v1/2025.findings-acl.700
Cite (ACL): Jiamu Zhang, Jiayi Yuan, Andrew Wen, Hoang Anh Duy Le, Yu-Neng Chuang, Soo-Hyun Choi, Rui Chen, and Xia Hu. 2025. ReasonerRank: Redefining Language Model Evaluation with Ground-Truth-Free Ranking Frameworks. In Findings of the Association for Computational Linguistics: ACL 2025, pages 13623–13639, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): ReasonerRank: Redefining Language Model Evaluation with Ground-Truth-Free Ranking Frameworks (Zhang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.700.pdf
