Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study

Mingyang Song; Mao Zheng; Xuan Luo

Abstract

Utilizing Large Language Models (LLMs) as evaluators to assess the performance of other LLMs has garnered attention. However, this evaluation approach is affected by potential biases within LLMs, raising concerns about the accuracy and reliability of their evaluation results. To address this issue, we propose and explore two many-shot In-Context Learning (ICL) prompt templates to help LLM evaluators mitigate potential biases: Many-Shot with Reference (MSwR) and Many-Shot without Reference (MSoR). Specifically, MSwR uses in-context examples accompanied by model-generated rationales as references, while MSoR omits these references. Using these prompt designs, we investigate the impact of increasing the number of in-context examples on the consistency and quality of the evaluation results. Experimental results show that advanced LLMs, such as GPT-4, perform better in the many-shot regime than in the zero-shot regime. Furthermore, in most cases, MSwR performs significantly better than MSoR.
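
The difference between the two templates can be illustrated with a minimal Python sketch. This is not the paper's actual prompt wording, which the abstract does not give: the `Example` fields, the `build_prompt` helper, the 1-10 scoring instruction, and the toy example are all hypothetical and only show how a rationale might be included (MSwR) or left out (MSoR) for each in-context example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One in-context evaluation example (all field names are hypothetical)."""
    question: str
    answer: str
    score: int
    rationale: Optional[str] = None  # model-generated rationale, used only for MSwR


def build_prompt(examples: List[Example], question: str, answer: str,
                 with_reference: bool) -> str:
    """Assemble a many-shot evaluation prompt.

    with_reference=True  -> MSwR-style: each example carries its rationale.
    with_reference=False -> MSoR-style: rationales are omitted.
    """
    # Assumed instruction line; the scoring scale is an illustration only.
    parts = ["Rate the answer to each question on a 1-10 scale."]
    for i, ex in enumerate(examples, start=1):
        block = [f"Example {i}", f"Question: {ex.question}", f"Answer: {ex.answer}"]
        if with_reference and ex.rationale:
            block.append(f"Rationale: {ex.rationale}")
        block.append(f"Score: {ex.score}")
        parts.append("\n".join(block))
    # The item to be evaluated comes last, with the score left open.
    parts.append(f"Question: {question}\nAnswer: {answer}\nScore:")
    return "\n\n".join(parts)


# Toy data for illustration; not taken from the paper.
shots = [Example("What is 2 + 2?", "4", 10, "The answer is correct and concise.")]
mswr = build_prompt(shots, "What is the capital of France?", "Paris", with_reference=True)
msor = build_prompt(shots, "What is the capital of France?", "Paris", with_reference=False)
```

Increasing the number of in-context examples, as studied in the paper, would correspond here to passing a longer `examples` list to the same function.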

Anthology ID: 2025.coling-main.548
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 8232–8241
URL: https://aclanthology.org/2025.coling-main.548/
Cite (ACL): Mingyang Song, Mao Zheng, and Xuan Luo. 2025. Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8232–8241, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study (Song et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.548.pdf
