Likelihood-based Mitigation of Evaluation Bias in Large Language Models

Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki


Abstract
Large Language Models (LLMs) are widely used as automated metrics to evaluate natural language generation tasks. However, the likelihood, a measure of how plausible an LLM finds a sentence, can vary with superficial differences such as word order and sentence structure. This raises the possibility of a likelihood bias when LLMs are used for evaluation: they may overrate sentences with higher likelihoods while underrating those with lower likelihoods. In this paper, we investigate the presence and impact of likelihood bias in LLM-based evaluators. We also propose a method to mitigate this bias, which uses highly biased instances as few-shot examples for in-context learning. Our experiments on evaluating data-to-text and grammatical error correction tasks reveal that several LLMs we test exhibit a likelihood bias. Furthermore, our proposed method successfully mitigates this bias and also significantly improves evaluation performance (in terms of correlation with human scores).
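The central quantity in the abstract is a sentence's likelihood under the evaluator LLM. Below is a minimal sketch of how such a length-normalized likelihood could be computed with the Hugging Face transformers API; the stand-in model, normalization, and example sentences are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch (not the paper's implementation) of the likelihood score that
# likelihood bias is defined over: the average token log-probability an LLM
# assigns to a sentence. Model choice and normalization are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper evaluates larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Return the length-normalized log-likelihood of a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy over
        # predicted tokens; its negation is the average token log-likelihood.
        outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()

# Sentences with the same meaning but different surface forms can receive
# different likelihoods, which is the source of the bias studied in the paper.
print(sentence_log_likelihood("The cat sat on the mat."))
print(sentence_log_likelihood("On the mat, the cat sat."))
```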
Anthology ID:
2024.findings-acl.193
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3237–3245
URL:
https://aclanthology.org/2024.findings-acl.193
Cite (ACL):
Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, and Naoaki Okazaki. 2024. Likelihood-based Mitigation of Evaluation Bias in Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 3237–3245, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Likelihood-based Mitigation of Evaluation Bias in Large Language Models (Ohi et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.193.pdf