Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs

Authors: Xue-Yong Fu, Md Tahmid Rahman Laskar, Cheng Chen, Shashi Bhushan Tn
Published: 2023-12
Type: conference publication
Venue: Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Editors: Sebastian Gehrmann, Alex Wang, João Sedoc, Elizabeth Clark, Kaustubh Dhole, Khyathi Raghavi Chandu, Enrico Santus, Hooman Sedghamiz
Publisher: Association for Computational Linguistics
Location: Singapore
Pages: 310-316
Identifier: fu-etal-2023-large
URL: https://aclanthology.org/2023.gem-1.25/