Reference-Free Summarization Evaluation with Large Language Models

Abbas Akkasi, Kathleen Fraser, Majid Komeili


Abstract
With the continuous advancement of unsupervised learning methodologies, text generation has become increasingly pervasive. However, evaluating the quality of the generated text remains challenging. Human annotation is expensive and often shows high levels of disagreement, particularly for inherently subjective tasks such as translation and summarization. Consequently, the demand for automated metrics that can reliably assess the quality of such generative systems and their outputs is greater than ever. In 2023, Eval4NLP organized a shared task dedicated to the automatic evaluation of outputs from two categories of generative systems, machine translation and summarization, using prompts with Large Language Models. Participating in the summarization evaluation track, we propose an approach that prompts LLMs to evaluate six different latent dimensions of summarization quality. In contrast to many previous approaches to summarization assessment, which emphasize lexical overlap with a reference text, this method highlights the importance of correct syntax in summarization evaluation. Our method achieved the second-highest performance in the shared task, demonstrating its effectiveness as a reference-free evaluation method.
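
The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows one way such a prompt-based, reference-free evaluator could be structured: the LLM is asked for a per-dimension rating of a summary, and the ratings are averaged into a single score. The dimension names, prompt wording, 1-to-5 scale, averaging step, and the call_llm helper are all assumptions introduced here for illustration; they are not the prompts or dimensions used in the paper.

```python
# Minimal sketch of prompt-based, reference-free summary scoring.
# `call_llm` is a hypothetical stand-in for whichever chat/completion
# client is available; replace it with a real API or local model call.

DIMENSIONS = [
    "coherence", "consistency", "fluency",
    "relevance", "grammaticality", "conciseness",
]  # illustrative placeholder names, not necessarily the paper's six dimensions

PROMPT_TEMPLATE = (
    "You are evaluating a summary without access to any reference summary.\n"
    "Source document:\n{source}\n\n"
    "Candidate summary:\n{summary}\n\n"
    "Rate the summary's {dimension} on a scale from 1 (worst) to 5 (best). "
    "Respond with a single integer."
)

def call_llm(prompt: str) -> str:
    """Hypothetical helper; wire up an actual LLM client here."""
    raise NotImplementedError

def score_summary(source: str, summary: str) -> float:
    """Prompt the LLM once per dimension and return the mean rating."""
    scores = []
    for dim in DIMENSIONS:
        prompt = PROMPT_TEMPLATE.format(source=source, summary=summary, dimension=dim)
        reply = call_llm(prompt)
        try:
            scores.append(float(reply.strip()))
        except ValueError:
            scores.append(0.0)  # fall back if the model returns non-numeric text
    return sum(scores) / len(scores)
```

A real system would also need to handle prompt-format variations and malformed model outputs; the averaging here is only one possible way to combine per-dimension ratings into an overall quality score.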
Anthology ID:
2023.eval4nlp-1.16
Volume:
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2023
Address:
Bali, Indonesia
Editors:
Daniel Deutsch, Rotem Dror, Steffen Eger, Yang Gao, Christoph Leiter, Juri Opitz, Andreas Rücklé
Venues:
Eval4NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
193–201
URL:
https://aclanthology.org/2023.eval4nlp-1.16
DOI:
10.18653/v1/2023.eval4nlp-1.16
Cite (ACL):
Abbas Akkasi, Kathleen Fraser, and Majid Komeili. 2023. Reference-Free Summarization Evaluation with Large Language Models. In Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems, pages 193–201, Bali, Indonesia. Association for Computational Linguistics.
Cite (Informal):
Reference-Free Summarization Evaluation with Large Language Models (Akkasi et al., Eval4NLP-WS 2023)
PDF:
https://aclanthology.org/2023.eval4nlp-1.16.pdf