ACUEval: Fine-grained Hallucination Evaluation and Correction for Abstractive Summarization

David Wan, Koustuv Sinha, Srini Iyer, Asli Celikyilmaz, Mohit Bansal, Ramakanth Pasunuru


Abstract
The impressive generation capabilities of large language models (LLMs) have made it harder to detect the subtle hallucinations they make in abstractive summarization, where generated summaries consist of a blend of correct and incorrect information with respect to a given document. Recently proposed LLM-based evaluation metrics attempt to capture this, but still face challenges: (1) they are biased towards summaries generated by the same underlying LLM, and (2) they lack interpretability, offering only a single score. In this work, we present ACUEval, a metric that leverages the power of LLMs to perform two sub-tasks: decomposing summaries into atomic content units (ACUs), and validating them against the source document. On three summarization evaluation benchmarks, this two-step evaluation strategy improves correlation with human judgments of faithfulness by 3% in balanced accuracy over the next-best LLM-based metric, and also shows reduced preference bias towards LLM-generated summaries. Further, we show that errors detected by ACUEval can be used to generate actionable feedback for refining the summary, improving faithfulness scores by more than 10%.
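
The two-step strategy described above lends itself to a simple pipeline: decompose the summary into ACUs with one LLM call, verify each ACU against the document with another, and report the supported fraction as the score, with unsupported ACUs serving as feedback for refinement. The following is a minimal sketch of that structure only; the prompts, the generate callable, and the aggregation are illustrative placeholders, not the authors' actual prompts, models, or scoring details.

from typing import Callable, List

# Minimal sketch of an ACUEval-style two-step check, assuming only that
# `generate` wraps some instruction-following LLM (prompt in, text out).
# Prompts and aggregation here are illustrative, not the paper's exact ones.

def decompose_into_acus(summary: str, generate: Callable[[str], str]) -> List[str]:
    # Step 1: split the summary into atomic content units, one fact per line.
    prompt = (
        "Break the following summary into short, self-contained facts, "
        "one per line:\n\n" + summary
    )
    return [line.strip() for line in generate(prompt).splitlines() if line.strip()]

def acu_is_supported(acu: str, document: str, generate: Callable[[str], str]) -> bool:
    # Step 2: verify a single ACU against the source document.
    prompt = (
        "Document:\n" + document + "\n\nClaim: " + acu +
        "\n\nIs the claim fully supported by the document? Answer yes or no."
    )
    return generate(prompt).strip().lower().startswith("yes")

def acueval_score(summary: str, document: str, generate: Callable[[str], str]) -> float:
    # Score = fraction of ACUs supported; the unsupported ACUs double as
    # actionable feedback for a later summary-refinement step.
    acus = decompose_into_acus(summary, generate)
    if not acus:
        return 1.0
    supported = [a for a in acus if acu_is_supported(a, document, generate)]
    return len(supported) / len(acus)

In use, the unsupported ACUs would be passed back to the summarizer as a list of claims to fix, which is how the refinement step described in the abstract can raise faithfulness scores.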
Anthology ID:
2024.findings-acl.597
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10036–10056
URL:
https://aclanthology.org/2024.findings-acl.597
Cite (ACL):
David Wan, Koustuv Sinha, Srini Iyer, Asli Celikyilmaz, Mohit Bansal, and Ramakanth Pasunuru. 2024. ACUEval: Fine-grained Hallucination Evaluation and Correction for Abstractive Summarization. In Findings of the Association for Computational Linguistics ACL 2024, pages 10036–10056, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
ACUEval: Fine-grained Hallucination Evaluation and Correction for Abstractive Summarization (Wan et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.597.pdf