Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation

Chris Samarinas; Alexander Krubner; Alireza Salemi; Youngwoo Kim; Hamed Zamani

doi:10.18653/v1/2025.findings-acl.693

Abstract

This paper presents ICAT, an evaluation framework for measuring coverage of diverse factual information in long-form text generation. ICAT breaks down a long output text into a list of atomic claims and not only verifies each claim through retrieval from a (reliable) knowledge source, but also computes the alignment between the atomic factual claims and various aspects expected to be presented in the output. We study three implementations of the ICAT framework, each with a different assumption on the availability of aspects and alignment method. By adopting data from the diversification task in the TREC Web Track and the ClueWeb corpus, we evaluate the ICAT framework. We demonstrate strong correlation with human judgments and provide comprehensive evaluation across multiple state-of-the-art LLMs. Our framework further offers interpretable and fine-grained analysis of diversity and coverage. Its modular design allows for easy adaptation to different domains and datasets, making it a valuable tool for evaluating the qualitative aspects of long-form responses produced by LLMs.
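
The pipeline described in the abstract (decompose into atomic claims, verify each claim, align claims to expected aspects) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Claim` record, the function names, the definition of coverage as the fraction of expected aspects touched by at least one verified claim, and the harmonic-mean combination. Claim extraction, retrieval-based verification, and aspect alignment are assumed to have already happened upstream and are represented only by their outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One atomic claim extracted from a long-form answer (hypothetical record)."""
    text: str
    supported: bool  # assumed outcome of verification against a knowledge source
    aspects: set = field(default_factory=set)  # assumed aspect labels aligned to this claim


def factuality(claims):
    """Fraction of claims that were verified as supported."""
    if not claims:
        return 0.0
    return sum(1 for c in claims if c.supported) / len(claims)


def coverage(claims, expected_aspects):
    """Fraction of expected aspects covered by at least one supported claim."""
    if not expected_aspects:
        return 0.0
    covered = set()
    for c in claims:
        if c.supported:  # unsupported claims do not count toward coverage here
            covered |= c.aspects & expected_aspects
    return len(covered) / len(expected_aspects)


def combined_score(f, c):
    """Harmonic mean of factuality and coverage (an illustrative choice only)."""
    return 0.0 if f + c == 0 else 2 * f * c / (f + c)


expected = {"history", "causes", "treatment", "prevention"}
claims = [
    Claim("claim A", True, {"history"}),
    Claim("claim B", True, {"causes"}),
    Claim("claim C", True, {"causes"}),
    Claim("claim D", False, {"treatment"}),
]
f = factuality(claims)
c = coverage(claims, expected)
print(f, c, combined_score(f, c))
```

In this toy input, three of four claims are supported and two of four expected aspects are covered by supported claims, so the unsupported claim about "treatment" lowers factuality without contributing to coverage.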

Anthology ID: 2025.findings-acl.693
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13468–13482
URL: https://aclanthology.org/2025.findings-acl.693/
DOI: 10.18653/v1/2025.findings-acl.693
Cite (ACL): Chris Samarinas, Alexander Krubner, Alireza Salemi, Youngwoo Kim, and Hamed Zamani. 2025. Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 13468–13482, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation (Samarinas et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.693.pdf
