Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output

Hithesh Sankararaman; Mohammed Nasheed Yasin; Tanner Sorensen; Alessandro Di Bari; Andreas Stolcke

Abstract

We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG). Given a context and putative output, we compute a factuality score that can be thresholded to yield a binary decision, which can be used to check the results of LLM-based question-answering, summarization, or other systems. Unlike factuality checkers that themselves rely on LLMs, we use compact, open-source natural language inference (NLI) models that yield a freely accessible solution with low latency and low cost at run-time, and no need for LLM fine-tuning. The approach also enables downstream mitigation and correction of hallucinations by tracing them back to specific context chunks. Our experiments show high ROC-AUC across a wide range of relevant open-source datasets, indicating the effectiveness of our method for fact-checking RAG output.
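The score-then-threshold idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not the authors' implementation: `toy_entailment_score` is a crude token-overlap placeholder standing in for a real NLI model, taking the maximum over chunks is one possible aggregation, and the threshold of 0.5 is arbitrary.

```python
from typing import Callable, List, Optional, Tuple


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model.

    Returns the fraction of hypothesis tokens that also occur in the
    premise. A real system would return an entailment probability.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(tok in premise_tokens for tok in hypothesis_tokens)
    return hits / len(hypothesis_tokens)


def check_output(
    chunks: List[str],
    output: str,
    scorer: Callable[[str, str], float] = toy_entailment_score,
    threshold: float = 0.5,  # assumed value, to be tuned on labeled data
) -> Tuple[float, bool, Optional[int]]:
    """Score an output against each context chunk and threshold the result.

    Returns (score, is_factual, index_of_best_supporting_chunk).
    Aggregating with max is an assumption made for this sketch.
    """
    if not chunks:
        return 0.0, False, None
    scores = [scorer(chunk, output) for chunk in chunks]
    best = max(range(len(chunks)), key=scores.__getitem__)
    return scores[best], scores[best] >= threshold, best


chunks = [
    "the cat sat on the mat",
    "paris is the capital of france",
]
supported = check_output(chunks, "paris is the capital of france")
unsupported = check_output(chunks, "berlin hosts the olympics")
```

In this sketch the returned chunk index is what would let a downstream step trace a flagged output back to the context chunk that best supports it (or fails to), and the raw score is the quantity one would sweep a threshold over when computing ROC-AUC.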

Anthology ID: 2024.emnlp-industry.97
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1305–1313
URL: https://aclanthology.org/2024.emnlp-industry.97
Cite (ACL): Hithesh Sankararaman, Mohammed Nasheed Yasin, Tanner Sorensen, Alessandro Di Bari, and Andreas Stolcke. 2024. Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1305–1313, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output (Sankararaman et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-industry.97.pdf
