Two-tiered Encoder-based Hallucination Detection for Retrieval-Augmented Generation in the Wild

Ilana Zimmerman, Jadin Tredup, Ethan Selfridge, Joseph Bradley


Abstract
Detecting hallucinations, where Large Language Model (LLM) outputs are not factually consistent with a Knowledge Base (KB), is a challenge for Retrieval-Augmented Generation (RAG) systems. Current solutions rely on public datasets to develop prompts or fine-tune a Natural Language Inference (NLI) model. However, these approaches are not focused on developing an enterprise RAG system: they do not account for latency, do not train or evaluate on production data, and do not handle non-verifiable statements such as small talk or questions. To address this, we leverage the customer service conversation data of four large brands to evaluate existing solutions and propose a set of small encoder models trained on a new dataset. We find that the proposed models outperform existing methods, and highlight the value of combining a small amount of in-domain data with public datasets.
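The two-tiered flow the abstract describes can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: tier 1 filters non-verifiable statements (small talk, questions), and tier 2 checks whether the remaining statements are supported by the retrieved KB passages. The paper trains small encoder models for both tiers; the phrase list and token-overlap score below are illustrative stand-ins.

```python
import re

# Hypothetical small-talk phrases; the paper uses a trained classifier here.
SMALL_TALK_PREFIXES = ("thanks", "thank you", "hello", "hi", "goodbye")

def tokenize(text: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def is_verifiable(sentence: str) -> bool:
    """Tier 1 stand-in: drop questions and small talk, which cannot be
    checked against a knowledge base."""
    s = sentence.strip().lower()
    if s.endswith("?"):
        return False
    return not s.startswith(SMALL_TALK_PREFIXES)

def support_score(sentence: str, kb_passages: list[str]) -> float:
    """Tier 2 stand-in: token overlap with the best-matching KB passage.
    The paper scores entailment with an encoder model instead."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return max(
        (len(tokens & tokenize(p)) / len(tokens) for p in kb_passages),
        default=0.0,
    )

def detect_hallucinations(sentences: list[str],
                          kb_passages: list[str],
                          threshold: float = 0.5) -> list[str]:
    """Flag sentences that are verifiable but unsupported by the KB."""
    return [
        s for s in sentences
        if is_verifiable(s) and support_score(s, kb_passages) < threshold
    ]

kb = ["Refunds are issued within 5 business days of return receipt."]
reply = [
    "Thanks for reaching out!",
    "Refunds are issued within 5 business days.",
    "We also offer a lifetime warranty on all items.",
]
print(detect_hallucinations(reply, kb))
# → ['We also offer a lifetime warranty on all items.']
```

The separation matters for production RAG: a single entailment model would flag small talk as "unsupported" even though it asserts nothing checkable, so a cheap first tier keeps the false-positive rate down and saves a second-tier inference call.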
Anthology ID:
2024.emnlp-industry.2
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8–22
URL:
https://aclanthology.org/2024.emnlp-industry.2
Cite (ACL):
Ilana Zimmerman, Jadin Tredup, Ethan Selfridge, and Joseph Bradley. 2024. Two-tiered Encoder-based Hallucination Detection for Retrieval-Augmented Generation in the Wild. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 8–22, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
Two-tiered Encoder-based Hallucination Detection for Retrieval-Augmented Generation in the Wild (Zimmerman et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.2.pdf