A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation

Tianyu Liu; Yizhe Zhang; Chris Brockett; Yi Mao; Zhifang Sui; Weizhu Chen; William B. Dolan

doi:10.18653/v1/2022.acl-long.464

Abstract

Large pretrained generative models like GPT-3 often suffer from hallucinating non-existent or incorrect content, which undermines their potential merits in real applications. Existing work usually attempts to detect these hallucinations based on a corresponding oracle reference at a sentence or document level. However, ground-truth references may not be readily available for many free-form text generation applications, and sentence- or document-level detection may fail to provide the fine-grained signals that would prevent fallacious content in real time. As a first step to addressing these issues, we propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDeS (HAllucination DEtection dataSet). To create this dataset, we first perturb a large number of text segments extracted from English-language Wikipedia, and then verify these with crowd-sourced annotations. To mitigate label imbalance during annotation, we utilize an iterative model-in-loop strategy. We conduct comprehensive data analyses and create multiple baseline models.

Anthology ID: 2022.acl-long.464
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6723–6737
URL: https://aclanthology.org/2022.acl-long.464
DOI: 10.18653/v1/2022.acl-long.464
Cite (ACL): Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, and Bill Dolan. 2022. A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6723–6737, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation (Liu et al., ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.464.pdf
Software: 2022.acl-long.464.software.zip
Code: microsoft/HaDes
Data: HaDes