Xiaomeng Hu


2024

Embedding and Gradient Say Wrong: A White-Box Method for Hallucination Detection
Xiaomeng Hu | Yiming Zhang | Ru Peng | Haozhe Zhang | Chenwei Wu | Gang Chen | Junbo Zhao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In recent years, large language models (LLMs) have achieved remarkable success in natural language generation. Compared with earlier small-scale models, they can generate fluent output from a given prefix or prompt. However, one critical challenge, the *hallucination* problem, remains unresolved. The community generally uses this term for the scenario in which LLMs generate text that is unrelated to the input text or to the facts, often without the error being detected. In this study, we model the distributional distance between the regular conditional output and the unconditional output, which is generated without the input text. Based on a Taylor expansion of this distance in the output probability space, our approach leverages embedding and first-order gradient information. The resulting method is plug-and-play and can easily be adapted to any autoregressive LLM. On the HADES hallucination benchmark and other datasets, our approach achieves state-of-the-art performance.
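The abstract frames detection around the distance between the conditional output distribution (generated with the input) and the unconditional one (generated without it). The snippet below is a minimal sketch of that underlying quantity only. It is not the paper's estimator, which relies on a Taylor expansion using embeddings and first-order gradients. The following are illustrative assumptions: the choice of per-token KL divergence as the distance, the averaging over positions, and the helper names `softmax`, `kl_divergence`, and `conditional_unconditional_distance`. The inputs are assumed to be per-position next-token logits that you have already obtained from an autoregressive model in both settings.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of next-token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def conditional_unconditional_distance(
    cond_logits: Sequence[Sequence[float]],
    uncond_logits: Sequence[Sequence[float]],
) -> float:
    """Hypothetical helper: mean per-position KL(p_cond || p_uncond).

    Assumption: both arguments hold one logit vector per generated token,
    scored with the input (conditional) and without it (unconditional).
    KL is used here only as an illustrative distance; the paper instead
    approximates its distance with a Taylor expansion.
    """
    if len(cond_logits) != len(uncond_logits) or not cond_logits:
        raise ValueError("need the same, non-zero number of positions")
    total = 0.0
    for c, u in zip(cond_logits, uncond_logits):
        total += kl_divergence(softmax(c), softmax(u))
    return total / len(cond_logits)


if __name__ == "__main__":
    cond = [[2.0, 0.5, -1.0], [0.0, 3.0, 0.0]]
    uncond = [[0.1, 0.1, 0.1], [0.0, 0.2, 0.0]]
    print(conditional_unconditional_distance(cond, uncond))
```

One plausible reading of the framing is this: when the conditional distribution barely differs from the unconditional one, the generated text depends little on the input, and that is the "unrelated to the input" behavior the abstract calls hallucination. This reading is offered as intuition, not as the paper's decision rule.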