The Energy of Falsehood: Detecting Hallucinations via Diffusion Model Likelihoods

Arpit Singh Gautam; Kailash Talreja; Saurabh Jha

Abstract

Large Language Models (LLMs) frequently "hallucinate" plausible but incorrect assertions, a vulnerability often missed by uncertainty metrics when models are "confidently wrong." We propose DiffuTruth, an unsupervised framework that re-conceptualizes fact verification via non-equilibrium thermodynamics, positing that factual truths act as stable attractors on a generative manifold while hallucinations are unstable. We introduce the "Generative Stress Test": claims are corrupted with noise and reconstructed using a discrete text diffusion model. We define Semantic Energy, a metric that uses an NLI critic to measure the semantic divergence between the original claim and its reconstruction. Unlike vector-space errors, Semantic Energy isolates deep factual contradictions. We further propose a Hybrid Calibration that fuses this stability signal with discriminative confidence. Extensive experiments on FEVER demonstrate that DiffuTruth achieves a state-of-the-art unsupervised AUROC of 0.725, outperforming baselines by 1.5% through the correction of overconfident predictions. Furthermore, we show superior zero-shot generalization on the multi-hop HOVER dataset, outperforming baselines by over 4%, confirming the robustness of thermodynamic truth properties to distribution shifts.

Anthology ID: 2026.fever-1.4
Volume: Proceedings of the Ninth Fact Extraction and VERification Workshop (FEVER)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Mubashara Akhtar, Rami Aly, Rui Cao, Christos Christodoulopoulos, Oana Cocarascu, Zhijiang Guo, Arpit Mittal, Michael Schlichtkrull, James Thorne, Andreas Vlachos
Venues: FEVER | WS
Publisher: Association for Computational Linguistics
Pages: 47–58
URL: https://aclanthology.org/2026.fever-1.4/
Cite (ACL): Arpit Singh Gautam, Kailash Talreja, and Saurabh Jha. 2026. The Energy of Falsehood: Detecting Hallucinations via Diffusion Model Likelihoods. In Proceedings of the Ninth Fact Extraction and VERification Workshop (FEVER), pages 47–58, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): The Energy of Falsehood: Detecting Hallucinations via Diffusion Model Likelihoods (Gautam et al., FEVER 2026)
PDF: https://aclanthology.org/2026.fever-1.4.pdf
