@inproceedings{azachi-etal-2025-leveraging,
title = "Leveraging {NTP}s for Efficient Hallucination Detection in {VLM}s",
author = "Azachi, Ofir and
Eliyahu, Kfir and
El Ani, Eyal and
Himelstein, Rom and
Reichart, Roi and
Pinter, Yuval and
Calderon, Nitay",
editor = {Sinha, Aman and
V{\'a}zquez, Ra{\'u}l and
Mickus, Timothee and
Agarwal, Rohit and
Buhnila, Ioana and
Schmidtov{\'a}, Patr{\'i}cia and
Gamba, Federica and
Prasad, Dilip K. and
Tiedemann, J{\"o}rg},
booktitle = "Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)",
month = dec,
year = "2025",
address = "Mumbai, India",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.chomps-main.4/",
pages = "35--48",
ISBN = "979-8-89176-308-1",
abstract = "Hallucinations of vision-language models (VLMs), which are misalignments between visual content and generated text, undermine the reliability of VLMs. One common approach for detecting them employs the same VLM, or a different one, to assess generated outputs. This process is computationally intensive and increases model latency. In this paper, we explore an efficient on-the-fly method for hallucination detection by training traditional ML models over signals based on the VLM{'}s next-token probabilities (NTPs). NTPs provide a direct quantification of model uncertainty. We hypothesize that high uncertainty (i.e., a low NTP value) is strongly associated with hallucinations. To test this, we introduce a dataset of 1,400 human-annotated statements derived from VLM-generated content, each labeled as hallucinated or not, and use it to test our NTP-based lightweight method. Our results demonstrate that NTP-based features are valuable predictors of hallucinations, enabling fast and simple ML models to achieve performance comparable to that of strong VLMs. Furthermore, augmenting these NTPs with linguistic NTPs, computed by feeding only the generated text back into the VLM, enhances hallucination detection performance. Finally, integrating hallucination prediction scores from VLMs into the NTP-based models led to better performance than using either VLMs or NTPs alone. We hope this study paves the way for simple, lightweight solutions that enhance the reliability of VLMs. All data is publicly available at https://huggingface.co/datasets/wrom/Language-Vision-Hallucinations."
}
Markdown (Informal)
[Leveraging NTPs for Efficient Hallucination Detection in VLMs](https://aclanthology.org/2025.chomps-main.4/) (Azachi et al., CHOMPS 2025)

ACL
Ofir Azachi, Kfir Eliyahu, Eyal El Ani, Rom Himelstein, Roi Reichart, Yuval Pinter, and Nitay Calderon. 2025. Leveraging NTPs for Efficient Hallucination Detection in VLMs. In Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025), pages 35–48, Mumbai, India. Association for Computational Linguistics.
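
The abstract sketches the core recipe: summarize a statement's next-token probabilities (NTPs) into a handful of features and train a lightweight classifier to flag hallucinations. The snippet below is a minimal illustration of that idea, not the authors' implementation: the feature choices, the 0.5 low-confidence cut-off, the logistic-regression model, and the toy NTP sequences are all assumptions made for the example. In practice the per-token probabilities would come from the VLM's decoding pass and the labels from the paper's 1,400 human-annotated statements (https://huggingface.co/datasets/wrom/Language-Vision-Hallucinations).

```python
# Hypothetical sketch of the NTP-feature idea from the abstract: aggregate a
# statement's next-token probabilities (NTPs) into summary statistics and train
# a lightweight classifier on them. Features, threshold, and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def ntp_features(ntps):
    """Summarize one statement's NTP sequence; low values signal high uncertainty."""
    p = np.asarray(ntps, dtype=float)
    return np.array([
        p.mean(),                  # average confidence over the statement
        p.min(),                   # single most uncertain token
        p.std(),                   # spread of confidence
        np.log(p + 1e-12).mean(),  # mean log-probability
        (p < 0.5).mean(),          # fraction of low-confidence tokens (arbitrary cut-off)
    ])

# Toy stand-in data: each item is (per-token NTPs, hallucination label).
examples = [
    ([0.91, 0.88, 0.95, 0.90], 0),   # confident statement, not hallucinated
    ([0.42, 0.17, 0.63, 0.08], 1),   # uncertain statement, hallucinated
    ([0.85, 0.79, 0.92, 0.81], 0),
    ([0.33, 0.25, 0.41, 0.12], 1),
] * 25  # repeat so the train/test split has enough rows for this demo

X = np.stack([ntp_features(ntps) for ntps, _ in examples])
y = np.array([label for _, label in examples])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("ROC-AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```

To approximate the paper's setting, the toy `examples` list would be replaced with NTP sequences extracted from the VLM and the human labels from the released dataset. The abstract also reports gains from "linguistic" NTPs (re-scoring the generated text alone) and from VLM-produced hallucination scores; under this sketch, those signals could simply be appended as additional columns of the feature vector.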