@inproceedings{sarkar-etal-2025-mitigating,
title = "Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression",
author = "Sarkar, Sreetama and
Che, Yue and
Gavin, Alex and
Beerel, Peter Anthony and
Kundu, Souvik",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.631/",
pages = "12492--12511",
ISBN = "979-8-89176-332-6",
abstract = "Despite their remarkable progress in multimodal understanding tasks, large vision language models (LVLMs) often suffer from ``hallucination'', generating texts misaligned with the visual context. Existing methods aimed at reducing hallucinations through inference time intervention incur a significant increase in latency. To mitigate this, we present **SPIN**, a task-agnostic attention-guided head suppression strategy that can be seamlessly integrated during inference **without incurring any significant compute or latency overhead**. We investigate whether hallucination in LVLMs can be linked to specific model components. Our analysis suggests that hallucinations can be attributed to a dynamic subset of attention heads in each layer. Leveraging this insight, for each text query token, we selectively suppress attention heads that exhibit low attention to image tokens, keeping the top-k attention heads intact. Extensive evaluations on visual question answering and image description tasks demonstrate the efficacy of SPIN in reducing hallucination scores up to **2.7x** while maintaining F1, and improving throughput by **1.8x** compared to existing alternatives."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="sarkar-etal-2025-mitigating">
<titleInfo>
<title>Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sreetama</namePart>
<namePart type="family">Sarkar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yue</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alex</namePart>
<namePart type="family">Gavin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Peter</namePart>
<namePart type="given">Anthony</namePart>
<namePart type="family">Beerel</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Souvik</namePart>
<namePart type="family">Kundu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-332-6</identifier>
</relatedItem>
<abstract>Despite their remarkable progress in multimodal understanding tasks, large vision language models (LVLMs) often suffer from “hallucination”, generating texts misaligned with the visual context. Existing methods aimed at reducing hallucinations through inference time intervention incur a significant increase in latency. To mitigate this, we present **SPIN**, a task-agnostic attention-guided head suppression strategy that can be seamlessly integrated during inference **without incurring any significant compute or latency overhead**. We investigate whether hallucination in LVLMs can be linked to specific model components. Our analysis suggests that hallucinations can be attributed to a dynamic subset of attention heads in each layer. Leveraging this insight, for each text query token, we selectively suppress attention heads that exhibit low attention to image tokens, keeping the top-k attention heads intact. Extensive evaluations on visual question answering and image description tasks demonstrate the efficacy of SPIN in reducing hallucination scores up to **2.7x** while maintaining F1, and improving throughput by **1.8x** compared to existing alternatives.</abstract>
<identifier type="citekey">sarkar-etal-2025-mitigating</identifier>
<location>
<url>https://aclanthology.org/2025.emnlp-main.631/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>12492</start>
<end>12511</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
%A Sarkar, Sreetama
%A Che, Yue
%A Gavin, Alex
%A Beerel, Peter Anthony
%A Kundu, Souvik
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-332-6
%F sarkar-etal-2025-mitigating
%X Despite their remarkable progress in multimodal understanding tasks, large vision language models (LVLMs) often suffer from “hallucination”, generating texts misaligned with the visual context. Existing methods aimed at reducing hallucinations through inference time intervention incur a significant increase in latency. To mitigate this, we present SPIN, a task-agnostic attention-guided head suppression strategy that can be seamlessly integrated during inference without incurring any significant compute or latency overhead. We investigate whether hallucination in LVLMs can be linked to specific model components. Our analysis suggests that hallucinations can be attributed to a dynamic subset of attention heads in each layer. Leveraging this insight, for each text query token, we selectively suppress attention heads that exhibit low attention to image tokens, keeping the top-k attention heads intact. Extensive evaluations on visual question answering and image description tasks demonstrate the efficacy of SPIN in reducing hallucination scores up to 2.7x while maintaining F1, and improving throughput by 1.8x compared to existing alternatives.
%U https://aclanthology.org/2025.emnlp-main.631/
%P 12492-12511
Markdown (Informal)

[Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression](https://aclanthology.org/2025.emnlp-main.631/) (Sarkar et al., EMNLP 2025)

ACL

Sreetama Sarkar, Yue Che, Alex Gavin, Peter Anthony Beerel, and Souvik Kundu. 2025. Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12492–12511, Suzhou, China. Association for Computational Linguistics.
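The abstract sketches the core mechanism: for each text query token, rank attention heads by how much attention mass they place on image tokens, keep the top-k, and suppress the rest. Below is a minimal PyTorch sketch of that idea. It is not the authors' implementation; the tensor layout, the function name `suppress_low_image_attention_heads`, the image-token slice, and the zero-out suppression rule are all assumptions made for illustration.

```python
# Minimal sketch of image-guided head suppression (SPIN-style idea),
# under assumed shapes -- not the paper's implementation.
import torch

def suppress_low_image_attention_heads(attn, image_slice, top_k):
    """For each query token, keep the top_k heads with the highest
    attention mass on image tokens and zero out the remaining heads.

    attn: [H, Q, S] post-softmax attention weights (assumed layout:
          H heads, Q text query tokens, S total sequence length).
    image_slice: positions of the image tokens within the sequence.
    """
    # Per-head, per-query attention mass on the image tokens: [H, Q]
    image_mass = attn[:, :, image_slice].sum(dim=-1)
    # Indices of the top-k heads for each query token: [top_k, Q]
    topk_idx = image_mass.topk(top_k, dim=0).indices
    # Boolean keep-mask over heads, built per query token
    keep = torch.zeros_like(image_mass, dtype=torch.bool)
    keep.scatter_(0, topk_idx, True)
    # Suppress (zero) heads outside the top-k; broadcast over keys
    return attn * keep.unsqueeze(-1)

# Toy usage: 8 heads, 4 query tokens, 20-token sequence,
# with image tokens assumed to occupy positions 0..9.
attn = torch.softmax(torch.randn(8, 4, 20), dim=-1)
pruned = suppress_low_image_attention_heads(attn, slice(0, 10), top_k=4)
```

In a real LVLM this masking would be applied inside each attention layer before the value aggregation, and the exact suppression rule (hard zeroing vs. rescaling, and how k is chosen per layer) follows the paper rather than this sketch.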