MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding

Jingyuan Deng; Yujiu Yang

Abstract

Large vision-language models (LVLMs) have shown remarkable performance in visual-language understanding for downstream multimodal tasks. As their capabilities improve, new problems emerge. Among them, hallucination has attracted much attention: the phenomenon in which LVLMs generate content that contradicts their visual and textual input. Many approaches have been proposed to deal with this issue, such as contrastive decoding and attention manipulation. However, contrastive decoding methods struggle to construct appropriate contrastive samples, and attention manipulation methods are highly sensitive and lack stability. In this work, we propose image head Masked Contrastive Decoding (MaskCD). Our approach utilizes the “image heads” in LVLMs, masking them to construct contrastive samples for contrastive decoding. We evaluate MaskCD on LLaVA-1.5-7b and Qwen-VL-7b, using various benchmarks such as CHAIR, POPE, AMBER and MME. The results demonstrate that MaskCD effectively alleviates hallucinations while retaining the general capabilities of LVLMs. Corresponding resources can be found at: https://github.com/Deng-Jingyuan/MaskCD.
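The abstract does not spell out how the two decoding passes are combined. As a rough illustration only, the sketch below uses the logit combination common in the contrastive decoding literature, `(1 + alpha) * full - alpha * masked`; this formula, the weight `alpha`, and the function names are assumptions for illustration and are not taken from the paper. Here `full_logits` stands for next-token logits from the intact model and `masked_logits` for logits from the same model with its image heads masked.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(full_logits, masked_logits, alpha=1.0):
    """Combine logits from the intact and the head-masked forward pass.

    Assumed form (not confirmed by the source):
        (1 + alpha) * full - alpha * masked
    Tokens whose score drops when image heads are masked are boosted;
    tokens that score the same without image information are not.
    """
    if len(full_logits) != len(masked_logits):
        raise ValueError("both logit vectors must cover the same vocabulary")
    return [(1 + alpha) * f - alpha * m
            for f, m in zip(full_logits, masked_logits)]


# Toy three-token vocabulary: token 1 depends on the image, tokens 0 and 2 do not.
full = [2.0, 1.0, 0.5]
masked = [2.0, 0.2, 0.5]
combined = contrastive_logits(full, masked, alpha=1.0)
probs = softmax(combined)
```

In this toy case only the image-dependent token gains score relative to the plain forward pass; setting `alpha=0` recovers ordinary decoding.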

Anthology ID: 2025.findings-emnlp.1025
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 18854–18866
URL: https://aclanthology.org/2025.findings-emnlp.1025/
Cite (ACL): Jingyuan Deng and Yujiu Yang. 2025. MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18854–18866, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding (Deng & Yang, Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1025.pdf
Checklist: 2025.findings-emnlp.1025.checklist.pdf
