MHALO: Evaluating MLLMs as Fine-grained Hallucination Detectors

Yishuo Cai; Renjie Gu; Jiaxu Li; Xuancheng Huang; Junzhe Chen; Xiaotao Gu; Minlie Huang

doi:10.18653/v1/2025.findings-acl.478

Abstract

Hallucination remains a critical challenge for multimodal large language models (MLLMs), undermining their reliability in real-world applications. While fine-grained hallucination detection (FHD) holds promise for enhancing high-quality vision-language data construction and model alignment through enriched feedback signals, automated solutions for this task have yet to be systematically explored. Inspired by the concept of “MLLM as a Judge”, we introduce MHALO, the first comprehensive benchmark specifically designed for evaluating MLLMs’ capability in performing token-level FHD. Our benchmark encompasses 12 distinct hallucination types spanning both multimodal perception and reasoning domains. Through extensive evaluations of 9 selected MLLMs, we reveal substantial performance limitations, with the leading model achieving an average F1_IoU of only 40.59%. To address this limitation, we develop HaloDet-4B, a specialized model trained on our curated training data, which significantly outperforms existing models. We hope the benchmark can provide valuable insights for future research on hallucination mitigation in MLLMs. The code and dataset will be publicly available.

Anthology ID: 2025.findings-acl.478
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9197–9222
URL: https://aclanthology.org/2025.findings-acl.478/
DOI: 10.18653/v1/2025.findings-acl.478
Cite (ACL): Yishuo Cai, Renjie Gu, Jiaxu Li, Xuancheng Huang, Junzhe Chen, Xiaotao Gu, and Minlie Huang. 2025. MHALO: Evaluating MLLMs as Fine-grained Hallucination Detectors. In Findings of the Association for Computational Linguistics: ACL 2025, pages 9197–9222, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): MHALO: Evaluating MLLMs as Fine-grained Hallucination Detectors (Cai et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.478.pdf
