Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting

Jiarui Wu; Zhuo Liu; Hangfeng He

doi:10.18653/v1/2025.findings-naacl.192

Abstract

Spatial relation hallucinations pose a persistent challenge in large vision-language models (LVLMs), leading them to generate incorrect predictions about object positions and spatial configurations within an image. To address this issue, we propose a constraint-aware prompting framework designed to reduce spatial relation hallucinations. Specifically, we introduce two types of constraints: (1) a bidirectional constraint, which ensures consistency in pairwise object relations, and (2) a transitivity constraint, which enforces relational dependence across multiple objects. By incorporating these constraints, LVLMs can produce more spatially coherent and consistent outputs. We evaluate our method on three widely used spatial relation datasets, demonstrating performance improvements over existing approaches. Additionally, a systematic analysis of various bidirectional relation analysis choices and transitivity reference selections highlights the broader potential of our method for incorporating constraints to mitigate spatial relation hallucinations.
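
To make the two constraints concrete, the following is a minimal illustrative sketch, not the prompting framework the paper proposes. It only checks a set of already-predicted relations for consistency. The relation vocabulary, the `INVERSE` table, and both function names are assumptions introduced here, and the sketch assumes each listed relation is transitive (as in a simple one-dimensional ordering).

```python
# Hypothetical relation vocabulary and inverse table (assumptions, not from the paper).
INVERSE = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
}


def bidirectional_violations(preds):
    """Find pairs whose two directions disagree.

    preds maps (a, b) -> relation, read as "a is <relation> b".
    If (a, b) is "left of", then (b, a) is expected to be "right of".
    Each inconsistent pair is reported once (from the side where a < b).
    """
    violations = []
    for (a, b), rel in preds.items():
        back = preds.get((b, a))
        if a < b and back is not None and back != INVERSE[rel]:
            violations.append(((a, b), rel, back))
    return violations


def transitivity_violations(preds):
    """Find triples where a R b and b R c hold but (a, c) is not R.

    Assumes every relation in the vocabulary is transitive.
    Triples whose (a, c) relation was never predicted are skipped.
    """
    violations = []
    for (a, b), r1 in preds.items():
        for (b2, c), r2 in preds.items():
            if b2 != b or r2 != r1 or c == a:
                continue
            r3 = preds.get((a, c))
            if r3 is not None and r3 != r1:
                violations.append((a, b, c, r1, r3))
    return violations
```

For example, predictions stating both that the cup is above the table and that the table is above the cup would be flagged by `bidirectional_violations`. A chain such as "a left of b", "b left of c", "a right of c" would be flagged by `transitivity_violations`.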

Anthology ID: 2025.findings-naacl.192
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3450–3468
URL: https://aclanthology.org/2025.findings-naacl.192/
DOI: 10.18653/v1/2025.findings-naacl.192
Cite (ACL): Jiarui Wu, Zhuo Liu, and Hangfeng He. 2025. Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3450–3468, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting (Wu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.192.pdf
