@inproceedings{zhou-etal-2025-focus,
title = "{FOCUS}: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning",
author = "Zhou, Kankan and
Lai, Eason and
Mouratidis, Kyriakos and
Jiang, Jing",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1337/",
doi = "10.18653/v1/2025.acl-long.1337",
pages = "27565--27584",
ISBN = "979-8-89176-251-0",
abstract = "Humans possess a remarkable ability to interpret underspecified ambiguous statements by inferring their meanings from contexts such as visual inputs. This ability, however, may not be as developed in recent pre-trained vision-language models (VLMs). In this paper, we introduce a novel probing dataset called FOCUS to evaluate whether state-of-the-art VLMs have this ability. FOCUS consists of underspecified sentences paired with image contexts and carefully designed probing questions. Our experiments reveal that VLMs still fall short in handling underspecification even when visual inputs that can help resolve the ambiguities are available. To further support research in underspecification, FOCUS will be released for public use. We hope this dataset will inspire further research on the reasoning and contextual understanding capabilities of VLMs."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="zhou-etal-2025-focus">
    <titleInfo>
      <title>FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Kankan</namePart>
      <namePart type="family">Zhou</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Eason</namePart>
      <namePart type="family">Lai</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Kyriakos</namePart>
      <namePart type="family">Mouratidis</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Jing</namePart>
      <namePart type="family">Jiang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Wanxiang</namePart>
        <namePart type="family">Che</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Joyce</namePart>
        <namePart type="family">Nabende</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Ekaterina</namePart>
        <namePart type="family">Shutova</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Mohammad</namePart>
        <namePart type="given">Taher</namePart>
        <namePart type="family">Pilehvar</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Vienna, Austria</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-251-0</identifier>
    </relatedItem>
    <abstract>Humans possess a remarkable ability to interpret underspecified ambiguous statements by inferring their meanings from contexts such as visual inputs. This ability, however, may not be as developed in recent pre-trained vision-language models (VLMs). In this paper, we introduce a novel probing dataset called FOCUS to evaluate whether state-of-the-art VLMs have this ability. FOCUS consists of underspecified sentences paired with image contexts and carefully designed probing questions. Our experiments reveal that VLMs still fall short in handling underspecification even when visual inputs that can help resolve the ambiguities are available. To further support research in underspecification, FOCUS will be released for public use. We hope this dataset will inspire further research on the reasoning and contextual understanding capabilities of VLMs.</abstract>
    <identifier type="citekey">zhou-etal-2025-focus</identifier>
    <identifier type="doi">10.18653/v1/2025.acl-long.1337</identifier>
    <location>
      <url>https://aclanthology.org/2025.acl-long.1337/</url>
    </location>
    <part>
      <date>2025-07</date>
      <extent unit="page">
        <start>27565</start>
        <end>27584</end>
      </extent>
    </part>
  </mods>
</modsCollection>

%0 Conference Proceedings
%T FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning
%A Zhou, Kankan
%A Lai, Eason
%A Mouratidis, Kyriakos
%A Jiang, Jing
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F zhou-etal-2025-focus
%X Humans possess a remarkable ability to interpret underspecified ambiguous statements by inferring their meanings from contexts such as visual inputs. This ability, however, may not be as developed in recent pre-trained vision-language models (VLMs). In this paper, we introduce a novel probing dataset called FOCUS to evaluate whether state-of-the-art VLMs have this ability. FOCUS consists of underspecified sentences paired with image contexts and carefully designed probing questions. Our experiments reveal that VLMs still fall short in handling underspecification even when visual inputs that can help resolve the ambiguities are available. To further support research in underspecification, FOCUS will be released for public use. We hope this dataset will inspire further research on the reasoning and contextual understanding capabilities of VLMs.
%R 10.18653/v1/2025.acl-long.1337
%U https://aclanthology.org/2025.acl-long.1337/
%U https://doi.org/10.18653/v1/2025.acl-long.1337
%P 27565-27584

Markdown (Informal)
[FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning](https://aclanthology.org/2025.acl-long.1337/) (Zhou et al., ACL 2025)

ACL
Kankan Zhou, Eason Lai, Kyriakos Mouratidis, and Jing Jiang. 2025. FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27565–27584, Vienna, Austria. Association for Computational Linguistics.