Towards Statistical Factuality Guarantee for Large Vision-Language Models

Zhuohang Li; Chao Yan; Nicholas J Jackson; Wendi Cui; Bo Li; Jiaxin Zhang; Bradley A. Malin

doi:10.18653/v1/2025.emnlp-main.576

Abstract

Large Vision-Language Models (LVLMs) have demonstrated impressive performance in image-conditioned text generation; however, hallucinated outputs (text that misaligns with the visual input) pose a major barrier to their use in safety-critical applications. We introduce ConfLVLM, a conformal-prediction-based framework that provides finite-sample, distribution-free statistical guarantees on the factuality of LVLM output. Treating each generated detail as a hypothesis, ConfLVLM statistically tests its factuality using efficient heuristic uncertainty measures and filters out unreliable claims. We conduct extensive experiments covering three representative application domains: general scene understanding, medical radiology report generation, and document understanding. Remarkably, ConfLVLM reduces the error rate of claims generated by LLaVa-1.5 for scene descriptions from 87.8% to 10.0% by filtering out erroneous claims with a 95.3% true positive rate. Our results further show that ConfLVLM is highly flexible and can be applied to any black-box LVLM paired with any uncertainty measure for any image-conditioned free-form text generation task, while providing a rigorous guarantee on controlling hallucination risk.
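
To make the filtering idea in the abstract concrete, the following is a minimal sketch of a generic split-conformal calibration step for claim filtering. It is not taken from the paper and may differ from ConfLVLM's actual procedure and risk definition. The function names, the data layout (lists of uncertainty/label pairs), and the toy numbers are all assumptions for illustration. Assuming calibration and test examples are exchangeable, the threshold chosen here bounds by `alpha` the probability that any retained claim in a new output is non-factual.

```python
import math


def calibrate_threshold(calibration, alpha):
    """Pick an uncertainty threshold from labeled calibration examples.

    calibration: list of examples, each a list of (uncertainty, is_factual)
    pairs, one pair per generated claim (hypothetical layout).
    """
    # Per-example score: the smallest uncertainty among non-factual claims.
    # Keeping only claims strictly below this value retains no errors.
    scores = sorted(
        min((u for u, ok in claims if not ok), default=math.inf)
        for claims in calibration
    )
    # Conservative order statistic: the k-th smallest score.
    k = math.floor(alpha * (len(scores) + 1))
    if k == 0:
        return -math.inf  # too few calibration examples: keep nothing
    return scores[k - 1]


def filter_claims(claims, tau):
    """Keep the claims whose uncertainty is strictly below the threshold."""
    return [text for text, u in claims if u < tau]


# Toy calibration set (made-up numbers, for illustration only).
calibration = [
    [(0.1, True), (0.6, False)],
    [(0.2, True), (0.3, False), (0.9, False)],
    [(0.4, True)],
    [(0.5, False)],
    [(0.05, True), (0.8, False)],
    [(0.7, False), (0.2, True)],
    [(0.15, True), (0.4, False)],
]
tau = calibrate_threshold(calibration, alpha=0.25)
kept = filter_claims([("a", 0.1), ("b", 0.4), ("c", 0.9)], tau)
```

A lower `alpha` yields a stricter threshold and therefore fewer retained claims; with very few calibration examples the sketch falls back to retaining nothing rather than giving an unsupported guarantee.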

Anthology ID: 2025.emnlp-main.576
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11435–11456
URL: https://aclanthology.org/2025.emnlp-main.576/
DOI: 10.18653/v1/2025.emnlp-main.576
Cite (ACL): Zhuohang Li, Chao Yan, Nicholas J Jackson, Wendi Cui, Bo Li, Jiaxin Zhang, and Bradley A. Malin. 2025. Towards Statistical Factuality Guarantee for Large Vision-Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 11435–11456, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Towards Statistical Factuality Guarantee for Large Vision-Language Models (Li et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.576.pdf
Checklist: 2025.emnlp-main.576.checklist.pdf
