Sean Du
2026
VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation
Seongheon Park | Changdae Oh | Hyeong Kyu Choi | Sean Du | Sharon Li
Findings of the Association for Computational Linguistics: ACL 2026
Large Vision-Language Models (LVLMs) frequently hallucinate, limiting their safe deployment in real-world applications. Existing LLM self-evaluation methods rely on a model’s ability to estimate the correctness of its own outputs, which can improve deployment reliability; however, they depend heavily on language priors and are therefore ill-suited for evaluating vision-conditioned predictions. We propose VAUQ, a vision-aware uncertainty quantification framework for LVLM self-evaluation that explicitly measures how strongly a model’s output depends on visual evidence. VAUQ introduces the Image-Information Score (IS), which captures the reduction in predictive uncertainty attributable to visual input, and an unsupervised core-region masking strategy that amplifies the influence of salient regions. Combining predictive entropy with this core-masked IS yields a training-free scoring function that reliably reflects answer correctness. Comprehensive experiments show that VAUQ consistently outperforms existing self-evaluation methods across multiple datasets.
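As a rough illustration of the idea in the abstract above, the sketch below treats the Image-Information Score as the drop in predictive entropy when visual evidence is available, and combines it with predictive entropy into a single score. The abstract does not give the formulas, so everything here is an assumption: the function names, the use of Shannon entropy over a single answer distribution, the choice of reference distribution (image removed or core region masked), and the linear combination with weight `alpha` are all hypothetical and not the authors' implementation.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def image_information_score(probs_with_image, probs_reference):
    """Entropy reduction attributable to visual input (assumed definition).

    probs_reference is the answer distribution when the visual evidence is
    withheld, e.g. the image is removed or its salient region is masked.
    """
    return entropy(probs_reference) - entropy(probs_with_image)


def uncertainty_score(probs_with_image, probs_reference, alpha=1.0):
    """Hypothetical combination: predictive entropy minus alpha * IS.

    Lower values suggest a confident answer that depends on the image.
    """
    h = entropy(probs_with_image)
    info = image_information_score(probs_with_image, probs_reference)
    return h - alpha * info


# Toy distributions over three candidate answers (made-up numbers).
with_image = [0.90, 0.05, 0.05]   # model is confident when it sees the image
reference = [1 / 3, 1 / 3, 1 / 3]  # model is unsure without visual evidence

grounded = uncertainty_score(with_image, reference)
ungrounded = uncertainty_score(reference, reference)
print(grounded < ungrounded)
```

In this toy case the answer that becomes more certain once the image is visible receives the lower score, which is the behavior the abstract describes qualitatively; how the actual method masks regions and weights the two terms is not stated here.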
Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities
Changdae Oh | Seongheon Park | To Eun Kim | Jiatong Li | Wendi Li | Samuel Yeh | Sean Du | Hamed Hassani | Paul Bogdan | Dawn Song | Sharon Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Uncertainty quantification (UQ) for large language models (LLMs) is a key building block for safety guardrails in everyday LLM applications. Yet, even as LLM agents are increasingly deployed in highly complex tasks, most UQ research still centers on single-turn question answering. We argue that UQ research must shift to realistic settings with interactive agents, and that a new principled framework for agent UQ is needed. This paper presents three pillars to build solid ground for future agent UQ research: (1. Foundations) We present the first general formulation of agent UQ that subsumes broad classes of existing UQ setups; (2. Challenges) We identify four technical challenges specific to agentic setups (selection of uncertainty estimator, uncertainty of heterogeneous entities, modeling uncertainty dynamics in interactive systems, and lack of fine-grained benchmarks), supported by numerical analysis on a real-world agent benchmark, τ²-bench; (3. Future Directions) We conclude by noting the practical implications of agent UQ and the remaining open problems, as a forward-looking discussion for future exploration.