Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong


Abstract
Multimodal large language models (MLLMs) combine visual and textual data for tasks such as image captioning and visual question answering. Proper uncertainty calibration is crucial yet challenging for their reliable use in areas such as healthcare and autonomous driving. This paper investigates several MLLMs, focusing on their calibration across various scenarios, including before and after visual fine-tuning as well as before and after multimodal training of the base LLMs. We observe miscalibration in their performance, yet no significant differences in calibration across these scenarios. We also highlight how uncertainty differs between textual and visual information and how integrating the two modalities affects overall uncertainty. To better understand MLLMs' miscalibration and their ability to self-assess uncertainty, we construct the IDK (I don't know) dataset, which is key to evaluating how they handle unknowns. Our findings reveal that MLLMs tend to give answers rather than admit uncertainty, but this self-assessment improves with prompt adjustments. Finally, to calibrate MLLMs and enhance model reliability, we propose techniques such as temperature scaling and iterative prompt optimization. Our results provide insights into improving MLLMs for effective and responsible deployment in multimodal applications.
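Of the mitigation techniques named in the abstract, temperature scaling has a standard formulation: a single scalar T is fit on held-out data so that softmax(logits / T) minimizes negative log-likelihood. The following is a minimal sketch in plain NumPy with a grid search over T; the function names and the grid are illustrative assumptions, not the paper's actual implementation.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the true labels after dividing logits by T.
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # Pick the temperature that minimizes NLL on a held-out calibration set.
    return min(grid, key=lambda T: nll(logits, labels, T))

At inference time, the fitted T rescales the model's answer-token logits before computing confidence scores; T > 1 softens overconfident predictions, which is the typical direction of miscalibration reported for large models.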
Anthology ID:
2025.coling-main.208
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3095–3109
URL:
https://aclanthology.org/2025.coling-main.208/
Cite (ACL):
Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, and Richang Hong. 2025. Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, pages 3095–3109, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models (Chen et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.208.pdf