MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty

Yongjin Yang; Haneul Yoo; Hwaran Lee

doi:10.18653/v1/2025.findings-naacl.325


Abstract

Despite massive advancements in large language models (LLMs), they still produce plausible but incorrect responses. To improve the reliability of LLMs, recent research has focused on uncertainty quantification to predict whether a response is correct. However, most uncertainty quantification methods have been evaluated on single-labeled questions, which removes data uncertainty: the irreducible randomness often present in user queries, which can arise from factors such as multiple possible answers. This limitation may make uncertainty quantification results unreliable in practical settings. In this paper, we investigate previous uncertainty quantification methods in the presence of data uncertainty. Our contributions are two-fold: 1) proposing a new Multi-Answer Question Answering dataset, **MAQA**, consisting of world knowledge, mathematical reasoning, and commonsense reasoning tasks to evaluate uncertainty quantification regarding data uncertainty, and 2) assessing 5 uncertainty quantification methods on diverse white- and black-box LLMs. Our findings show that previous methods perform worse than in single-answer settings, though this varies by task. Moreover, we observe that entropy- and consistency-based methods effectively estimate model uncertainty, even in the presence of data uncertainty.
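Why multiple valid answers complicate sampling-based uncertainty scores can be illustrated with a small sketch. It assumes several answers have been sampled from a model for one question and treats two answers as equivalent only if their lowercased, stripped strings match. This exact-match rule is a simplifying assumption, not the paper's setup; methods such as semantic entropy cluster answers by meaning instead. The helpers `normalize`, `answer_entropy`, and `consistency` are hypothetical names used for illustration only.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    # Simplifying assumption: answers are equivalent iff their
    # lowercased, whitespace-stripped strings are identical.
    return answer.strip().lower()


def answer_entropy(samples: list[str]) -> float:
    """Entropy (in nats) of the empirical distribution over distinct sampled answers."""
    counts = Counter(normalize(s) for s in samples)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    return sum(-p * math.log(p) for p in probs)


def consistency(samples: list[str]) -> float:
    """Fraction of samples that agree with the most frequent answer."""
    counts = Counter(normalize(s) for s in samples)
    top_count = counts.most_common(1)[0][1]
    return top_count / sum(counts.values())


# Single-answer question: every sample gives the same answer.
single = ["Paris"] * 6
print(answer_entropy(single), consistency(single))  # 0.0 1.0

# Multi-answer question with three valid answers, sampled evenly.
multi = ["Red", "Blue", "Green", "red", "blue", "green"]
print(answer_entropy(multi))  # approximately ln 3
print(consistency(multi))     # approximately 1/3
```

In the multi-answer case every sample may be correct, yet the entropy is high and the consistency is low. Under this naive view, data uncertainty can look like model uncertainty, which is the confound a multi-answer benchmark is meant to expose.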

Anthology ID:: 2025.findings-naacl.325
Volume:: Findings of the Association for Computational Linguistics: NAACL 2025
Month:: April
Year:: 2025
Address:: Albuquerque, New Mexico
Editors:: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 5861–5878
URL:: https://aclanthology.org/2025.findings-naacl.325/
DOI:: 10.18653/v1/2025.findings-naacl.325
Cite (ACL):: Yongjin Yang, Haneul Yoo, and Hwaran Lee. 2025. MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 5861–5878, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):: MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty (Yang et al., Findings 2025)
PDF:: https://aclanthology.org/2025.findings-naacl.325.pdf
