Amrit Singh Bedi
2025
Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems
Aakriti Agrawal, Rohith Aralikatti, Anirudh Satheesh, Souradip Chakraborty, Amrit Singh Bedi, Furong Huang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) have demonstrated exceptional capabilities, yet selecting the most reliable response from multiple LLMs remains a challenge, particularly in resource-constrained settings. Existing approaches often depend on costly external verifiers, human evaluators, or self-consistency techniques that require multiple samples from a single model. While multi-LLM systems produce more diverse responses than single models and thus have greater potential, they often underperform single-LLM self-consistency. In this work, we propose a calibrated log-likelihood-based selection framework to improve multi-LLM performance. Our approach leverages uncertainty estimation to identify the most confident response while minimizing inference costs. We show that our method outperforms majority voting and exceeds self-consistency performance when using a large number of model calls. Through extensive experiments, we demonstrate improvements of approximately 4%, 3%, and 5% on GSM8K, MMLU, and ARC, respectively, when applying uncertainty-aware selection to multi-LLM systems.
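
The general idea of log-likelihood-based answer selection can be illustrated with a minimal sketch. The abstract does not specify the scoring or calibration procedure, so everything below is an assumption made for illustration: the `Candidate` type, the length-normalized mean log-probability as the confidence score, and the per-model `offsets` standing in for calibration are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Candidate:
    """One response from one LLM (hypothetical container for this sketch)."""
    model: str                        # identifier of the model that produced the response
    answer: str                       # final answer extracted from the response
    token_logprobs: Sequence[float]   # per-token log-probabilities of the response


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood, so long responses are not penalized."""
    if not token_logprobs:
        raise ValueError("response has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def select_most_confident(
    candidates: Sequence[Candidate],
    offsets: Optional[Dict[str, float]] = None,
) -> Candidate:
    """Return the candidate with the highest adjusted confidence score.

    `offsets` is an assumed, simplistic stand-in for calibration: a per-model
    baseline (for example, that model's average score on held-out data) that
    is subtracted so scores from different models are more comparable.
    """
    offsets = offsets or {}

    def score(c: Candidate) -> float:
        return mean_logprob(c.token_logprobs) - offsets.get(c.model, 0.0)

    return max(candidates, key=score)


candidates = [
    Candidate("model_a", "42", [-0.1, -0.2, -0.3]),
    Candidate("model_b", "41", [-0.05, -0.05]),
    Candidate("model_c", "40", [-1.0, -0.5]),
]
best_raw = select_most_confident(candidates)
best_adjusted = select_most_confident(candidates, offsets={"model_a": -0.3})
```

In this sketch each model is called once, and the selection step only reuses log-probabilities that the models already return, which is one way to keep inference cost low. Without offsets, the response with the highest raw mean log-probability is chosen; with a per-model offset, a model whose raw scores run systematically low can still be selected.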