@inproceedings{cao-etal-2026-semantic,
title = "Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models",
author = "Cao, Qi and
Gambardella, Andrew and
Kojima, Takeshi and
Matsuo, Yutaka and
Iwasawa, Yusuke",
editor = "Demberg, Vera and
Inui, Kentaro and
Marquez, Llu{\'i}s",
booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 2: Short Papers)",
month = mar,
year = "2026",
address = "Rabat, Morocco",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.eacl-short.49/",
pages = "682--696",
isbn = "979-8-89176-381-4",
abstract = "Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, their limited truthfulness and tendency toward overconfidence constrain their reliability in factual tasks. Uncertainty quantification offers a promising approach to identifying potentially unreliable outputs from LLMs. Yet most existing methods rely on repeated sampling or auxiliary models, which substantially increase computational overhead. To address these limitations, we propose an efficient uncertainty quantification method that leverages semantic information inherently encoded in LLMs. Specifically, we group tokens into semantically consistent clusters based on embedding clustering and prefix matching, and compute a cluster-based score at each decoding step to represent uncertainty. Our approach requires only a single generation and does not depend on any auxiliary models. Experiments on multiple datasets and models demonstrate that our method achieves performance comparable to existing baselines while substantially reducing computational overhead."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="cao-etal-2026-semantic">
<titleInfo>
<title>Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Qi</namePart>
<namePart type="family">Cao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andrew</namePart>
<namePart type="family">Gambardella</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Takeshi</namePart>
<namePart type="family">Kojima</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yutaka</namePart>
<namePart type="family">Matsuo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yusuke</namePart>
<namePart type="family">Iwasawa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2026-03</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Vera</namePart>
<namePart type="family">Demberg</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kentaro</namePart>
<namePart type="family">Inui</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lluís</namePart>
<namePart type="family">Marquez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Rabat, Morocco</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-381-4</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, their limited truthfulness and tendency toward overconfidence constrain their reliability in factual tasks. Uncertainty quantification offers a promising approach to identifying potentially unreliable outputs from LLMs. Yet most existing methods rely on repeated sampling or auxiliary models, which substantially increase computational overhead. To address these limitations, we propose an efficient uncertainty quantification method that leverages semantic information inherently encoded in LLMs. Specifically, we group tokens into semantically consistent clusters based on embedding clustering and prefix matching, and compute a cluster-based score at each decoding step to represent uncertainty. Our approach requires only a single generation and does not depend on any auxiliary models. Experiments on multiple datasets and models demonstrate that our method achieves performance comparable to existing baselines while substantially reducing computational overhead.</abstract>
<identifier type="citekey">cao-etal-2026-semantic</identifier>
<location>
<url>https://aclanthology.org/2026.eacl-short.49/</url>
</location>
<part>
<date>2026-03</date>
<extent unit="page">
<start>682</start>
<end>696</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models
%A Cao, Qi
%A Gambardella, Andrew
%A Kojima, Takeshi
%A Matsuo, Yutaka
%A Iwasawa, Yusuke
%Y Demberg, Vera
%Y Inui, Kentaro
%Y Marquez, Lluís
%S Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-381-4
%F cao-etal-2026-semantic
%X Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, their limited truthfulness and tendency toward overconfidence constrain their reliability in factual tasks. Uncertainty quantification offers a promising approach to identifying potentially unreliable outputs from LLMs. Yet most existing methods rely on repeated sampling or auxiliary models, which substantially increase computational overhead. To address these limitations, we propose an efficient uncertainty quantification method that leverages semantic information inherently encoded in LLMs. Specifically, we group tokens into semantically consistent clusters based on embedding clustering and prefix matching, and compute a cluster-based score at each decoding step to represent uncertainty. Our approach requires only a single generation and does not depend on any auxiliary models. Experiments on multiple datasets and models demonstrate that our method achieves performance comparable to existing baselines while substantially reducing computational overhead.
%U https://aclanthology.org/2026.eacl-short.49/
%P 682-696
Markdown (Informal)
[Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models](https://aclanthology.org/2026.eacl-short.49/) (Cao et al., EACL 2026)
ACL