SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models

Carter Teplica; Yixin Liu; Arman Cohan; Tim G. J. Rudner

doi:10.18653/v1/2025.naacl-long.618

Abstract

We investigate the mechanistic sources of uncertainty in large language models (LLMs), an area with important implications for language model reliability and trustworthiness. To do so, we conduct a series of experiments designed to identify whether the factuality of generated responses and a model’s uncertainty originate in separate or shared circuits in the model architecture. We approach this question by adapting the well-established mechanistic interpretability techniques of causal tracing and zero-ablation to study the effect of different circuits on LLM generations. Our experiments on eight different models and five datasets, representing tasks predominantly requiring factual recall, provide strong evidence that a model’s uncertainty is produced in the same parts of the network that are responsible for the factuality of generated responses.

Anthology ID: 2025.naacl-long.618
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 12451–12469
URL: https://aclanthology.org/2025.naacl-long.618/
DOI: 10.18653/v1/2025.naacl-long.618
Cite (ACL): Carter Teplica, Yixin Liu, Arman Cohan, and Tim G. J. Rudner. 2025. SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 12451–12469, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models (Teplica et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.618.pdf
