Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning

Lin Zhang; Lijie Hu; Di Wang

doi:10.18653/v1/2025.findings-naacl.76


Abstract

Transformer-based language models have achieved significant success, yet their internal mechanisms remain largely opaque because of complex non-linear interactions and high-dimensional operations. Previous studies have shown that these models implicitly embed reasoning trees, but humans often use several distinct logical reasoning mechanisms to complete the same task, and it remains unclear which multi-step reasoning mechanisms language models actually use. In this paper, we address this question through mechanistic interpretability, focusing on multi-step reasoning tasks. Specifically, we combine circuit analysis with self-influence functions to track how the importance of each token changes over the course of reasoning, which lets us map the reasoning paths the model follows. We apply this methodology to GPT-2 on the indirect object identification (IOI) prediction task and show that the resulting circuits reveal a human-interpretable reasoning process used by the model.
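The abstract uses self-influence to score importance. As a rough illustration only, the sketch below computes the classic influence-function notion of self-influence (Koh and Liang, 2017), g_i^T H^{-1} g_i, for each training example of a toy L2-regularised logistic regression in NumPy. Here g_i is the example's loss gradient and H is the Hessian of the training objective. This is not the authors' token-level procedure for GPT-2. The model, the data, and the helper names fit_logreg and self_influence are hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, l2=1e-2, steps=50):
    # Hypothetical helper: Newton's method on the mean log-loss plus an L2 penalty.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n + l2 * w
        H = (X.T * (p * (1 - p))) @ X / n + l2 * np.eye(d)
        w -= np.linalg.solve(H, grad)
    return w


def self_influence(X, y, w, l2=1e-2):
    # Hypothetical helper: returns g_i^T H^{-1} g_i for every example i.
    # Koh and Liang's self-influence is the negative of this quantity;
    # the positive form is reported so that larger means more influential.
    n, d = X.shape
    p = sigmoid(X @ w)
    # Hessian of the full regularised training objective at the fitted w.
    H = (X.T * (p * (1 - p))) @ X / n + l2 * np.eye(d)
    # Per-example log-loss gradients (regulariser excluded), shape (n, d).
    G = X * (p - y)[:, None]
    Hinv_G = np.linalg.solve(H, G.T)  # shape (d, n)
    return np.einsum("nd,dn->n", G, Hinv_G)


# Toy usage: synthetic data with one deliberately flipped label.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=(200, 2)), np.ones(200)])  # two features + bias
y = (X[:, 0] > 0).astype(float)
flip = int(np.argmax(X[:, 0]))  # hypothetical "mislabelled" example
y[flip] = 1.0 - y[flip]

w = fit_logreg(X, y)
scores = self_influence(X, y, w)
print("most self-influential index:", int(np.argmax(scores)), "flipped index:", flip)
```

Examples with large self-influence are the ones the fitted model finds hardest to account for. In this toy setup the flipped label is a natural candidate, but the printout should be checked rather than assumed. Because H is positive definite, every score is non-negative.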

Anthology ID: 2025.findings-naacl.76
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1387–1404
URL: https://aclanthology.org/2025.findings-naacl.76/
Cite (ACL): Lin Zhang, Lijie Hu, and Di Wang. 2025. Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1387–1404, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning (Zhang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.76.pdf
