Where CoT Reasoning Commits: Entropy Traces Identify Interpretable Attention Heads

Tianhe Zhang; Yonghong Deng; Ping Jian (鉴萍); Zhen Yang; Boyang Wang; Xinyue Zhang

Abstract

While LLMs demonstrate impressive reasoning capabilities, their internal decision dynamics remain opaque. To render these processes interpretable and intervenable, we propose Dynamic Entropy Tracing, a mechanism-aware framework that interprets the evolving "choice state" of attention heads during chain-of-thought (CoT) generation through stepwise, head-wise tracing of option logits and entropy. Our analysis reveals two distinct functional behaviors among attention heads: Steadfast Heads, characterized by consistently low entropy and a sharp, option-selective logit pattern with a stable top choice, and Wavering Heads, characterized by consistently high entropy and flat or oscillatory option logits without a persistent winner. Leveraging these traces, we identify a set of intervention targets and perform Selective Head Fine-Tuning, updating only the selected heads while the rest of the backbone stays frozen. Experiments across the LLaMA and Qwen families reveal a striking plasticity hierarchy: fine-tuning just 30 Wavering Heads recovers over 98% of the performance achieved by full-parameter tuning, and in some settings modestly exceeds it. In contrast, intervening on Steadfast Heads yields much smaller gains. Our findings translate process-level mechanistic observables into a principled criterion for selective fine-tuning, offering a fundamental insight: the most effective tuning knobs are not the components that signal the final decision, but those that retain uncertainty, and thus plasticity, during its formation.
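As a rough illustration of the head-wise entropy criterion described in the abstract, the sketch below ranks heads by the mean entropy of their option distributions across CoT steps. It is not the authors' implementation: the abstract does not say how per-head option logits are extracted, so they are assumed to be given, and the head ids, the toy traces, and the function names (`option_entropy`, `mean_trace_entropy`, `rank_heads`) are all hypothetical.

```python
import math

def option_entropy(logits):
    """Shannon entropy (in nats) of the softmax over one head's option logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_trace_entropy(trace):
    """Average entropy over all CoT steps of one head's trace."""
    return sum(option_entropy(step) for step in trace) / len(trace)

def rank_heads(traces):
    """Return head ids sorted from highest to lowest mean entropy."""
    return sorted(traces, key=lambda h: mean_trace_entropy(traces[h]), reverse=True)

# Toy traces (assumed data): head id -> one list of option logits per CoT step.
traces = {
    "L10.H3": [[4.0, 0.1, 0.0, -0.2], [4.5, 0.0, 0.1, -0.1]],  # sharp, stable top choice
    "L12.H7": [[0.2, 0.1, 0.0, 0.1], [0.0, 0.2, 0.1, 0.0]],    # flat, no persistent winner
}

ranked = rank_heads(traces)
wavering_candidates = ranked[:1]  # the abstract selects 30; 1 here for the toy data
```

In this toy setup the flat-logit head ranks first and would be the candidate for selective fine-tuning, while the sharp-logit head plays the role of a Steadfast Head. A fuller version would presumably also check the stability of the top choice over steps, which the abstract mentions but this sketch omits.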

Anthology ID: 2026.findings-acl.133
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2777–2795
URL: https://aclanthology.org/2026.findings-acl.133/
Cite (ACL): Tianhe Zhang, Yonghong Deng, Ping Jian, Zhen Yang, Boyang Wang, and Xinyue Zhang. 2026. Where CoT Reasoning Commits: Entropy Traces Identify Interpretable Attention Heads. In Findings of the Association for Computational Linguistics: ACL 2026, pages 2777–2795, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Where CoT Reasoning Commits: Entropy Traces Identify Interpretable Attention Heads (Zhang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.133.pdf
Checklist: 2026.findings-acl.133.checklist.pdf