What Does Alignment Cost? The Structural Brittleness of Chain-of-Thought Reasoning

Joanna Hao; Shanduojiao Jiang; Sai Asish Nakka

Abstract

While Chain-of-Thought (CoT) prompting enables Large Language Models to explicitly justify their predictions, the extent to which these textual rationales faithfully reflect internal computation remains unclear. We investigate the circuit-level impact of alignment by performing a strict within-family comparison of the 1B-parameter Llama 3 architecture (Base vs. Instruct). Executing dynamic circuit discovery and dual-direction resample ablation on unconstrained CoT traces across synthetic mathematical primitives and a GSM8K proxy, we find that foundation models possess highly redundant, self-repairing computational networks; completely corrupting their primary reasoning circuits yields a minimal performance drop (2.92%) due to the dynamic compensation of backup heads (the Hydra Effect). In contrast, the instruction-tuned model exhibits reduced structural redundancy, suffering more than double the degradation (6.79%) under identical perturbation. We formalize our observation as an "Alignment Tax on Redundancy": optimizing for human-preference compliance repurposes dormant backup circuits, centralizing mathematical routing and rendering the aligned model’s reasoning pathways significantly more vulnerable to internal perturbation.

Anthology ID: 2026.knowfm-1.3
Volume: Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Canyu Chen, Yuji Zhang, Zoey Sha Li, Zihan Wang, Qineng Wang, Jinyan Su, Priyanka Kargupta, Sara Vera Marjanović, Jeff Z. Pan, Mohit Bansal, Isabelle Augenstein, Jiawei Han, Heng Ji, Manling Li
Venues: KnowFM | WS
Publisher: Association for Computational Linguistics
Pages: 25–33
URL: https://aclanthology.org/2026.knowfm-1.3/
Cite (ACL): Joanna Hao, Shanduojiao Jiang, and Sai Asish Nakka. 2026. What Does Alignment Cost? The Structural Brittleness of Chain-of-Thought Reasoning. In Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026), pages 25–33, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): What Does Alignment Cost? The Structural Brittleness of Chain-of-Thought Reasoning (Hao et al., KnowFM 2026)
PDF: https://aclanthology.org/2026.knowfm-1.3.pdf
