Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States

Subramanyam Sahoo; Vinija Jain; Aman Chadha; Divya Chaudhary

Abstract

Linear probing of large language model (LLM) hidden states is widely used to claim that models learn distinct representations for different reasoning types. We test this by probing Qwen3-14B on three benchmarks spanning the classical trichotomy: LogiQA 2.0 (deductive), ARC-Challenge (inductive), and 𝛼NLI (abductive). At layer 32 of 40, linear probes achieve 100% cross-validated accuracy with well-separated geometry (intrinsic dimensionalities: 20.6, 28.5, 33.6; convex hull contamination ≤1.5%). However, this separation is entirely driven by format confounds. Residualizing source identity, option count, and response length reduces accuracy to chance. Trace-anchor similarity indicates largely shared reasoning across tasks (42.5% agreement vs. 33.3% chance), and causal steering with random controls (n=20) shows no functional link between geometry and reasoning mode (p=0.286). Thus, high probe accuracy reflects task format rather than computational structure, motivating routine format deconfounding in mechanistic interpretability.
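The deconfounding step described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' code: the data generator, the least-squares probe, and the names `residualize`, `probe_accuracy`, `format_dirs`, and `length` are all assumptions made for illustration. The toy setup mirrors one feature of the abstract's design, namely that each benchmark stands for exactly one reasoning type, so source identity coincides with the probe label.

```python
import numpy as np

def residualize(H, C):
    """Remove the part of H that is linearly predictable from confounds C (OLS)."""
    X = np.column_stack([np.ones(len(C)), C])       # add an intercept column
    beta, *_ = np.linalg.lstsq(X, H, rcond=None)    # one regression per hidden dim
    return H - X @ beta

def probe_accuracy(H, y, n_classes, rng, test_frac=0.5):
    """Fit a least-squares linear probe on a random split; return held-out accuracy."""
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    X_train = np.column_stack([np.ones(len(train)), H[train]])
    X_test = np.column_stack([np.ones(len(test)), H[test]])
    Y_train = np.eye(n_classes)[y[train]]           # one-hot targets
    W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)
    pred = (X_test @ W).argmax(axis=1)
    return float((pred == y[test]).mean())

rng = np.random.default_rng(0)
n_per_class, n_classes, dim = 300, 3, 16
y = np.repeat(np.arange(n_classes), n_per_class)    # label = benchmark = "reasoning type"

# Hypothetical format confounds: source identity (one-hot, first column dropped
# to avoid collinearity with the intercept) and a response length that varies by source.
source = np.eye(n_classes)[y][:, 1:]
length = rng.normal(loc=50 + 20 * y, scale=5.0)
C = np.column_stack([source, length])

# Synthetic "hidden states": the only class signal is a per-source format offset.
format_dirs = rng.normal(scale=3.0, size=(n_classes, dim))
H = format_dirs[y] + rng.normal(size=(len(y), dim))

acc_raw = probe_accuracy(H, y, n_classes, rng)
acc_resid = probe_accuracy(residualize(H, C), y, n_classes, rng)
print(f"raw probe accuracy:          {acc_raw:.3f}")
print(f"residualized probe accuracy: {acc_resid:.3f}")
```

In this toy setting the raw probe should separate the classes almost perfectly, while the residualized probe should fall to roughly chance (about 1/3 for three classes), because the only separating signal was the format offset. For simplicity the confound regression is fit on all examples before the train/test split; a stricter variant would fit it on the training split only.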

Anthology ID: 2026.trustnlp-main.12
Volume: Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
Month: July
Year: 2026
Address: San Diego, California
Editors: Kai-Wei Chang, Ninareh Mehrabi, Satyapriya Krishna, Anubrata Das, Jwala Dhamala, Yang Trista Cao, Tharindu Kumarage, Anil Ramakrishna, Christos Christodoulopoulos, Yixin Wan, Aram Galstyan, Anoop Kumar, Rahul Gupta
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 227–239
URL: https://aclanthology.org/2026.trustnlp-main.12/
Cite (ACL): Subramanyam Sahoo, Vinija Jain, Aman Chadha, and Divya Chaudhary. 2026. Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States. In Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026), pages 227–239, San Diego, California. Association for Computational Linguistics.
Cite (Informal): Linear Probes Detect Task Format, Not Reasoning Mode in Language Model Hidden States (Sahoo et al., TrustNLP 2026)
PDF: https://aclanthology.org/2026.trustnlp-main.12.pdf
