Embracing Anisotropy: Turning Massive Activations into Interpretable Control Knobs for Large Language Models

Youngji Roh; Hyunjin Cho; Jaehyung Kim

Abstract

Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations: a small subset of feature dimensions has magnitudes significantly larger than the rest. Whereas prior work treats these extreme dimensions primarily as artifacts to be managed, we propose a different perspective: these dimensions serve as intrinsic, interpretable functional units arising from domain specialization. Specifically, we propose a simple magnitude-based criterion that identifies Domain-Critical Dimensions in a training-free manner. Our analyses reveal that such dimensions behave as interpretable semantic detectors for symbolic/quantitative patterns or domain-specific terms. In addition, we introduce Critical Dimension Steering, which applies activation steering exclusively to the identified dimensions. Empirical results show that this approach outperforms conventional whole-dimension steering in domain adaptation and jailbreaking scenarios.
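
The abstract does not specify the exact selection criterion or steering rule, so the following is only a toy sketch of the general idea under stated assumptions. It assumes that dimensions are ranked by mean absolute activation on domain text, that the steering vector is a difference of mean activations, and that steering is added only at the selected indices. The function names, the top-k rule, and the synthetic data are hypothetical and are not taken from the paper.

```python
import random


def select_critical_dims(acts, k):
    """Return the k dimension indices with the largest mean |activation|.

    ASSUMPTION: a plain top-k rule on mean magnitude stands in for the
    paper's magnitude-based criterion, which the abstract does not define.
    acts: list of activation vectors (one per token), all of equal length.
    """
    d = len(acts[0])
    mean_mag = [sum(abs(row[j]) for row in acts) / len(acts) for j in range(d)]
    return sorted(range(d), key=lambda j: mean_mag[j], reverse=True)[:k]


def mean_vector(acts):
    """Per-dimension mean of a list of activation vectors."""
    d = len(acts[0])
    return [sum(row[j] for row in acts) / len(acts) for j in range(d)]


def steer_on_dims(hidden, steering_vec, dims, alpha=1.0):
    """Add alpha * steering_vec to hidden only at the selected dimensions."""
    selected = set(dims)
    return [
        h + alpha * steering_vec[j] if j in selected else h
        for j, h in enumerate(hidden)
    ]


# Synthetic activations: two dimensions are given artificially large magnitudes
# on "domain" inputs to mimic massive activations.
random.seed(0)
D_MODEL = 16
general = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(200)]
domain = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(200)]
for row in domain:
    row[3] += 50.0
    row[11] -= 40.0

critical_dims = select_critical_dims(domain, k=2)

# ASSUMPTION: difference-of-means steering vector (a common choice in
# activation steering; the abstract does not state which one is used).
steering_vec = [a - b for a, b in zip(mean_vector(domain), mean_vector(general))]

hidden = [random.gauss(0, 1) for _ in range(D_MODEL)]
steered = steer_on_dims(hidden, steering_vec, critical_dims)
```

In this toy setup only the two inflated dimensions are selected and modified; every other coordinate of `hidden` is left untouched, which is the contrast with whole-dimension steering that the abstract draws.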

Anthology ID: 2026.acl-long.1380
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 29930–29956
URL: https://aclanthology.org/2026.acl-long.1380/
Cite (ACL): Youngji Roh, Hyunjin Cho, and Jaehyung Kim. 2026. Embracing Anisotropy: Turning Massive Activations into Interpretable Control Knobs for Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 29930–29956, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Embracing Anisotropy: Turning Massive Activations into Interpretable Control Knobs for Large Language Models (Roh et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1380.pdf
Checklist: 2026.acl-long.1380.checklist.pdf
