BibTeX
@article{ahuja-etal-2025-learning,
    title = "Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers",
    author = "Ahuja, Kabir and
      Balachandran, Vidhisha and
      Panwar, Madhur and
      He, Tianxing and
      Smith, Noah A. and
      Goyal, Navin and
      Tsvetkov, Yulia",
    journal = "Transactions of the Association for Computational Linguistics",
    volume = "13",
    year = "2025",
    address = "Cambridge, MA",
    publisher = "MIT Press",
    url = "https://aclanthology.org/2025.tacl-1.6/",
    doi = "10.1162/tacl_a_00733",
    pages = "121--141",
    abstract = "Transformers trained on natural language data have been shown to exhibit hierarchical generalization without explicitly encoding any structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such a preference for hierarchical generalization. We extensively experiment with transformers trained on five synthetic, controlled datasets using several training objectives and show that, while objectives such as sequence-to-sequence modeling and classification often fail to lead to hierarchical generalization, the language modeling objective consistently leads to transformers generalizing hierarchically. We then study how different generalization behaviors emerge during training by conducting pruning experiments that reveal the joint existence of subnetworks within the model implementing different generalizations. Finally, we take a Bayesian perspective to understand transformers' preference for hierarchical generalization: We establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar rather than by a regular grammar exhibiting linear generalization. Overall, our work presents new insights into the origins of hierarchical generalization in transformers and provides a theoretical framework for studying generalization in language models."
}
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="ahuja-etal-2025-learning">
    <titleInfo>
      <title>Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Kabir</namePart>
      <namePart type="family">Ahuja</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Vidhisha</namePart>
      <namePart type="family">Balachandran</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Madhur</namePart>
      <namePart type="family">Panwar</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Tianxing</namePart>
      <namePart type="family">He</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Noah</namePart>
      <namePart type="given">A</namePart>
      <namePart type="family">Smith</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Navin</namePart>
      <namePart type="family">Goyal</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Yulia</namePart>
      <namePart type="family">Tsvetkov</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <genre authority="bibutilsgt">journal article</genre>
    <relatedItem type="host">
      <titleInfo>
        <title>Transactions of the Association for Computational Linguistics</title>
      </titleInfo>
      <originInfo>
        <issuance>continuing</issuance>
        <publisher>MIT Press</publisher>
        <place>
          <placeTerm type="text">Cambridge, MA</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">periodical</genre>
      <genre authority="bibutilsgt">academic journal</genre>
    </relatedItem>
    <abstract>Transformers trained on natural language data have been shown to exhibit hierarchical generalization without explicitly encoding any structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such a preference for hierarchical generalization. We extensively experiment with transformers trained on five synthetic, controlled datasets using several training objectives and show that, while objectives such as sequence-to-sequence modeling and classification often fail to lead to hierarchical generalization, the language modeling objective consistently leads to transformers generalizing hierarchically. We then study how different generalization behaviors emerge during training by conducting pruning experiments that reveal the joint existence of subnetworks within the model implementing different generalizations. Finally, we take a Bayesian perspective to understand transformers’ preference for hierarchical generalization: We establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar rather than by a regular grammar exhibiting linear generalization. Overall, our work presents new insights into the origins of hierarchical generalization in transformers and provides a theoretical framework for studying generalization in language models.</abstract>
    <identifier type="citekey">ahuja-etal-2025-learning</identifier>
    <identifier type="doi">10.1162/tacl_a_00733</identifier>
    <location>
      <url>https://aclanthology.org/2025.tacl-1.6/</url>
    </location>
    <part>
      <date>2025</date>
      <detail type="volume"><number>13</number></detail>
      <extent unit="page">
        <start>121</start>
        <end>141</end>
      </extent>
    </part>
  </mods>
</modsCollection>
Endnote
%0 Journal Article
%T Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
%A Ahuja, Kabir
%A Balachandran, Vidhisha
%A Panwar, Madhur
%A He, Tianxing
%A Smith, Noah A.
%A Goyal, Navin
%A Tsvetkov, Yulia
%J Transactions of the Association for Computational Linguistics
%D 2025
%V 13
%I MIT Press
%C Cambridge, MA
%F ahuja-etal-2025-learning
%X Transformers trained on natural language data have been shown to exhibit hierarchical generalization without explicitly encoding any structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such a preference for hierarchical generalization. We extensively experiment with transformers trained on five synthetic, controlled datasets using several training objectives and show that, while objectives such as sequence-to-sequence modeling and classification often fail to lead to hierarchical generalization, the language modeling objective consistently leads to transformers generalizing hierarchically. We then study how different generalization behaviors emerge during training by conducting pruning experiments that reveal the joint existence of subnetworks within the model implementing different generalizations. Finally, we take a Bayesian perspective to understand transformers’ preference for hierarchical generalization: We establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar rather than by a regular grammar exhibiting linear generalization. Overall, our work presents new insights into the origins of hierarchical generalization in transformers and provides a theoretical framework for studying generalization in language models.
%R 10.1162/tacl_a_00733
%U https://aclanthology.org/2025.tacl-1.6/
%U https://doi.org/10.1162/tacl_a_00733
%P 121-141
Markdown (Informal)
[Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers](https://aclanthology.org/2025.tacl-1.6/) (Ahuja et al., TACL 2025)
ACL
Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, and Yulia Tsvetkov. 2025. Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers. Transactions of the Association for Computational Linguistics, 13:121–141.