@inproceedings{kennedy-2025-evidence,
title = "Evidence of Generative Syntax in {LLM}s",
author = "Kennedy, Mary",
editor = "Boleda, Gemma and
Roth, Michael",
booktitle = "Proceedings of the 29th Conference on Computational Natural Language Learning",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.conll-1.25/",
doi = "10.18653/v1/2025.conll-1.25",
pages = "377--396",
ISBN = "979-8-89176-271-8",
abstract = "The syntactic probing literature has been largely limited to shallow structures like dependency trees, which are unable to capture the subtle differences in sub-surface syntactic structures that yield semantic nuances. These structures are captured by theories of syntax like generative syntax, but have not been researched in the LLM literature due to the difficulties in probing these complex structures with many silent, covert nodes. Our work presents a method for overcoming this limitation by deploying Hewitt and Manning{'}s (2019) dependency-trained probe on sentence constructions whose structural representation is identical in a dependency parse, but differs in theoretical syntax. If a pretrained language model has captured the theoretical syntax structure, then the probe{'}s predicted distances should vary in syntactically-predicted ways. Using this methodology and a novel dataset, we find evidence that LLMs have captured syntactic structures far richer than previously realized, indicating LLMs are able to capture the nuanced meanings that result from sub-surface differences in structural form."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="kennedy-2025-evidence">
    <titleInfo>
      <title>Evidence of Generative Syntax in LLMs</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Mary</namePart>
      <namePart type="family">Kennedy</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 29th Conference on Computational Natural Language Learning</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Gemma</namePart>
        <namePart type="family">Boleda</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Michael</namePart>
        <namePart type="family">Roth</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Vienna, Austria</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-271-8</identifier>
    </relatedItem>
    <abstract>The syntactic probing literature has been largely limited to shallow structures like dependency trees, which are unable to capture the subtle differences in sub-surface syntactic structures that yield semantic nuances. These structures are captured by theories of syntax like generative syntax, but have not been researched in the LLM literature due to the difficulties in probing these complex structures with many silent, covert nodes. Our work presents a method for overcoming this limitation by deploying Hewitt and Manning’s (2019) dependency-trained probe on sentence constructions whose structural representation is identical in a dependency parse, but differs in theoretical syntax. If a pretrained language model has captured the theoretical syntax structure, then the probe’s predicted distances should vary in syntactically-predicted ways. Using this methodology and a novel dataset, we find evidence that LLMs have captured syntactic structures far richer than previously realized, indicating LLMs are able to capture the nuanced meanings that result from sub-surface differences in structural form.</abstract>
    <identifier type="citekey">kennedy-2025-evidence</identifier>
    <identifier type="doi">10.18653/v1/2025.conll-1.25</identifier>
    <location>
      <url>https://aclanthology.org/2025.conll-1.25/</url>
    </location>
    <part>
      <date>2025-07</date>
      <extent unit="page">
        <start>377</start>
        <end>396</end>
      </extent>
    </part>
  </mods>
</modsCollection>

%0 Conference Proceedings
%T Evidence of Generative Syntax in LLMs
%A Kennedy, Mary
%Y Boleda, Gemma
%Y Roth, Michael
%S Proceedings of the 29th Conference on Computational Natural Language Learning
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-271-8
%F kennedy-2025-evidence
%X The syntactic probing literature has been largely limited to shallow structures like dependency trees, which are unable to capture the subtle differences in sub-surface syntactic structures that yield semantic nuances. These structures are captured by theories of syntax like generative syntax, but have not been researched in the LLM literature due to the difficulties in probing these complex structures with many silent, covert nodes. Our work presents a method for overcoming this limitation by deploying Hewitt and Manning’s (2019) dependency-trained probe on sentence constructions whose structural representation is identical in a dependency parse, but differs in theoretical syntax. If a pretrained language model has captured the theoretical syntax structure, then the probe’s predicted distances should vary in syntactically-predicted ways. Using this methodology and a novel dataset, we find evidence that LLMs have captured syntactic structures far richer than previously realized, indicating LLMs are able to capture the nuanced meanings that result from sub-surface differences in structural form.
%R 10.18653/v1/2025.conll-1.25
%U https://aclanthology.org/2025.conll-1.25/
%U https://doi.org/10.18653/v1/2025.conll-1.25
%P 377-396

Markdown (Informal)
[Evidence of Generative Syntax in LLMs](https://aclanthology.org/2025.conll-1.25/) (Kennedy, CoNLL 2025)

ACL
Mary Kennedy. 2025. Evidence of Generative Syntax in LLMs. In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 377–396, Vienna, Austria. Association for Computational Linguistics.
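
Note on the method described in the abstract: the paper repurposes Hewitt and Manning's (2019) structural probe, which learns a linear projection B so that squared distances between projected hidden states approximate syntactic tree distances. The sketch below is a minimal illustration of that core distance computation only, assuming a PyTorch setup; the function name and tensor shapes are hypothetical and this is not code from the paper.

```python
import torch

def probe_squared_distances(hidden_states: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Predicted squared tree distances for every word pair in one sentence.

    hidden_states: (seq_len, d) hidden vectors from a pretrained language model.
    B:             (k, d) learned probe projection matrix.
    Returns a (seq_len, seq_len) matrix whose (i, j) entry approximates the
    syntactic tree distance between words i and j.
    """
    projected = hidden_states @ B.T                           # (seq_len, k)
    diffs = projected.unsqueeze(1) - projected.unsqueeze(0)   # (seq_len, seq_len, k)
    return (diffs ** 2).sum(dim=-1)                           # squared L2 per pair
```

As the abstract describes, the paper's test compares such predicted distance matrices across sentence constructions whose dependency parses are identical but whose generative-syntax analyses differ; systematic, syntactically predicted differences in the probe's distances are taken as evidence that the model encodes structure richer than the surface dependency tree.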