Categorial grammar induction from raw data

Christian Clark; William Schuler

doi:10.18653/v1/2023.findings-acl.149

Abstract

Grammar induction, the task of learning a set of grammatical rules from raw or minimally labeled text data, can provide clues about what kinds of syntactic structures are learnable without prior knowledge. Recent work (e.g., Kim et al., 2019; Zhu et al., 2020; Jin et al., 2021a) has achieved advances in unsupervised induction of probabilistic context-free grammars (PCFGs). However, categorial grammar induction has received less recent attention, despite allowing inducers to support a larger set of syntactic categories—due to restrictions on how categories can combine—and providing a transparent interface with compositional semantics, opening up possibilities for models that jointly learn form and meaning. Motivated by this, we propose a new model for inducing a basic (Ajdukiewicz, 1935; Bar-Hillel, 1953) categorial grammar. In contrast to earlier categorial grammar induction systems (e.g., Bisk and Hockenmaier, 2012), our model learns from raw data without any part-of-speech information. Experiments on child-directed speech show that our model attains a recall-homogeneity of 0.33 on average, which dramatically increases to 0.59 when a bias toward forward function application is added to the model.

Anthology ID: 2023.findings-acl.149
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2368–2379
URL: https://aclanthology.org/2023.findings-acl.149
DOI: 10.18653/v1/2023.findings-acl.149
Cite (ACL): Christian Clark and William Schuler. 2023. Categorial grammar induction from raw data. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2368–2379, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Categorial grammar induction from raw data (Clark & Schuler, Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.149.pdf
Video: https://aclanthology.org/2023.findings-acl.149.mp4
