Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages

Lifeng Jin, Byung-Doh Oh, William Schuler


Abstract
Unsupervised PCFG induction models, which build syntactic structures from raw text, can be used to evaluate the extent to which syntactic knowledge can be acquired from distributional information alone. However, many state-of-the-art PCFG induction models are word-based, meaning that they cannot directly inspect functional affixes, which may provide crucial information for syntactic acquisition in child learners. This work first introduces a neural PCFG induction model that allows a clean ablation of the influence of subword information in grammar induction. Experiments on child-directed speech demonstrate first that the incorporation of subword information results in more accurate grammars with categories that word-based induction models have difficulty finding, and second that this effect is amplified in morphologically richer languages that rely on functional affixes to express grammatical relations. A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.
Anthology ID:
2021.findings-emnlp.371
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4367–4378
URL:
https://aclanthology.org/2021.findings-emnlp.371
DOI:
10.18653/v1/2021.findings-emnlp.371
Cite (ACL):
Lifeng Jin, Byung-Doh Oh, and William Schuler. 2021. Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4367–4378, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages (Jin et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.371.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.371.mp4