A Probabilistic Generative Model of Linguistic Typology

Johannes Bjerva; Yova Kementchedjhieva; Ryan Cotterell; Isabelle Augenstein

doi:10.18653/v1/N19-1156

Abstract

In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry—we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. Furthermore, we show that language embeddings pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and also theoretical implications: the results confirm what linguists have hypothesised, i.e. that there are significant correlations between typological features and languages.
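To make the idea of predicting held-out features by matrix factorisation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest exponential-family case (binary features with a Bernoulli likelihood), a rank-1 factorisation, and plain gradient descent. The toy data, the function name `fit_bernoulli_mf`, and all hyperparameters are hypothetical choices made for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_bernoulli_mf(X, observed, rank=1, lr=0.1, l2=0.01, steps=2000, seed=0):
    """Fit language and feature embeddings so that
    sigmoid(lang_emb @ feat_emb.T) matches the observed cells of X.

    X        : (languages x features) binary matrix
    observed : same shape, 1.0 where the cell is known, 0.0 where held out
    """
    rng = np.random.default_rng(seed)
    n_langs, n_feats = X.shape
    lang_emb = 0.1 * rng.standard_normal((n_langs, rank))
    feat_emb = 0.1 * rng.standard_normal((n_feats, rank))
    for _ in range(steps):
        probs = sigmoid(lang_emb @ feat_emb.T)
        # Gradient of the Bernoulli negative log-likelihood with respect to
        # the logits; held-out cells are masked and contribute nothing.
        resid = (probs - X) * observed
        grad_lang = resid @ feat_emb + l2 * lang_emb
        grad_feat = resid.T @ lang_emb + l2 * feat_emb
        lang_emb -= lr * grad_lang
        feat_emb -= lr * grad_feat
    return lang_emb, feat_emb


# Hypothetical toy data: one underlying "parameter" splits six languages
# into two groups and determines all four binary features at once.
X = np.array([[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3, dtype=float)

# Hold out one feature value in each group.
observed = np.ones_like(X)
observed[0, 1] = 0.0
observed[3, 2] = 0.0

lang_emb, feat_emb = fit_bernoulli_mf(X, observed)
pred = sigmoid(lang_emb @ feat_emb.T)

print("P(feature 1 | language 0):", round(float(pred[0, 1]), 3))
print("P(feature 2 | language 3):", round(float(pred[3, 2]), 3))
```

Because the held-out cells share their row and column embeddings with observed cells, the covariance between features carries over to them. This is the intuition behind the abstract's claim, shown here on synthetic data only.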

Anthology ID: N19-1156
Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Jill Burstein, Christy Doran, Thamar Solorio
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 1529–1540
URL: https://aclanthology.org/N19-1156/
DOI: 10.18653/v1/N19-1156
Cite (ACL): Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, and Isabelle Augenstein. 2019. A Probabilistic Generative Model of Linguistic Typology. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1529–1540, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): A Probabilistic Generative Model of Linguistic Typology (Bjerva et al., NAACL 2019)
PDF: https://aclanthology.org/N19-1156.pdf
Presentation: N19-1156.Presentation.pptx