Diachrony-aware Induction of Binary Latent Representations from Typological Features

Yugo Murawaki


Abstract
Although features of linguistic typology are a promising alternative to lexical evidence for tracing the evolutionary history of languages, the large number of missing values in the dataset poses serious difficulties for statistical modeling. In this paper, we combine two existing approaches to the problem: (1) the synchronic approach, which focuses on interdependencies between features, and (2) the diachronic approach, which exploits phylogenetically and/or spatially related languages. Specifically, we propose a Bayesian model that (1) represents each language as a sequence of binary latent parameters encoding inter-feature dependencies and (2) relates a language’s parameters to those of its phylogenetic and spatial neighbors. Experiments show that the proposed model recovers missing values more accurately than the other models compared and that the induced representations retain the phylogenetic and spatial signals observed for surface features.
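The two ingredients named in the abstract can be pictured with a small toy sketch. It is not the paper's model: the weight table, the softmax link, the agreement-counting neighbor score, and all function names below are assumptions made purely for illustration. The sketch shows (1) a binary latent vector producing a distribution over the values of one surface feature, from which a missing value could be guessed, and (2) a score that rewards a latent parameter for agreeing with the same parameter in neighboring languages.

```python
import math


def softmax(scores):
    """Turn a list of real scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_distribution(z, weights):
    """Distribution over the values of one surface feature.

    z       -- binary latent parameters of a language, e.g. [1, 0]
    weights -- hypothetical table: weights[k][v] is how strongly latent
               parameter k favors feature value v (an assumption here)
    """
    n_values = len(weights[0])
    scores = [
        sum(z[k] * weights[k][v] for k in range(len(z)))
        for v in range(n_values)
    ]
    return softmax(scores)


def impute(z, weights):
    """Guess a missing feature value as the most probable one."""
    probs = feature_distribution(z, weights)
    return max(range(len(probs)), key=probs.__getitem__)


def neighbor_score(z_k, neighbor_values, strength):
    """Unnormalized log-score for one latent parameter: `strength` times
    the number of neighbors (phylogenetic or spatial) sharing its value.
    The functional form is an illustrative assumption."""
    return strength * sum(1 for n in neighbor_values if n == z_k)


# Toy weights: parameter 0 favors value 0, parameter 1 favors value 1.
toy_weights = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
guess_a = impute([1, 0], toy_weights)
guess_b = impute([0, 1], toy_weights)
agreement = neighbor_score(1, [1, 1, 0], 0.5)
```

In this toy setting, a language with only the first parameter switched on is assigned the first feature value, and a parameter that matches two of its three neighbors receives a higher score than one that matches fewer.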
Anthology ID:
I17-1046
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Greg Kondrak, Taro Watanabe
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
451–461
URL:
https://aclanthology.org/I17-1046/
Cite (ACL):
Yugo Murawaki. 2017. Diachrony-aware Induction of Binary Latent Representations from Typological Features. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 451–461, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Diachrony-aware Induction of Binary Latent Representations from Typological Features (Murawaki, IJCNLP 2017)
PDF:
https://aclanthology.org/I17-1046.pdf
Note:
 I17-1046.Notes.pdf