Latent Geographical Factors for Analyzing the Evolution of Dialects in Contact

Yugo Murawaki


Abstract
Analyzing the evolution of dialects remains a challenging problem because contact phenomena hinder the application of the standard tree model. Previous statistical approaches to this problem resort to admixture analysis, where each dialect is seen as a mixture of latent ancestral populations. However, such ancestral populations are hardly interpretable in the context of the tree model. In this paper, we propose a probabilistic generative model that represents latent factors as geographical distributions. We argue that the proposed model has higher affinity with the tree model because a tree can alternatively be represented as a set of geographical distributions. Experiments involving synthetic and real data suggest that the proposed method is both quantitatively and qualitatively superior to the admixture model.
Anthology ID: 2020.emnlp-main.69
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 959–976
URL: https://aclanthology.org/2020.emnlp-main.69
DOI: 10.18653/v1/2020.emnlp-main.69
PDF: https://aclanthology.org/2020.emnlp-main.69.pdf
Video: https://slideslive.com/38938993