Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping

Chundra Cathcart, Taraka Rama


Abstract
This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture. We augment our models with embeddings representing language ID, part of speech, and other features such as word embeddings. We find that a highly augmented model shows highest accuracy in predicting held-out forms, and investigate other properties of interest learned by our models’ representations. We outline extensions to this architecture that can better capture variation in Indo-Aryan sound change.
Anthology ID:
2020.conll-1.50
Volume:
Proceedings of the 24th Conference on Computational Natural Language Learning
Month:
November
Year:
2020
Address:
Online
Editors:
Raquel Fernández, Tal Linzen
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
620–630
URL:
https://aclanthology.org/2020.conll-1.50
DOI:
10.18653/v1/2020.conll-1.50
Cite (ACL):
Chundra Cathcart and Taraka Rama. 2020. Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 620–630, Online. Association for Computational Linguistics.
Cite (Informal):
Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping (Cathcart & Rama, CoNLL 2020)
PDF:
https://aclanthology.org/2020.conll-1.50.pdf
Code:
chundrac/ia-conll-2020