Chundra Cathcart


2020

Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping
Chundra Cathcart | Taraka Rama
Proceedings of the 24th Conference on Computational Natural Language Learning

This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture. We augment our models with embeddings representing language ID, part of speech, and other features such as word embeddings. We find that a highly augmented model shows the highest accuracy in predicting held-out forms, and investigate other properties of interest learned by our models’ representations. We outline extensions to this architecture that can better capture variation in Indo-Aryan sound change.

In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology
Chundra Cathcart | Florian Wandl
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper investigates the ability of neural network architectures to effectively learn diachronic phonological generalizations in a multilingual setting. We employ models using three different types of language embedding (dense, sigmoid, and straight-through). We find that the Straight-Through model outperforms the other two in terms of accuracy, but the Sigmoid model’s language embeddings show the strongest agreement with the traditional subgrouping of the Slavic languages. We also find that the Straight-Through model has learned coherent, semi-interpretable information about sound change, and we outline directions for future research.
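The three embedding types named above can be contrasted with a small, framework-free sketch. It assumes each language's embedding is derived from a vector of learned logits, that the sigmoid variant squashes each dimension into (0, 1), and that the straight-through variant thresholds the sigmoid at 0.5 in the forward pass while passing the sigmoid's gradient through in the backward pass (the usual straight-through estimator). All function names are hypothetical; this is an illustration of the general idea, not the paper's implementation.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense_embedding(logits: List[float]) -> List[float]:
    # Dense: the learned real-valued vector is used unchanged.
    return list(logits)


def sigmoid_embedding(logits: List[float]) -> List[float]:
    # Sigmoid: each dimension becomes a "soft" feature in (0, 1).
    return [sigmoid(z) for z in logits]


def straight_through_forward(logits: List[float]) -> List[float]:
    # Straight-through, forward pass: hard binary features (1 if sigmoid > 0.5).
    return [1.0 if sigmoid(z) > 0.5 else 0.0 for z in logits]


def straight_through_backward(logits: List[float], upstream: List[float]) -> List[float]:
    # Straight-through, backward pass: treat the hard threshold as if it were
    # the identity on the sigmoid output, so the gradient w.r.t. each logit is
    # upstream * sigmoid'(z) instead of the true (zero almost everywhere) gradient.
    return [g * sigmoid(z) * (1.0 - sigmoid(z)) for z, g in zip(logits, upstream)]


logits = [2.0, -1.0, 0.0]
print(dense_embedding(logits))
print(sigmoid_embedding(logits))
print(straight_through_forward(logits))
print(straight_through_backward(logits, [1.0, 1.0, 1.0]))
```

The design trade-off this illustrates: hard binary features are easy to read as presence or absence of a property, while the straight-through trick keeps them trainable by gradient descent.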

2019

Toward a deep dialectological representation of Indo-Aryan
Chundra Cathcart
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

This paper presents a new approach to disentangling inter-dialectal and intra-dialectal relationships within a single language group, the Indo-Aryan subgroup of Indo-European. We draw upon admixture models and deep generative models to tease apart historical language contact and language-specific behavior in the overall patterns of sound change displayed by Indo-Aryan languages. We show that a “deep” model of Indo-Aryan dialectology sheds some light on questions regarding inter-relationships among the Indo-Aryan languages, and that it performs better than a “shallow” model in terms of certain properties of the posterior distribution (e.g., its entropy). We also outline future pathways for model development.
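Admixture models generally treat each language's observed outcomes as drawn from a language-specific mixture of shared latent components. A minimal sketch of that likelihood, assuming one categorical distribution over reflexes per component and per-language mixture weights; the reflex labels, weights, and function names are toy assumptions, not values or code from the paper:

```python
import math
from typing import Dict, List


def reflex_probability(theta: List[float], phi: List[Dict[str, float]], reflex: str) -> float:
    """p(reflex | language) = sum_k theta[k] * phi[k][reflex].

    theta: the language's mixture weights over K shared components (sum to 1).
    phi:   for each component, a categorical distribution over reflexes.
    """
    return sum(t * component.get(reflex, 0.0) for t, component in zip(theta, phi))


def log_likelihood(theta: List[float], phi: List[Dict[str, float]], observed: List[str]) -> float:
    # Observed reflexes are treated as independent draws given the language.
    return sum(math.log(reflex_probability(theta, phi, r)) for r in observed)


# Two hypothetical components, each favoring a different reflex.
phi = [
    {"reflex_a": 0.9, "reflex_b": 0.1},
    {"reflex_a": 0.2, "reflex_b": 0.8},
]
theta_language = [0.7, 0.3]  # toy weights for one language

print(reflex_probability(theta_language, phi, "reflex_a"))  # about 0.69
print(log_likelihood(theta_language, phi, ["reflex_a", "reflex_b", "reflex_a"]))
```

Because components are shared across languages while weights are language-specific, a language whose weights lean toward another group's component can signal contact rather than inheritance, which is the kind of distinction the abstract describes.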

Gaussian Process Models of Sound Change in Indo-Aryan Dialectology
Chundra Cathcart
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

This paper proposes a Gaussian Process model of sound change targeted toward questions in Indo-Aryan dialectology. Gaussian Processes (GPs) provide a flexible means of expressing covariance between outcomes, and can be extended to a wide variety of probability distributions. We find that GP models fare better in terms of some key posterior predictive checks than models that do not express covariance between sound changes, and outline directions for future work.
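To illustrate what "expressing covariance between outcomes" means for a GP, the sketch below builds a squared-exponential (RBF) covariance matrix over toy one-dimensional inputs and draws one sample from the zero-mean GP prior via a Cholesky factorization. The kernel choice, its parameters, the inputs, and all function names are assumptions for illustration; the covariance structure over sound changes used in the paper is not reproduced here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def rbf_kernel(xs: List[float], variance: float = 1.0, length_scale: float = 1.0,
               jitter: float = 1e-8) -> Matrix:
    # k(x, x') = variance * exp(-(x - x')^2 / (2 * length_scale^2));
    # a small jitter on the diagonal keeps the matrix numerically positive definite.
    n = len(xs)
    return [[variance * math.exp(-((xs[i] - xs[j]) ** 2) / (2.0 * length_scale ** 2))
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a: Matrix) -> Matrix:
    # Lower-triangular L with L @ L^T == a (Cholesky-Banachiewicz order).
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def sample_gp_prior(xs: List[float], rng: random.Random, **kernel_args) -> List[float]:
    # f = L z with z ~ N(0, I) gives f ~ N(0, K).
    lower = cholesky(rbf_kernel(xs, **kernel_args))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


xs = [0.0, 1.0, 2.0, 3.0]
K = rbf_kernel(xs)
print(K[0][1], K[0][3])  # nearby inputs covary far more strongly than distant ones
print(sample_gp_prior(xs, random.Random(0), length_scale=1.0))
```

A model without such covariance would correspond to a diagonal K, in which each outcome is drawn independently; the off-diagonal terms are what let information be shared between related outcomes.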