Albert Ventayol-Boada
Also published as: Albert Ventayol-boada
2023
Applications of classification trees for endangered language description: Finite verb morphology in Kolyma Yukaghir
Albert Ventayol-Boada
Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Unsupervised part-of-speech induction for language description: Modeling documentation materials in Kolyma Yukaghir
Albert Ventayol-boada, Nathan Roll, Simon Todd
Proceedings of the Second Workshop on NLP Applications to Field Linguistics
This study investigates the clustering of words into part-of-speech (POS) classes in Kolyma Yukaghir. In grammatical descriptions, lexical items are assigned to POS classes based on their morphological paradigms. Discursively, however, these classes share a fair amount of morphology. In this study, we turn to POS induction to evaluate whether classes derived by quantifying the distributions in which roots and affixes are used can be useful for language description and, if so, what those classes might be. We qualitatively compare clusters of roots and affixes based on four different definitions of their distributions. The results show that clustering is more reliable for words that typically bear more morphology. Additionally, the results suggest that the number of POS classes in Kolyma Yukaghir might be smaller than stated in current descriptions. This study thus demonstrates how unsupervised learning methods can provide insights for language description, particularly for highly inflectional languages.