Predictive Text for Agglutinative and Polysynthetic Languages

Sergey Kosyak, Francis Tyers


Abstract
This paper presents a set of experiments in morphological modelling and prediction. We test whether morphological segmentation can compete with statistical segmentation in the tasks of language modelling and predictive text entry for two under-resourced indigenous languages, K’iche’ and Chukchi. We use different segmentation methods, both statistical and morphological, to build datasets for training three types of models: single-way segmented models, trained on data from one segmenter; two-way segmented models, trained on concatenated data from two segmenters; and fine-tuned models, trained on two datasets from different segmenters. We compute word-level and character-level perplexities and find that single-way segmented models trained on morphologically segmented data perform best. We then evaluate the language models on the task of predictive text entry using gold standard data, measuring the average number of clicks per character and the keystroke savings rate. The models trained on morphologically segmented data score better, although with substantial room for improvement. Lastly, we propose using morphological segmentation to improve the end-user experience of predictive text, and we plan to test this assumption through end-user evaluation.
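As a rough illustration of the two text-entry metrics named in the abstract, the sketch below computes a keystroke savings rate and clicks per character for a toy predictor. It is not the authors' evaluation code. The definitions used here are the common ones and are assumptions: one key press per typed character, one extra press to accept a suggestion, and spaces not counted. The helper names `make_prefix_suggester`, `keys_needed`, and `text_entry_scores` and the toy vocabulary are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Suggester = Callable[[str], List[str]]


def make_prefix_suggester(vocab: Sequence[str], k: int) -> Suggester:
    """Hypothetical stand-in for a language model: offer the first k
    vocabulary items (in alphabetical order) that start with the prefix."""
    ordered = sorted(vocab)

    def suggest(prefix: str) -> List[str]:
        return [w for w in ordered if w.startswith(prefix)][:k]

    return suggest


def keys_needed(word: str, suggest: Suggester) -> int:
    """Key presses to enter `word` under the assumed interaction model."""
    # Try every proper prefix, shortest first; accept as soon as the
    # target word appears among the suggestions.
    for typed in range(len(word)):
        if word in suggest(word[:typed]):
            return typed + 1  # typed characters plus one selection press
    return len(word)  # never offered in time: every character typed by hand


def text_entry_scores(words: Sequence[str], suggest: Suggester) -> Tuple[float, float]:
    """Return (keystroke savings rate in percent, clicks per character)."""
    chars = sum(len(w) for w in words)
    if chars == 0:
        raise ValueError("need at least one non-empty word")
    keys = sum(keys_needed(w, suggest) for w in words)
    ksr = 100.0 * (chars - keys) / chars
    clicks_per_char = keys / chars
    return ksr, clicks_per_char


if __name__ == "__main__":
    # Toy English vocabulary, for illustration only.
    toy_vocab = ["text", "test", "predict", "prediction", "predictive", "prefix"]
    suggest = make_prefix_suggester(toy_vocab, k=2)
    ksr, cpc = text_entry_scores(["predictive", "text"], suggest)
    print(f"keystroke savings rate: {ksr:.1f}%  clicks per character: {cpc:.2f}")
```

Under these assumptions, a higher keystroke savings rate and a lower clicks-per-character value both indicate that the predictor offers the intended word earlier. A real evaluation would replace the alphabetical prefix lookup with suggestions ranked by the trained language model.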
Anthology ID:
2022.fieldmatters-1.9
Volume:
Proceedings of the first workshop on NLP applications to field linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
FieldMatters
Publisher:
International Conference on Computational Linguistics
Pages:
77–85
URL:
https://aclanthology.org/2022.fieldmatters-1.9
Cite (ACL):
Sergey Kosyak and Francis Tyers. 2022. Predictive Text for Agglutinative and Polysynthetic Languages. In Proceedings of the first workshop on NLP applications to field linguistics, pages 77–85, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
Cite (Informal):
Predictive Text for Agglutinative and Polysynthetic Languages (Kosyak & Tyers, FieldMatters 2022)
PDF:
https://aclanthology.org/2022.fieldmatters-1.9.pdf