A survey of part-of-speech tagging approaches applied to K’iche’

Francis Tyers, Nick Howell


Abstract
We study the performance of several popular neural part-of-speech taggers from the Universal Dependencies ecosystem on Mayan languages, using a small corpus of 1435 annotated K’iche’ sentences comprising approximately 10,000 tokens. The results are encouraging: F1 scores of 93% or higher on lemmatisation, part-of-speech tagging and morphological feature assignment. This high performance motivates a cross-language part-of-speech tagging study, in which K’iche’-trained models are evaluated on two other Mayan languages, Kaqchikel and Uspanteko: performance on Kaqchikel is good, 63-85%, and on Uspanteko modest, 60-71%. Supporting experiments lead us to conclude that the relative diversity of morphological features is a plausible explanation for the limits on cross-language tagging performance, which provides some direction for future sentence annotation and collection work to support these and other Mayan languages.
Anthology ID:
2021.americasnlp-1.6
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Venues:
AmericasNLP | NAACL
Publisher:
Association for Computational Linguistics
Pages:
44–52
URL:
https://aclanthology.org/2021.americasnlp-1.6
DOI:
10.18653/v1/2021.americasnlp-1.6
Cite (ACL):
Francis Tyers and Nick Howell. 2021. A survey of part-of-speech tagging approaches applied to K’iche’. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 44–52, Online. Association for Computational Linguistics.
Cite (Informal):
A survey of part-of-speech tagging approaches applied to K’iche’ (Tyers & Howell, AmericasNLP 2021)
PDF:
https://aclanthology.org/2021.americasnlp-1.6.pdf