A corpus of K’iche’ annotated for morphosyntactic structure

Francis Tyers, Robert Henderson


Abstract
This article describes a collection of sentences in K’iche’ annotated for morphology and syntax. K’iche’ is a Mayan language spoken in Guatemala. The annotation follows the guidelines of the Universal Dependencies project. The corpus contains 1,433 sentences (approximately 10,000 tokens) and is released under a free/open-source licence. We also present a comparison of parsing systems for K’iche’ using this corpus and describe how the corpus can be used to mine linguistic examples.
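Universal Dependencies treebanks are conventionally distributed in the CoNLL-U format: one token per line, ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines for sentence metadata, and blank lines between sentences. Assuming the corpus is released in that format, the minimal sketch below shows one way to mine examples: filter tokens by part of speech and morphological features. The helper names (`read_conllu`, `parse_feats`, `find_examples`) are hypothetical, and the toy sentence in `SAMPLE` is made up for illustration; it is not taken from the corpus, and this is not the authors' own tooling.

```python
from typing import Dict, Iterator, List, Tuple

# Column names of the CoNLL-U format, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> Iterator[Tuple[Dict[str, str], List[Dict[str, str]]]]:
    """Yield (metadata, tokens) for each sentence in a CoNLL-U string."""
    meta: Dict[str, str] = {}
    tokens: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield meta, tokens
            meta, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines of the form "# key = value" hold sentence metadata.
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
            if "-" in cols[0] or "." in cols[0]:
                continue
            tokens.append(dict(zip(FIELDS, cols)))
    if tokens:  # the file may not end with a blank line
        yield meta, tokens


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a FEATS value like 'A=B|C=D' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(f.split("=", 1) for f in feats.split("|"))


def find_examples(text: str, upos: str, **feats: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (sent_id, sentence text, word form) for tokens matching UPOS and all given features."""
    for meta, tokens in read_conllu(text):
        for tok in tokens:
            tok_feats = parse_feats(tok["feats"])
            if tok["upos"] == upos and all(tok_feats.get(k) == v for k, v in feats.items()):
                yield meta.get("sent_id", "?"), meta.get("text", ""), tok["form"]


# Made-up two-token sentence with placeholder forms, for illustration only.
SAMPLE = (
    "# sent_id = toy-1\n"
    "# text = aaa bbb\n"
    "1\taaa\taaa\tVERB\t_\tAspect=Perf\t0\troot\t_\t_\n"
    "2\tbbb\tbbb\tNOUN\t_\tNumber=Sing\t1\tobj\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sent_id, sentence, form in find_examples(SAMPLE, "VERB", Aspect="Perf"):
        print(sent_id, form, "|", sentence)
```

Run against a real treebank file, the same query pattern (a UPOS plus any number of `Feature=Value` constraints) returns every matching word together with the sentence it occurs in, which is the kind of lookup a linguist might use to collect examples of a construction.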
Anthology ID:
2021.americasnlp-1.2
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Editors:
Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, Katharina Kann
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
10–20
URL:
https://aclanthology.org/2021.americasnlp-1.2
DOI:
10.18653/v1/2021.americasnlp-1.2
Cite (ACL):
Francis Tyers and Robert Henderson. 2021. A corpus of K’iche’ annotated for morphosyntactic structure. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 10–20, Online. Association for Computational Linguistics.
Cite (Informal):
A corpus of K’iche’ annotated for morphosyntactic structure (Tyers & Henderson, AmericasNLP 2021)
PDF:
https://aclanthology.org/2021.americasnlp-1.2.pdf