Apurinã Universal Dependencies Treebank

Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney Da Silva Facundes, Mika Hämäläinen, Niko Partanen


Abstract
This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language. The treebank contains 76 fully annotated sentences and applies 14 parts of speech as well as seven augmented or new features, some of which are unique to Apurinã. The construction of the treebank has also served as an opportunity to develop a finite-state description of the language and to facilitate the transfer of open-source infrastructure possibilities to an endangered language of the Amazon. The source materials used in the initial treebank represent fieldwork practices where not all tokens of all sentences are equally annotated. For this reason, establishing regular annotation practices for the entire Apurinã treebank is an ongoing project.
Anthology ID:
2021.americasnlp-1.4
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Venues:
AmericasNLP | NAACL
Publisher:
Association for Computational Linguistics
Pages:
28–33
URL:
https://aclanthology.org/2021.americasnlp-1.4
DOI:
10.18653/v1/2021.americasnlp-1.4
Cite (ACL):
Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney Da Silva Facundes, Mika Hämäläinen, and Niko Partanen. 2021. Apurinã Universal Dependencies Treebank. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 28–33, Online. Association for Computational Linguistics.
Cite (Informal):
Apurinã Universal Dependencies Treebank (Rueter et al., AmericasNLP 2021)
PDF:
https://aclanthology.org/2021.americasnlp-1.4.pdf