Parsing Nheengatu: Performance Gains for a Brazilian Indigenous Universal Dependencies Treebank

Dominick Maia Alexandre; Leonel Figueiredo de Alencar

Abstract

This paper evaluates the impact of expanding the UD_Nheengatu-CompLin treebank on parsing performance for Nheengatu, an endangered Indigenous language of Brazil. We hypothesized that the inclusion of additional annotated data would result in a 10% improvement in the Labeled Attachment Score (LAS). To test this hypothesis, we conducted a 10-fold cross-validation experiment using UDPipe 1.4 under two conditions: parsing with gold tokenization and gold tags, and automatic parsing from raw text. Statistical significance was determined using the Mann-Whitney U test. Although the expected gain was not achieved, the results show improvements in parsing accuracy and reduced variance across folds. The findings highlight the importance of corpus expansion and standardized annotation workflows for improving parsing performance in low-resource language scenarios and for supporting reproducible evaluation methods in the computational modeling of minority languages.
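The evaluation design named in the abstract (LAS, 10-fold cross-validation, and a Mann-Whitney U comparison of per-fold scores) can be illustrated with a minimal sketch. This is not the authors' code: the function names `las`, `k_folds`, and `mann_whitney_u` are hypothetical, the interleaved fold assignment is an assumption, and only the U statistic is computed (no p-value). Training and running UDPipe are left out entirely.

```python
from itertools import product


def las(gold, predicted):
    """Labeled Attachment Score: fraction of tokens whose (head, deprel)
    pair in the prediction matches the gold annotation exactly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token sequence")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def k_folds(n_items, k=10):
    """Yield (train_indices, test_indices) for k interleaved folds.
    Assumption: sentence i goes to fold i % k; the paper may split differently."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def mann_whitney_u(xs, ys):
    """Return (U_x, U_y): U_x counts pairs where x > y, ties count 0.5."""
    u_x = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(xs, ys)
    )
    return u_x, len(xs) * len(ys) - u_x


# Toy data for illustration only; these are not results from the paper.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]
toy_las = las(gold, pred)          # 2 of 3 tokens fully correct
folds = list(k_folds(25, k=10))    # 10 train/test index splits
u_stat = mann_whitney_u([0.7, 0.8], [0.6, 0.8])
```

In an actual experiment, each fold would yield one LAS value per treebank version, and the two lists of ten per-fold scores would be passed to the test; a library routine that also returns a p-value would normally replace the hand-written statistic.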

Anthology ID: 2026.propor-2.29
Volume: Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
Month: April
Year: 2026
Address: Salvador, Brazil
Editors: Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue: PROPOR
Publisher: Association for Computational Linguistics
Pages: 210–219
URL: https://aclanthology.org/2026.propor-2.29/
Cite (ACL): Dominick Maia Alexandre and Leonel Figueiredo de Alencar. 2026. Parsing Nheengatu: Performance Gains for a Brazilian Indigenous Universal Dependencies Treebank. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2, pages 210–219, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal): Parsing Nheengatu: Performance Gains for a Brazilian Indigenous Universal Dependencies Treebank (Alexandre & Alencar, PROPOR 2026)
PDF: https://aclanthology.org/2026.propor-2.29.pdf
