Arturo Martínez Peguero
2026
Measuring Linguistic Competence of LLMs on Indigenous Languages of the Americas
Justin Vasselli | Arturo Martínez Peguero | Frederikus Hudi | Haruki Sakajo | Taro Watanabe
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
This paper presents an evaluation framework for probing large language models’ linguistic knowledge of Indigenous languages of the Americas using zero- and few-shot prompting. The framework consists of three tasks: (1) language identification, (2) cloze completion of Spanish sentences supported by Indigenous-language translations, and (3) grammatical feature classification. We evaluate models from five major families (GPT, Gemini, DeepSeek, Qwen, and LLaMA) on 13 Indigenous languages, including Bribri, Guarani, and Nahuatl. The results show substantial variation across both languages and model families. While a small number of model-language combinations demonstrate consistently stronger performance across tasks, many others perform near chance, highlighting persistent gaps in current models’ abilities on Indigenous languages.
2025
Leveraging Dictionaries and Grammar Rules for the Creation of Educational Materials for Indigenous Languages
Justin Vasselli | Haruki Sakajo | Arturo Martínez Peguero | Frederikus Hudi | Taro Watanabe
Proceedings of the Fifth Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)
This paper describes the NAIST submission to the AmericasNLP 2025 shared task on the creation of educational materials for Indigenous languages. We implement three systems to tackle the unique challenges of each language. The first system, used for Maya and Guarani, employs a straightforward GPT-4o few-shot prompting technique, enhanced by synthetically generated examples to ensure coverage of all grammatical variations encountered. The second system, used for Bribri, integrates dictionary-based alignment and linguistic rules to systematically manage linguistic and lexical transformations. Finally, we developed a specialized rule-based system for Nahuatl that systematically reduces sentences to their base form, simplifying the generation of correct morphological variants.
2024
Applying Linguistic Expertise to LLMs for Educational Material Development in Indigenous Languages
Justin Vasselli | Arturo Martínez Peguero | Junehwan Sung | Taro Watanabe
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)
This paper presents our approach to the AmericasNLP 2024 Shared Task 2 as the JAJ (/dʒæz/) team. The task aimed at creating educational materials for Indigenous languages, and we focused on Maya and Bribri. Given the unique linguistic features and challenges of these languages, and the limited size of the training datasets, we developed a hybrid methodology combining rule-based NLP methods with prompt-based techniques. This approach leverages the meta-linguistic capabilities of large language models, enabling us to blend broad, language-agnostic processing with customized solutions. Our approach lays a foundational framework that can be expanded to other Indigenous languages in future work.