Arturo Martínez Peguero
2024
Applying Linguistic Expertise to LLMs for Educational Material Development in Indigenous Languages
Justin Vasselli
|
Arturo Martínez Peguero
|
Junehwan Sung
|
Taro Watanabe
Proceedings of the 4th Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024)
This paper presents our approach to the AmericasNLP 2024 Shared Task 2 as the JAJ (/dʒæz/) team. The task aimed at creating educational materials for indigenous languages, and we focused on Maya and Bribri. Given the unique linguistic features and challenges of these languages, and the limited size of the training datasets, we developed a hybrid methodology combining rule-based NLP methods with prompt-based techniques. This approach leverages the meta-linguistic capabilities of large language models, enabling us to blend broad, language-agnostic processing with customized solutions. Our approach lays a foundational framework that can be expanded to other indigenous languages languages in future work.