What is it? Towards a Generalizable Native American Language Identification System

Ivory Yang; Weicheng Ma; Carlos Guerrero Alvarez; William Dinauer; Soroush Vosoughi

doi:10.18653/v1/2025.naacl-srw.10

Abstract

This paper presents a research thesis proposal to develop a generalizable Native American language identification system. Despite their cultural and historical significance, Native American languages remain entirely unsupported by major commercial language identification systems. This omission not only underscores the systemic neglect of endangered languages in technological development, but also highlights the urgent need for dedicated, community-driven solutions. We propose a two-pronged approach: (1) systematically curating linguistic resources across all Native American languages for robust training, and (2) applying tailored data augmentation to generate synthetic yet linguistically coherent training samples. As a proof of concept, we extend an existing rudimentary Athabaskan language classifier by integrating Plains Apache, an extinct Southern Athabaskan language, as an additional language class. We also adapt a data generation framework for low-resource languages to create synthetic Plains Apache data, highlighting the potential of data augmentation. This proposal advocates for a community-driven, technological approach to supporting Native American languages.
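The abstract describes extending an existing classifier by adding one more language class. The abstract does not specify that classifier's architecture, so the following is not the authors' model. It is a minimal sketch that assumes a generic character n-gram multinomial Naive Bayes identifier, a common language identification baseline, to show why adding a class can be cheap in such a model: a new language contributes only its own n-gram counts, and existing classes need no retraining. The class name `CharNgramLanguageID`, the labels `lang_a`/`lang_b`/`lang_c`, and the toy strings are hypothetical placeholders, not Athabaskan or Plains Apache data.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams, padding with spaces to mark word edges."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLanguageID:
    """Hypothetical toy multinomial Naive Bayes over character n-grams (uniform class priors)."""

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant
        self.counts: dict[str, Counter] = {}  # label -> n-gram counts
        self.totals: dict[str, int] = {}  # label -> total n-gram count
        self.vocab: set[str] = set()  # n-grams seen in any class

    def add_language(self, label: str, samples: list[str]) -> None:
        """Register a new language class from raw text samples.

        Existing classes keep their counts; only the shared vocabulary grows.
        """
        counter = Counter()
        for sample in samples:
            counter.update(char_ngrams(sample, self.n))
        self.counts[label] = counter
        self.totals[label] = sum(counter.values())
        self.vocab.update(counter)

    def log_likelihood(self, text: str, label: str) -> float:
        """Smoothed log P(text | label); the +1 reserves mass for unseen n-grams."""
        v = len(self.vocab) + 1
        counter, total = self.counts[label], self.totals[label]
        return sum(
            math.log((counter[g] + self.alpha) / (total + self.alpha * v))
            for g in char_ngrams(text, self.n)
        )

    def predict(self, text: str) -> str:
        """Return the label with the highest log-likelihood."""
        return max(self.counts, key=lambda label: self.log_likelihood(text, label))


# Usage with placeholder strings (not real language data).
clf = CharNgramLanguageID()
clf.add_language("lang_a", ["kala mana tala", "mana kala sana"])
clf.add_language("lang_b", ["zor vex zor", "vex zir vox"])
print(clf.predict("tala mana"))  # lang_a

# A new class is added later without refitting the existing ones.
clf.add_language("lang_c", ["qwe rty qwe", "rty qwe uio"])
print(clf.predict("qwe rty"))  # lang_c
```

Because smoothing uses the shared vocabulary size at query time, adding a class slightly changes every class's probabilities for unseen n-grams, but no stored counts have to be recomputed.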

Anthology ID:: 2025.naacl-srw.10
Volume:: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month:: April
Year:: 2025
Address:: Albuquerque, USA
Editors:: Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues:: NAACL | WS
Publisher:: Association for Computational Linguistics
Pages:: 105–111
URL:: https://aclanthology.org/2025.naacl-srw.10/
DOI:: 10.18653/v1/2025.naacl-srw.10
Cite (ACL):: Ivory Yang, Weicheng Ma, Carlos Guerrero Alvarez, William Dinauer, and Soroush Vosoughi. 2025. What is it? Towards a Generalizable Native American Language Identification System. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 105–111, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):: What is it? Towards a Generalizable Native American Language Identification System (Yang et al., NAACL 2025)
PDF:: https://aclanthology.org/2025.naacl-srw.10.pdf