Not always about you: Prioritizing community needs when developing endangered language technology

Zoey Liu, Crystal Richardson, Richard Hatcher, Emily Prud’hommeaux


Abstract
Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish, on the one hand, with millions of speakers using it in every imaginable domain, and Seneca, on the other, with only a small handful of fluent speakers using the language primarily in a restricted domain. While issues stemming from the lack of resources necessary to train models unite this disparate group of languages, many other issues cut across the divide between widely spoken low-resource languages and endangered languages. In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers, and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing, fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders.
Anthology ID:
2022.acl-long.272
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3933–3944
URL:
https://aclanthology.org/2022.acl-long.272
DOI:
10.18653/v1/2022.acl-long.272
Cite (ACL):
Zoey Liu, Crystal Richardson, Richard Hatcher, and Emily Prud’hommeaux. 2022. Not always about you: Prioritizing community needs when developing endangered language technology. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3933–3944, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Not always about you: Prioritizing community needs when developing endangered language technology (Liu et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.272.pdf