2024
Machine Translation Through Cultural Texts: Can Verses and Prose Help Low-Resource Indigenous Models?
Antoine Cadotte, Nathalie André, Fatiha Sadat
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
We propose the first MT models for Innu-Aimun, an Indigenous language in Eastern Canada, in an effort to provide assistance tools for translation and language learning. This project is carried out in collaboration with an Innu community school and involves the participation of students in Innu-Aimun translation, within the framework of a meaningful consideration of Indigenous perspectives. Our contributions in this paper result from the three initial stages of this project. First, we aim to align bilingual Innu-Aimun/French texts with the collaboration and participation of Innu-Aimun speakers. Second, we present the training and evaluation results of the MT models (both statistical and neural) based on these aligned corpora. Third, we collaboratively analyze some of the translations resulting from the MT models. We also see these developments for Innu-Aimun as a useful case study for answering a larger question: in a context where few aligned bilingual sentences are available for an Indigenous language, can cultural texts such as literature and poetry be used in the development of MT models?
2022
Deep Learning-Based Morphological Segmentation for Indigenous Languages: A Study Case on Innu-Aimun
Ngoc Tan Le, Antoine Cadotte, Mathieu Boivin, Fatiha Sadat, Jimena Terraza
Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing
Recent advances in the field of deep learning have led to a growing interest in the development of NLP approaches for low-resource and endangered languages. Nevertheless, relatively little NLP research has been conducted on Indigenous languages. These languages are considered to present complexities and challenges that make them very difficult to study in the NLP and AI fields. This paper focuses on the morphological segmentation of Indigenous languages, an extremely challenging task because of polysynthesis, dialectal variation with rich morphophonemics, misspellings, and limited resources. The proposed approach to the morphological segmentation of Innu-Aimun, an extremely low-resource Indigenous language of Canada, is based on deep learning. Experiments and evaluations have shown promising results compared to state-of-the-art rule-based and unsupervised approaches.
Challenges and Perspectives for Innu-Aimun within Indigenous Language Technologies
Antoine Cadotte, Tan Le Ngoc, Mathieu Boivin, Fatiha Sadat
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Innu-Aimun is an Algonquian language spoken in Eastern Canada. It is the language of the Innu, an Indigenous people that now lives for the most part in a dozen communities across Quebec and Labrador. Although it is alive, Innu-Aimun faces significant preservation and revitalization challenges. Its language technology is still nascent, with very few existing applications. This paper proposes a first survey of the available linguistic resources and existing technology for Innu-Aimun. Considering the existing linguistic and textual resources, we argue that developing language technology is feasible and propose first steps towards NLP applications such as machine translation. The goal of developing such technologies is first and foremost to support efforts to improve language transmission, cultural safety, and preservation for Innu-Aimun speakers, as these are considered urgent and vital issues. Finally, we discuss the importance of close collaboration and consultation with the Innu community in order to ensure that language technologies are developed respectfully and in accordance with that goal.