Endangered Language Preservation: A Model for Automatic Speech Recognition Based on Khroskyabs Data

Ruiyao Li, Yunfan Lai


Abstract
This paper reports an Automatic Speech Recognition (ASR) experiment conducted on Khroskyabs data. As the development of information technology and the challenges of globalization put pressure on linguistic diversity, this study focuses on the preservation crisis facing the endangered Gyalrongic languages, particularly Khroskyabs. We used ASR technology and the Wav2Vec2 model to transcribe Khroskyabs. Despite challenges such as data scarcity and the language’s complex morphology, preliminary results show promising character accuracy from the model. The linguist also gave relatively high evaluations to the model's transcription results. Together, the experimental and evaluation results indicate that the model is of practical use. At the same time, the results reveal a high word error rate, so we plan to augment our existing dataset with additional Khroskyabs data in further studies. This study provides insights and methodologies for using ASR to transcribe and protect Khroskyabs, and we hope that it can contribute to the preservation of other endangered languages.
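The abstract contrasts promising character accuracy with a high word error rate. The sketch below illustrates how the two metrics can diverge under their standard edit-distance definitions. It is not the authors' evaluation code: the function names `edit_distance`, `cer`, and `wer` and the placeholder strings are assumptions made for illustration, and the strings are not Khroskyabs data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical placeholder strings (NOT Khroskyabs data):
# one wrong character in each of the three words.
reference = "abcd efgh ijkl"
hypothesis = "abcx efgx ijkx"

print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

In this toy case only 3 of 14 reference characters are wrong, yet every word counts as an error, because a single wrong character makes the whole word incorrect. The effect tends to be stronger when words are long, as can be the case in morphologically complex languages.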
Anthology ID:
2024.eurali-1.6
Volume:
Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Atul Kr. Ojha, Sina Ahmadi, Silvie Cinková, Theodorus Fransen, Chao-Hong Liu, John P. McCrae
Venues:
EURALI | WS
Publisher:
ELRA and ICCL
Pages:
36–40
URL:
https://aclanthology.org/2024.eurali-1.6
Cite (ACL):
Ruiyao Li and Yunfan Lai. 2024. Endangered Language Preservation: A Model for Automatic Speech Recognition Based on Khroskyabs Data. In Proceedings of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024, pages 36–40, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Endangered Language Preservation: A Model for Automatic Speech Recognition Based on Khroskyabs Data (Li & Lai, EURALI-WS 2024)
PDF:
https://aclanthology.org/2024.eurali-1.6.pdf