Pronunciation-Aware Syllable Tokenizer for Nepali Automatic Speech Recognition System

Ghimire Rupak Raj, Bal Bal Krishna, Prasain Balaram, Poudyal Prakash


Abstract
Automatic Speech Recognition (ASR) has advanced significantly over several decades, moving from rule-based methods to statistical approaches and ultimately to end-to-end (E2E) frameworks. This progress continues alongside advances in machine learning and deep learning. The E2E approach to ASR has been highly successful for well-resourced languages with large annotated corpora. However, accuracy remains quite low for low-resource languages such as Nepali. Language-specific tools such as tokenizers therefore appear to play a vital role in improving the performance of E2E models for such languages. In this paper, we propose a pronunciation-aware syllable tokenizer for the Nepali language that improves the results of the E2E model. Our experiments confirm that the proposed tokenizer yields better performance, with a Character Error Rate (CER) of 8.09%, than language-independent tokenizers.
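For readers unfamiliar with the metric reported above, CER is conventionally the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The following is a minimal sketch of that conventional definition, not the authors' evaluation code. The function names `edit_distance` and `character_error_rate` are illustrative, and the sketch assumes characters are counted as Unicode code points, so a Devanagari consonant and its vowel sign count as two characters; the paper may count them differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted over code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Assumed toy example: one substituted character out of four.
print(character_error_rate("abcd", "abxd"))
```

Under this definition, a CER of 8.09% corresponds to roughly eight character edits per hundred reference characters.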
Anthology ID:
2023.icon-1.4
Volume:
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2023
Address:
Goa University, Goa, India
Editors:
D. Pawar Jyoti, Lalitha Devi Sobha
Venue:
ICON
SIG:
SIGLEX
Publisher:
NLP Association of India (NLPAI)
Pages:
36–43
URL:
https://aclanthology.org/2023.icon-1.4
Cite (ACL):
Ghimire Rupak Raj, Bal Bal Krishna, Prasain Balaram, and Poudyal Prakash. 2023. Pronunciation-Aware Syllable Tokenizer for Nepali Automatic Speech Recognition System. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 36–43, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal):
Pronunciation-Aware Syllable Tokenizer for Nepali Automatic Speech Recognition System (Rupak Raj et al., ICON 2023)
PDF:
https://aclanthology.org/2023.icon-1.4.pdf