An End-to-End Speech Recognition for the Nepali Language

Sunil Regmi; Bal Krishna Bal

Abstract

In this era of AI and deep learning, speech recognition has achieved fairly good levels of accuracy and is bound to change the way humans interact with computers, which today happens mostly through text. Most speech recognition systems for the Nepali language to date use conventional approaches that involve separately trained acoustic, pronunciation, and language model components. Creating a pronunciation lexicon from scratch and defining phoneme sets for the language requires expert knowledge and is time-consuming. In this work, we present an end-to-end ASR approach that uses a joint CTC-attention-based encoder-decoder and a recurrent neural network based language model, which eliminates the need to create a pronunciation lexicon from scratch. The framework used for this work is the ESPnet toolkit, which uses Kaldi-style data preparation. The speech and transcription data used for this research are freely available on Open Speech and Language Resources (OpenSLR). We use about 159k transcribed speech samples to train the speech recognition model, which currently recognizes speech input with a character error rate (CER) of 10.3%.
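The abstract reports its result as a character error rate. As a rough illustration of what that metric measures, the sketch below computes a corpus-level CER as the total character edit distance divided by the total number of reference characters. It is a minimal sketch, not the scoring code used by the authors or by ESPnet: the function names `edit_distance` and `cer` are hypothetical, and it counts Unicode code points, so a Devanagari consonant and its vowel sign count as two characters here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits over total reference characters."""
    # Assumption: references and hypotheses are parallel lists of strings.
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars
```

Under this definition, a CER of 10.3% corresponds to roughly one character edit for every ten reference characters, summed over the evaluation set.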

Anthology ID: 2021.icon-main.22
Volume: Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2021
Address: National Institute of Technology Silchar, Silchar, India
Editors: Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 180–185
URL: https://aclanthology.org/2021.icon-main.22/
Cite (ACL): Sunil Regmi and Bal Krishna Bal. 2021. An End-to-End Speech Recognition for the Nepali Language. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 180–185, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
Cite (Informal): An End-to-End Speech Recognition for the Nepali Language (Regmi & Bal, ICON 2021)
PDF: https://aclanthology.org/2021.icon-main.22.pdf