End-to-End Automatic Speech Recognition for Gujarati

Deepang Raval, Vyom Pathak, Muktan Patel, Brijesh Bhatt


Abstract
We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep-learning-based approach that combines Convolutional Neural Network (CNN) layers, Bi-directional Long Short-Term Memory (BiLSTM) layers, and Dense layers, with Connectionist Temporal Classification (CTC) as the loss function. To improve the performance of the system given the limited size of the dataset, we present a prefix decoding technique based on a combined language model (WLM and CLM) and a post-processing technique based on Bidirectional Encoder Representations from Transformers (BERT). To gain key insights from our Automatic Speech Recognition (ASR) system, we propose different analysis methods. These insights help in understanding our ASR system for a particular language (Gujarati) and can also guide improvements to ASR systems for low-resource languages. We trained the model on the Microsoft Speech Corpus and observe a 5.11% decrease in Word Error Rate (WER) compared with the base model's WER.
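Since the headline result is reported as Word Error Rate, a minimal sketch of how WER is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. This is the standard textbook definition and not necessarily the authors' scoring script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. A real evaluation
    pipeline may normalize text differently before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis `"a x c"` against the reference `"a b c d"` counts one substitution and one deletion over four reference words. Note that WER can exceed 1.0 when the hypothesis contains many insertions.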
Anthology ID:
2020.icon-main.56
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2020
Address:
Indian Institute of Technology Patna, Patna, India
Editors:
Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
409–419
URL:
https://aclanthology.org/2020.icon-main.56
Cite (ACL):
Deepang Raval, Vyom Pathak, Muktan Patel, and Brijesh Bhatt. 2020. End-to-End Automatic Speech Recognition for Gujarati. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 409–419, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
End-to-End Automatic Speech Recognition for Gujarati (Raval et al., ICON 2020)
PDF:
https://aclanthology.org/2020.icon-main.56.pdf
Code:
01-vyom/End_2_End_Automatic_Speech_Recognition_For_Gujarati