Sebsibe H/Mariam

Also published as: Sebsibe H/mariam


Speech Recognition for Tigrinya language Using Deep Neural Network Approach
Hafte Abera | Sebsibe H/mariam
Proceedings of the 2019 Workshop on Widening NLP

This work presents a speech recognition model for the Tigrinya language built with a deep neural network. The primary layer of the model is a Long Short-Term Memory (LSTM) network, a special kind of recurrent neural network composed of LSTM blocks. The input features are 40-dimensional MFCC-LDA-MLLT-fMLLR features with CMN. The acoustic models are trained on features obtained by projecting down to 40 dimensions using linear discriminant analysis (LDA). Moreover, speaker adaptive training (SAT) is done using a single feature-space maximum likelihood linear regression (fMLLR) transform estimated per speaker. We train and compare LSTM and DNN models at various numbers of parameters and configurations. We show that LSTM models converge quickly and give state-of-the-art speech recognition performance for relatively small models. Finally, the accuracy of the model is evaluated based on the recognition rate.
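As a rough illustration of the CMN step named in the abstract, the sketch below subtracts the per-coefficient mean over the frames of one utterance. It is a minimal example under stated assumptions, not the authors' pipeline: the function name `cepstral_mean_normalize` and the toy two-coefficient frames are hypothetical, and the abstract does not say whether normalization is applied per utterance or per speaker.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean computed over all frames.

    `frames` is a list of feature vectors (one per frame) from a single
    utterance. Per-utterance normalization is an assumption made here.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across time.
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    # Shift every frame so each coefficient has zero mean over the utterance.
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy utterance: 3 frames of 2 coefficients (real MFCC frames have more).
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cepstral_mean_normalize(toy)
```

After normalization, each coefficient averages to zero across the utterance, which removes a constant offset in the cepstral domain before later transforms such as LDA or fMLLR are applied.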


Design of a Tigrinya Language Speech Corpus for Speech Recognition
Hafte Abera | Sebsibe H/Mariam
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

In this paper, we describe the first Tigrinya language speech corpus designed and developed for speech recognition purposes. Tigrinya, often written as Tigrigna (ትግርኛ) /tɪˈɡrinjə/, belongs to the Semitic branch of the Afro-Asiatic languages and shows the characteristic features of a Semitic language. It is spoken by the ethnic Tigray-Tigrigna people in the Horn of Africa. The paper outlines the corpus design process and an analysis of related work on speech corpus creation for different languages. The authors also describe the procedures used to create a speech recognition corpus for Tigrinya, which is an under-resourced language. One hundred and thirty native speakers of Tigrinya were recorded for the training and test datasets. Each speaker read 100 texts, which consisted of syllabically rich and balanced sentences. Ten thousand sets of sentences were used for the prompt sheets. These sentences contained all of the contextual syllables and phones.