Automatic Speech Recognition for Uyghur through Multilingual Acoustic Modeling

Ayimunishagu Abulimiti, Tanja Schultz


Abstract
Low-resource languages suffer from lower Automatic Speech Recognition (ASR) performance because of the lack of training data. A common remedy is multilingual training, which has been applied to achieve more context coverage and has shown better performance than monolingual training (Heigold et al., 2013). However, differences between the donor language and the target language may distort an acoustic model trained on multilingual data, especially when a much larger amount of donor-language data is used to train the models of the low-resource language. This paper presents our effort to improve ASR performance for the under-resourced Uyghur language through multilingual acoustic training. To develop a multilingual speech recognition system for Uyghur, we used Turkish as the donor language, which we selected from the GlobalPhone corpus as the language most similar to Uyghur. By generating subsets of the Uyghur training data, we explored the performance of multilingual speech recognition systems trained with different amounts of Uyghur and Turkish data. The best speech recognition system for Uyghur is obtained by multilingual training on all Uyghur data (10 hours) and 17 hours of Turkish data; its word error rate (WER) is 19.17%, which corresponds to a 4.95% relative improvement over monolingual training.
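The relation between the two figures reported in the abstract can be sketched with a short calculation. This is a minimal illustration, assuming the usual definition of relative improvement as (baseline WER - new WER) / baseline WER. The function names are hypothetical. The monolingual baseline computed here is only implied by the two reported numbers and is not a value stated in the abstract.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel_improvement: float) -> float:
    """Invert the formula above: the baseline WER implied by a new WER
    and a relative improvement given as a fraction (e.g. 0.0495)."""
    if not 0.0 <= rel_improvement < 1.0:
        raise ValueError("rel_improvement must be in [0, 1)")
    return new_wer / (1.0 - rel_improvement)


# Figures taken from the abstract.
multilingual_wer = 19.17   # WER in percent, best multilingual system
reported_rel = 0.0495      # 4.95% relative improvement

# Assumption: this baseline is derived, not quoted from the paper.
baseline_wer = implied_baseline(multilingual_wer, reported_rel)
print(f"Implied monolingual WER: {baseline_wer:.2f}%")
print(f"Check: {relative_improvement(baseline_wer, multilingual_wer):.4f}")
```

Under this definition the monolingual system would sit at roughly 20.2% WER, so the absolute gain is about one percentage point.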
Anthology ID:
2020.lrec-1.793
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6444–6449
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.793
Cite (ACL):
Ayimunishagu Abulimiti and Tanja Schultz. 2020. Automatic Speech Recognition for Uyghur through Multilingual Acoustic Modeling. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6444–6449, Marseille, France. European Language Resources Association.
Cite (Informal):
Automatic Speech Recognition for Uyghur through Multilingual Acoustic Modeling (Abulimiti & Schultz, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.793.pdf