Advances in dialectal Arabic speech recognition: a study using Twitter to improve Egyptian ASR

Ahmed Ali, Hamdy Mubarak, Stephan Vogel


Abstract
This paper reports results in building an Egyptian Arabic speech recognition system as an example for under-resourced languages. We investigated different approaches to building the system using 10 hours of data for training the acoustic model, and we report results for both a grapheme-based system and a phoneme-based system using MADA. The phoneme-based system shows better results than the grapheme-based system. In this paper, we also explore the use of tweets written in dialectal Arabic. Using 880K Egyptian tweets reduced the out-of-vocabulary (OOV) rate from 15.1% to 3.2% and the word error rate (WER) from 59.6% to 44.7%, a relative gain of 25% in WER.
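The relative figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names `relative_reduction` and `oov_rate` are hypothetical and are not taken from the paper, and the OOV rate is assumed to be the share of test tokens missing from the recognizer's vocabulary, which is the usual definition but may differ in detail from the authors' setup.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction, in percent, when a rate drops from `before` to `after`."""
    return (before - after) / before * 100.0


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Percentage of tokens absent from the vocabulary (assumed token-level definition)."""
    if not tokens:
        raise ValueError("token list must not be empty")
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens) * 100.0


# Figures as reported in the abstract.
wer_gain = relative_reduction(59.6, 44.7)   # WER: 59.6% -> 44.7%
oov_gain = relative_reduction(15.1, 3.2)    # OOV: 15.1% -> 3.2%

print(f"Relative WER reduction: {wer_gain:.1f}%")
print(f"Relative OOV reduction: {oov_gain:.1f}%")
```

With the reported numbers, the WER computation gives (59.6 - 44.7) / 59.6, which matches the 25% relative gain stated in the abstract. The same formula applied to the OOV rates gives a relative reduction of roughly 79%, a figure the abstract does not state explicitly.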
Anthology ID:
2014.iwslt-papers.1
Volume:
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers
Month:
December 4-5
Year:
2014
Address:
Lake Tahoe, California
Venue:
IWSLT
Pages:
156–162
URL:
https://aclanthology.org/2014.iwslt-papers.1
PDF:
https://aclanthology.org/2014.iwslt-papers.1.pdf