Advances in dialectal Arabic speech recognition: a study using Twitter to improve Egyptian ASR

Ahmed Ali, Hamdy Mubarak, Stephan Vogel


Abstract
This paper reports results on building an Egyptian Arabic speech recognition system as an example for under-resourced languages. We investigate different approaches to building the system using 10 hours of data for training the acoustic model, and report results for both a grapheme-based system and a phoneme-based system using MADA. The phoneme-based system shows better results than the grapheme-based system. We also explore the use of tweets written in dialectal Arabic. Using 880K Egyptian tweets reduced the out-of-vocabulary (OOV) rate from 15.1% to 3.2% and the word error rate (WER) from 59.6% to 44.7%, a relative gain of 25% in WER.
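The relative gain quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names `relative_reduction` and `oov_rate` are hypothetical, the OOV rate is assumed to be the fraction of running tokens missing from the vocabulary, and the toy token list is invented for the example.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric, as a fraction of the baseline."""
    return (baseline - improved) / baseline


def oov_rate(tokens, vocabulary) -> float:
    """Fraction of running tokens not found in the vocabulary (assumed definition)."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# WER figures reported in the abstract: 59.6% before, 44.7% after adding tweets.
wer_gain = relative_reduction(59.6, 44.7)
print(f"Relative WER gain: {wer_gain:.1%}")  # prints "Relative WER gain: 25.0%"

# Toy OOV example with made-up tokens (not data from the paper).
toy_vocabulary = {"a", "b", "c"}
toy_tokens = ["a", "b", "c", "d"]
print(f"Toy OOV rate: {oov_rate(toy_tokens, toy_vocabulary):.1%}")
```

The reported 25% is therefore a relative figure: the absolute WER drop is 14.9 points, which is one quarter of the 59.6% baseline.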
Anthology ID:
2014.iwslt-papers.1
Volume:
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers
Month:
December 4-5
Year:
2014
Address:
Lake Tahoe, California
Editors:
Marcello Federico, Sebastian Stüker, François Yvon
Venue:
IWSLT
SIG:
SIGSLT
Pages:
156–162
URL:
https://aclanthology.org/2014.iwslt-papers.1
Cite (ACL):
Ahmed Ali, Hamdy Mubarak, and Stephan Vogel. 2014. Advances in dialectal Arabic speech recognition: a study using Twitter to improve Egyptian ASR. In Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, pages 156–162, Lake Tahoe, California.
Cite (Informal):
Advances in dialectal Arabic speech recognition: a study using Twitter to improve Egyptian ASR (Ali et al., IWSLT 2014)
PDF:
https://aclanthology.org/2014.iwslt-papers.1.pdf