Arabic Dialect Identification with Deep Learning and Hybrid Frequency Based Features

Youssef Fares; Zeyad El-Zanaty; Kareem Abdel-Salam; Muhammed Ezzeldin; Aliaa Mohamed; Karim El-Awaad; Marwan Torki

doi:10.18653/v1/W19-4626

Abstract

Studies of Dialectal Arabic are growing in importance as it becomes the primary written and spoken form of Arabic online in informal settings. One important problem in this area is dialect identification. This paper reports several techniques that can be applied toward this goal, along with their performance on the Multi Arabic Dialect Applications and Resources (MADAR) Arabic Dialect Corpora. Our results show that improving on traditional systems that use frequency-based features and non-deep-learning classifiers is a challenging task. We propose several models based on different word and document representations. Our top model achieves a macro-averaged F1 score of 65.66 on MADAR’s small-scale parallel corpus of 25 dialects and Modern Standard Arabic (MSA).
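The reported metric, macro-averaged F1, is the unweighted mean of per-class F1 scores, so each dialect counts equally regardless of how many sentences it has. The following is a minimal sketch of that computation, not the authors' evaluation script; the function name `macro_f1` and the toy labels are assumptions made for illustration and are not drawn from the MADAR data.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    # Score every label that appears in either the gold or predicted list.
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[g] += 1   # class g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical, for illustration only).
gold = ["Cairo", "Cairo", "MSA", "Rabat"]
pred = ["Cairo", "MSA", "MSA", "Rabat"]
score = macro_f1(gold, pred)
print(f"macro F1 = {100 * score:.2f}")
```

The sketch prints the score scaled by 100, on the assumption that the 65.66 figure in the abstract is expressed on that scale.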

Anthology ID: W19-4626
Volume: Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month: August
Year: 2019
Address: Florence, Italy
Editors: Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue: WANLP
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 224–228
URL: https://aclanthology.org/W19-4626/
DOI: 10.18653/v1/W19-4626
Cite (ACL): Youssef Fares, Zeyad El-Zanaty, Kareem Abdel-Salam, Muhammed Ezzeldin, Aliaa Mohamed, Karim El-Awaad, and Marwan Torki. 2019. Arabic Dialect Identification with Deep Learning and Hybrid Frequency Based Features. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 224–228, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Arabic Dialect Identification with Deep Learning and Hybrid Frequency Based Features (Fares et al., WANLP 2019)
PDF: https://aclanthology.org/W19-4626.pdf
