Youssef Fares


2019

Arabic Dialect Identification with Deep Learning and Hybrid Frequency Based Features
Youssef Fares | Zeyad El-Zanaty | Kareem Abdel-Salam | Muhammed Ezzeldin | Aliaa Mohamed | Karim El-Awaad | Marwan Torki
Proceedings of the Fourth Arabic Natural Language Processing Workshop

Studies on Dialectal Arabic are growing more important by the day as it becomes the primary written and spoken form of Arabic online in informal settings. Among the important problems that should be explored is dialect identification. This paper reports different techniques that can be applied toward this goal and their performance on the Multi Arabic Dialect Applications and Resources (MADAR) Arabic Dialect Corpora. Our results show that improving on traditional systems that use frequency-based features and non-deep-learning classifiers is a challenging task. We propose several models based on different word and document representations. Our top model achieves a macro-averaged F1 score of 65.66 on MADAR’s small-scale parallel corpus of 25 dialects and Modern Standard Arabic (MSA).