Arabic Dialect Identification Using BERT-Based Domain Adaptation

Ahmad Beltagy, Abdelrahman Abouelenin, Omar ElSherief


Abstract
Arabic is one of the most important and fastest-growing languages in the world. With the rise of social media giants like Twitter, spoken Arabic dialects have come into wider use. In this paper we describe our simple approach to NADI Shared Task 1, which requires building a system to distinguish between 21 Arabic dialects. We introduce a semi-supervised deep learning approach, combined with pre-processing, with results reported on the NADI Shared Task 1 corpus. Our system ranks 4th in the NADI shared task competition, achieving a macro-averaged F1 score of 23.09% with a very simple yet efficient approach to distinguishing between 21 Arabic dialects given tweets.
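For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a macro-averaged F1 score is commonly computed: per-class F1 is calculated independently and then averaged without weighting by class frequency. The function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration; this is not the shared task's official scorer.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: country-level dialect labels for four tweets
gold = ["Egypt", "Egypt", "Iraq", "Oman"]
pred = ["Egypt", "Iraq", "Iraq", "Egypt"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, a class that is never predicted correctly (here the hypothetical "Oman" label) pulls the score down as much as a frequent class would, which is one reason macro F1 tends to be low on tasks with many imbalanced classes.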
Anthology ID:
2020.wanlp-1.26
Volume:
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
262–267
URL:
https://aclanthology.org/2020.wanlp-1.26
Cite (ACL):
Ahmad Beltagy, Abdelrahman Abouelenin, and Omar ElSherief. 2020. Arabic Dialect Identification Using BERT-Based Domain Adaptation. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 262–267, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Arabic Dialect Identification Using BERT-Based Domain Adaptation (Beltagy et al., WANLP 2020)
PDF:
https://aclanthology.org/2020.wanlp-1.26.pdf