Word Representation Models for Arabic Dialect Identification

Mahmoud Sobhy; Ahmed H. Abu El-Atta; Ahmed A. El-Sawy; Hamada Nayel

doi:10.18653/v1/2022.wanlp-1.52

Abstract

This paper describes the systems submitted by the BFCAI team to the Nuanced Arabic Dialect Identification (NADI) 2022 shared task. Dialect identification aims to automatically detect the source variant of a given text or speech segment. NADI 2022 has two subtasks: the first for country-level identification and the second for sentiment analysis. Our team participated in the first subtask. The proposed systems use Term Frequency-Inverse Document Frequency (TF-IDF) and word embeddings as vectorization models, with different machine learning algorithms as classifiers. The proposed systems were tested on two test sets, Test-A and Test-B. The proposed models achieved macro-F1 scores of 21.25% and 9.71% on Test-A and Test-B, respectively. By comparison, the best-performing submitted system achieved macro-F1 scores of 36.48% and 18.95% on Test-A and Test-B, respectively.
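The macro-F1 metric reported in the abstract averages per-class F1 scores with equal weight, so rare dialect labels count as much as frequent ones. The following is a minimal illustrative sketch of that computation, not the shared task's official scorer; the function name `macro_f1` and the toy country labels are assumptions made for this example only.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred.

    Hypothetical helper for illustration; the official NADI scorer may
    handle absent classes differently.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (assumed, not taken from the NADI data)
gold = ["Egypt", "Egypt", "Iraq", "Morocco"]
pred = ["Egypt", "Iraq", "Iraq", "Egypt"]
print(round(macro_f1(gold, pred), 4))  # 0.3889
```

In this toy case the per-class F1 scores are 0.5 (Egypt), 0.667 (Iraq), and 0 (Morocco), so a single missed class pulls the average down sharply. This sensitivity is one reason macro-F1 values on many-class dialect tasks tend to be low.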

Anthology ID: 2022.wanlp-1.52
Volume: Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue: WANLP
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 474–478
URL: https://aclanthology.org/2022.wanlp-1.52/
DOI: 10.18653/v1/2022.wanlp-1.52
Cite (ACL): Mahmoud Sobhy, Ahmed H. Abu El-Atta, Ahmed A. El-Sawy, and Hamada Nayel. 2022. Word Representation Models for Arabic Dialect Identification. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 474–478, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Word Representation Models for Arabic Dialect Identification (Sobhy et al., WANLP 2022)
PDF: https://aclanthology.org/2022.wanlp-1.52.pdf
