IUNADI at NADI 2023 shared task: Country-level Arabic Dialect Classification in Tweets for the Shared Task NADI 2023

Yash Hatekar, Muhammad Abdo


Abstract
In this paper, we describe our participation in the NADI2023 shared task for the classification of Arabic dialects in tweets. For training, evaluation, and testing purposes, a primary dataset comprising tweets from 18 Arab countries is provided, along with three older datasets. The main objective is to develop a model capable of classifying tweets from these 18 countries. We outline our approach, which leverages various machine learning models. Our experiments demonstrate that large language models, particularly Arabertv2-Large, Arabertv2-Base, and CAMeLBERT-Mix DID MADAR, consistently outperform traditional methods such as SVM, XGBOOST, Multinomial Naive Bayes, AdaBoost, and Random Forests.
Anthology ID:
2023.arabicnlp-1.72
Volume:
Proceedings of ArabicNLP 2023
Month:
December
Year:
2023
Address:
Singapore (Hybrid)
Editors:
Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues:
ArabicNLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
665–669
Language:
URL:
https://aclanthology.org/2023.arabicnlp-1.72
DOI:
10.18653/v1/2023.arabicnlp-1.72
Bibkey:
Cite (ACL):
Yash Hatekar and Muhammad Abdo. 2023. IUNADI at NADI 2023 shared task: Country-level Arabic Dialect Classification in Tweets for the Shared Task NADI 2023. In Proceedings of ArabicNLP 2023, pages 665–669, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal):
IUNADI at NADI 2023 shared task: Country-level Arabic Dialect Classification in Tweets for the Shared Task NADI 2023 (Hatekar & Abdo, ArabicNLP-WS 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.arabicnlp-1.72.pdf
Video:
 https://aclanthology.org/2023.arabicnlp-1.72.mp4