Arabic Dialect Identification based on a Weighted Concatenation of TF-IDF Features

Mohamed Lichouri, Mourad Abbas, Khaled Lounnas, Besma Benaziz, Aicha Zitouni


Abstract
In this paper, we analyze the impact of the weighted concatenation of TF-IDF features on the Arabic Dialect Identification task, as part of our participation in the NADI2021 shared task. The study covers two subtasks: subtask 1.1 (country-level MSA) and subtask 1.2 (country-level DA) identification. The classifiers supporting our comparative study are Linear Support Vector Classification (LSVC), Linear Regression (LR), Perceptron, Stochastic Gradient Descent (SGD), Passive Aggressive (PA), Complement Naive Bayes (CNB), Multilayer Perceptron (MLP), and RidgeClassifier. In the evaluation phase, our system achieves F1 scores of 14.87% and 21.49% for country-level MSA and DA identification, respectively, which are close to the average F1 scores of the submitted systems for the two subtasks (18.70% and 24.23%).
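As a rough illustration of what a weighted concatenation of TF-IDF features can look like, the sketch below uses scikit-learn's `FeatureUnion` with `transformer_weights` to scale a word-level and a character-level TF-IDF block before stacking them side by side. The n-gram ranges, the weights (1.0 and 0.5), the toy sentences, and the country labels are all assumptions chosen for illustration; the abstract does not specify the authors' actual configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.svm import LinearSVC

# Toy, made-up sentences (not NADI data); labels are hypothetical.
texts = ["إزيك عامل إيه", "واش راك لاباس", "كيفك شو الأخبار", "شلونك اليوم"]
labels = ["Egypt", "Algeria", "Lebanon", "Iraq"]


def build_features(w_word=1.0, w_char=0.5):
    """Word and character TF-IDF blocks, each scaled by its weight
    and then concatenated column-wise (weights are illustrative)."""
    return FeatureUnion(
        [
            ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
            ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
        ],
        transformer_weights={"word": w_word, "char": w_char},
    )


features = build_features()
X = features.fit_transform(texts)  # sparse matrix: [1.0 * word | 0.5 * char]

# One of the classifier families named in the abstract (LSVC).
clf = LinearSVC().fit(X, labels)
pred = clf.predict(features.transform(["واش راك اليوم"]))
print(X.shape, pred)
```

In a setup like this, the weights would typically be tuned on a development set, and `LinearSVC` could be swapped for any of the other linear classifiers listed above without changing the feature step.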
Anthology ID:
2021.wanlp-1.33
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Editors:
Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
282–286
URL:
https://aclanthology.org/2021.wanlp-1.33
Cite (ACL):
Mohamed Lichouri, Mourad Abbas, Khaled Lounnas, Besma Benaziz, and Aicha Zitouni. 2021. Arabic Dialect Identification based on a Weighted Concatenation of TF-IDF Features. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 282–286, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
Arabic Dialect Identification based on a Weighted Concatenation of TF-IDF Features (Lichouri et al., WANLP 2021)
PDF:
https://aclanthology.org/2021.wanlp-1.33.pdf