LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier

Samia Touileb


Abstract
This paper presents our results for the Nuanced Arabic Dialect Identification (NADI) shared task of the Fifth Workshop for Arabic Natural Language Processing (WANLP 2020). We participated in the first sub-task, country-level Arabic dialect identification covering 21 Arab countries. Our contribution is based on a stacking classifier that uses Multinomial Naive Bayes, Linear SVC, and Logistic Regression classifiers as estimators, followed by a Logistic Regression as the final estimator. Although the results on the test set were low, with a macro F1 of 17.71, we were able to show that a simple approach can achieve results comparable to those of more sophisticated solutions. Moreover, the insights from our error analysis, and from the corpus content in general, can be used to develop and improve future systems.
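The stacking setup named in the abstract can be sketched with scikit-learn's StackingClassifier. This is only a minimal illustration, not the paper's implementation. The abstract specifies just the three base estimators and the Logistic Regression final estimator, so the character n-gram TF-IDF features, the hyperparameters, the fold count, the helper name build_stacker, and the synthetic two-class toy data are all assumptions made for this example.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_stacker(cv=2):
    """Hypothetical helper: TF-IDF features feeding a stacking classifier."""
    # Base estimators as listed in the abstract; settings are assumptions.
    base_estimators = [
        ("mnb", MultinomialNB()),
        ("svc", LinearSVC()),
        ("lr", LogisticRegression(max_iter=1000)),
    ]
    stack = StackingClassifier(
        estimators=base_estimators,
        # Final estimator trained on cross-validated base predictions.
        final_estimator=LogisticRegression(max_iter=1000),
        cv=cv,
    )
    # Character n-gram TF-IDF is an assumed feature choice, not from the abstract.
    features = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))
    return make_pipeline(features, stack)


# Synthetic placeholder strings and labels, NOT data from the NADI corpus.
texts = [
    "aaa aab aba", "aab aaa abb", "aba abb aaa", "abb aba aab",
    "xxy xyx yxx", "xyx xxy yyx", "yxx yyx xxy", "yyx yxx xyx",
]
labels = ["A"] * 4 + ["B"] * 4

model = build_stacker(cv=2)
model.fit(texts, labels)
preds = model.predict(texts)

# Macro F1 averages per-class F1 with equal weight for every class.
macro_f1 = f1_score(labels, preds, average="macro")
print(f"macro F1 on the toy training data: {macro_f1:.2f}")
```

Note that f1_score returns a value between 0 and 1, whereas the abstract reports macro F1 on a 0 to 100 scale. Because macro averaging weights all classes equally, rare classes count as much as frequent ones in a 21-way setting.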
Anthology ID:
2020.wanlp-1.34
Volume:
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
313–319
URL:
https://aclanthology.org/2020.wanlp-1.34
Cite (ACL):
Samia Touileb. 2020. LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 313–319, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
LTG-ST at NADI Shared Task 1: Arabic Dialect Identification using a Stacking Classifier (Touileb, WANLP 2020)
PDF:
https://aclanthology.org/2020.wanlp-1.34.pdf