Simple vs Oversampling-based Classification Methods for Fine Grained Arabic Dialect Identification in Twitter

Mohamed Lichouri, Mourad Abbas


Abstract
In this paper, we describe our experiments on country-level Arabic dialect identification and compare a set of classifiers. The best results were achieved with a Linear Support Vector Classification (LSVC) model combined with Random Over Sampling (ROS), which yielded an F1-score of 18.74% in the post-evaluation phase. In the evaluation phase, our best submitted system achieved an F1-score of 18.27%, very close to the average F1-score (18.80%) obtained across all submitted systems.
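As an illustration of the Random Over Sampling step named in the abstract, the following is a minimal sketch and not the authors' implementation. It duplicates randomly chosen samples of the smaller classes until every class is as large as the largest one. The function name `random_oversample`, the placeholder tweets, and the country labels are all hypothetical; the abstract does not specify the features or the toolkit used.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, made-up data: placeholder strings stand in for tweets.
texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
labels = ["EG", "EG", "EG", "EG", "DZ", "MA"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
counts = Counter(balanced_labels)
```

In a full pipeline, oversampling would be applied to the training split only, and the balanced data would then be vectorized and passed to a linear SVM classifier (for example scikit-learn's `LinearSVC`, named here as an assumption).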
Anthology ID:
2020.wanlp-1.24
Volume:
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
250–256
URL:
https://aclanthology.org/2020.wanlp-1.24
Cite (ACL):
Mohamed Lichouri and Mourad Abbas. 2020. Simple vs Oversampling-based Classification Methods for Fine Grained Arabic Dialect Identification in Twitter. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 250–256, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Simple vs Oversampling-based Classification Methods for Fine Grained Arabic Dialect Identification in Twitter (Lichouri & Abbas, WANLP 2020)
PDF:
https://aclanthology.org/2020.wanlp-1.24.pdf