Country-level Arabic Dialect Identification Using Small Datasets with Integrated Machine Learning Techniques and Deep Learning Models

Maha J. Althobaiti


Abstract
Arabic is characterised by a considerable number of varieties including spoken dialects. In this paper, we presented our models developed to participate in the NADI subtask 1.2 that requires building a system to distinguish between 21 country-level dialects. We investigated several classical machine learning approaches and deep learning models using small datasets. We examined an integration technique between two machine learning approaches. Additionally, we created dictionaries automatically based on Pointwise Mutual Information and labelled datasets, which enriched the feature space when training models. A semi-supervised learning approach was also examined and compared to other methods that exploit large unlabelled datasets, such as building pre-trained word embeddings. Our winning model was the Support Vector Machine with dictionary-based features and Pointwise Mutual Information values, achieving an 18.94% macros-average F1-score.
Anthology ID:
2021.wanlp-1.30
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Editors:
Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue:
WANLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
265–270
Language:
URL:
https://aclanthology.org/2021.wanlp-1.30
DOI:
Bibkey:
Cite (ACL):
Maha J. Althobaiti. 2021. Country-level Arabic Dialect Identification Using Small Datasets with Integrated Machine Learning Techniques and Deep Learning Models. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 265–270, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
Country-level Arabic Dialect Identification Using Small Datasets with Integrated Machine Learning Techniques and Deep Learning Models (Althobaiti, WANLP 2021)
Copy Citation:
PDF:
https://aclanthology.org/2021.wanlp-1.30.pdf