A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect

Fatemah Husain, Hana Al-Ostad, Halima Omar


Abstract
Developing sentiment analysis systems for Arabic is challenging because few Arabic datasets are available. Many Arabic dialects remain unstudied in Arabic sentiment analysis because recruiting annotators for dataset creation is complex. This paper addresses the research gap in sentiment analysis for the Kuwaiti dialect by proposing a weakly supervised approach for developing a large labeled dataset. Our dataset consists of over 16.6k tweets (7,905 negative, 7,902 positive, and 860 neutral) that span several themes and time frames to limit bias that might affect its content. The agreement between our proposed system’s labels and human-annotated labels is 93% for pairwise percent agreement and 0.87 for Cohen’s kappa coefficient. Furthermore, we evaluate the dataset using multiple traditional machine learning classifiers and advanced deep learning language models. The ARBERT model achieves 89% accuracy on the test set.
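The two agreement measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the toy label lists are hypothetical, and the toy values are not drawn from the paper's dataset. Percent agreement is the share of items on which both label sources match; Cohen's kappa corrects that share for the agreement expected by chance given each source's label frequencies.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items that receive the same label from both sources."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two label sources."""
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two marginal label frequencies,
    # summed over every label that either source used.
    p_e = sum(counts_a[label] * counts_b[label]
              for label in set(counts_a) | set(counts_b)) / (n * n)
    if p_e == 1.0:
        # Both sources used a single identical label; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (NOT data from the paper).
system_labels = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neg", "pos", "neu"]
human_labels = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neu"]

print(round(percent_agreement(system_labels, human_labels), 2))
print(round(cohens_kappa(system_labels, human_labels), 2))
```

On these toy labels the two sources disagree on one item out of ten, so percent agreement is 0.90 while kappa is about 0.84. Kappa is lower because part of the raw agreement would be expected by chance.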
Anthology ID:
2022.wanlp-1.15
Volume:
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
161–173
URL:
https://aclanthology.org/2022.wanlp-1.15
DOI:
10.18653/v1/2022.wanlp-1.15
Cite (ACL):
Fatemah Husain, Hana Al-Ostad, and Halima Omar. 2022. A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 161–173, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
A Weak Supervised Transfer Learning Approach for Sentiment Analysis to the Kuwaiti Dialect (Husain et al., WANLP 2022)
PDF:
https://aclanthology.org/2022.wanlp-1.15.pdf