Sa‘7r: A Saudi Dialect Irony Dataset

Halah AlMazrua, Najla AlHazzani, Amaal AlDawod, Lama AlAwlaqi, Noura AlReshoudi, Hend Al-Khalifa, Luluh AlDhubayi


Abstract
In sentiment analysis, detecting irony is considered a major challenge. The key problem is the difficulty of recognizing implicit and indirect phrases that signify the opposite meaning. In this paper, we present Sa‘7r (ساخر), the Saudi irony dataset, and describe our efforts in constructing it. The dataset was collected using the Twitter API and consists of 19,810 tweets, 8,089 of which are labeled as ironic. We trained several machine learning and deep learning models for the irony detection task. The machine learning models include K-Nearest Neighbor (KNN), Logistic Regression (LR), Support Vector Machine (SVM), and Naïve Bayes (NB), while the deep learning models include BiLSTM and AraBERT. The results show that, among the tested machine learning models, the SVM outperformed the other classifiers with an accuracy of 0.68. Among the deep learning models, BiLSTM achieved an accuracy of 0.66 and AraBERT an accuracy of 0.71. Thus, the AraBERT model achieved the most accurate result in detecting ironic phrases in the Saudi dialect.
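To put the reported accuracies in context, the following minimal sketch computes the class balance implied by the tweet counts in the abstract and compares each reported accuracy with a majority-class baseline. It assumes that every tweet not labeled ironic is non-ironic and that the evaluation data has the same class distribution as the full dataset; neither point is stated in the abstract, so the baseline is only a rough reference. The names `class_balance` and `majority_baseline` are illustrative and do not come from the paper.

```python
# Counts taken from the abstract.
TOTAL_TWEETS = 19_810
IRONIC_TWEETS = 8_089

# Accuracies reported in the abstract.
REPORTED_ACCURACY = {"SVM": 0.68, "BiLSTM": 0.66, "AraBERT": 0.71}


def class_balance(total, ironic):
    """Return the (ironic, non-ironic) shares of the dataset."""
    # Assumption: every tweet not labeled ironic is non-ironic.
    non_ironic = total - ironic
    return ironic / total, non_ironic / total


ironic_share, non_ironic_share = class_balance(TOTAL_TWEETS, IRONIC_TWEETS)

# Accuracy of always predicting the larger class.
# Assumption: the evaluation split mirrors the full dataset's distribution.
majority_baseline = max(ironic_share, non_ironic_share)

print(f"ironic share:      {ironic_share:.3f}")
print(f"majority baseline: {majority_baseline:.3f}")
for model, accuracy in REPORTED_ACCURACY.items():
    gain = accuracy - majority_baseline
    print(f"{model}: {accuracy:.2f} ({gain:+.3f} vs. baseline)")
```

Under these assumptions, about 41% of the tweets are ironic, so a classifier that always predicts the non-ironic class would already reach an accuracy of roughly 0.59.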
Anthology ID:
2022.osact-1.7
Volume:
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Hend Al-Khalifa, Tamer Elsayed, Hamdy Mubarak, Abdulmohsen Al-Thubaity, Walid Magdy, Kareem Darwish
Venue:
OSACT
Publisher:
European Language Resources Association
Pages:
60–70
URL:
https://aclanthology.org/2022.osact-1.7
Cite (ACL):
Halah AlMazrua, Najla AlHazzani, Amaal AlDawod, Lama AlAwlaqi, Noura AlReshoudi, Hend Al-Khalifa, and Luluh AlDhubayi. 2022. Sa‘7r: A Saudi Dialect Irony Dataset. In Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection, pages 60–70, Marseille, France. European Language Resources Association.
Cite (Informal):
Sa‘7r: A Saudi Dialect Irony Dataset (AlMazrua et al., OSACT 2022)
PDF:
https://aclanthology.org/2022.osact-1.7.pdf