Hate Speech Detection in Saudi Twittersphere: A Deep Learning Approach

Raghad Alshaalan, Hend Al-Khalifa


Abstract
With the rise of hate speech in the Twittersphere, significant research effort has gone into automatic solutions for detecting hate speech, ranging from simple machine learning models to more complex deep neural network models. Despite that, research on the hate speech problem in Arabic is still limited. This paper therefore investigates several neural network models based on Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to detect hate speech in Arabic tweets. It also evaluates the recent language representation model BERT on the task of Arabic hate speech detection. To conduct our experiments, we first built a new hate speech dataset that contains 9,316 annotated tweets. Then, we conducted a set of experiments on two datasets to evaluate four models: CNN, GRU, CNN+GRU and BERT. Our experimental results on our dataset and an out-of-domain dataset show that the CNN model gives the best performance, with an F1-score of 0.79 and an AUROC of 0.89.
Anthology ID:
2020.wanlp-1.2
Volume:
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | WANLP
Publisher:
Association for Computational Linguistics
Pages:
12–23
URL:
https://aclanthology.org/2020.wanlp-1.2
Cite (ACL):
Raghad Alshaalan and Hend Al-Khalifa. 2020. Hate Speech Detection in Saudi Twittersphere: A Deep Learning Approach. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 12–23, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Hate Speech Detection in Saudi Twittersphere: A Deep Learning Approach (Alshaalan & Al-Khalifa, WANLP 2020)
PDF:
https://aclanthology.org/2020.wanlp-1.2.pdf