OSACT4 Shared Tasks: Ensembled Stacked Classification for Offensive and Hate Speech in Arabic Tweets

Hafiz Hassaan Saeed, Toon Calders, Faisal Kamiran


Abstract
In this paper, we describe our submission to the OSACT4 2020 shared tasks on offensive language and hate speech detection in Arabic. Our solution combines a number of deep learning models that use pre-trained word vectors. To improve the word representation and increase word coverage, we compare several existing pre-trained word embeddings and concatenate the two that perform best empirically. To avoid both under- and over-fitting, we train each deep model multiple times and include the optimization of the decision threshold in the training process. The predictions of the resulting models are then combined into a tuned ensemble by stacking a classifier on top of the base models' predictions. We name our approach “ESOTP” (Ensembled Stacking classifier over Optimized Thresholded Predictions of multiple deep models). The resulting ESOTP-based system ranked 6th out of 35 on the shared task of offensive language detection (sub-task A) and 5th out of 30 on hate speech detection (sub-task B).
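The two central steps named in the abstract, tuning a decision threshold per base model and stacking a classifier over the thresholded predictions, can be illustrated with a minimal sketch. This is not the authors' implementation: the base-model names (`cnn`, `bilstm`, `gru`), the toy probabilities, the use of F1 as the tuning criterion, the threshold grid, and the logistic-regression meta-classifier are all assumptions made for illustration, and the sketch tunes thresholds on held-out scores rather than inside the training loop.

```python
import math

def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def thresholded(probs, threshold):
    """Turn probabilities into 0/1 predictions at the given threshold."""
    return [int(p >= threshold) for p in probs]

def best_threshold(probs, y_true):
    """Grid-search the threshold that maximizes F1 (assumed criterion)."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: f1_score(y_true, thresholded(probs, t)))

def fit_stacker(features, y_true, lr=0.5, epochs=500):
    """Fit a tiny logistic-regression meta-classifier by gradient descent."""
    n, n_feat = len(features), len(features[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, y_true):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_stacker(model, features):
    """Predict 0/1 labels with the fitted meta-classifier."""
    w, b = model
    return [int(b + sum(wi * xi for wi, xi in zip(w, x)) >= 0) for x in features]

# Toy held-out labels and probabilities from three hypothetical base models.
y_val = [1, 0, 1, 0, 1, 0, 0, 1]
model_probs = {
    "cnn":    [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.3, 0.8],
    "bilstm": [0.4, 0.1, 0.35, 0.2, 0.45, 0.05, 0.25, 0.5],
    "gru":    [0.8, 0.6, 0.7, 0.3, 0.4, 0.2, 0.65, 0.9],
}

# Step 1: tune one decision threshold per base model.
thresholds = {name: best_threshold(p, y_val) for name, p in model_probs.items()}

# Step 2: build meta-features from the thresholded predictions (one column per model).
columns = [thresholded(model_probs[name], thresholds[name]) for name in model_probs]
meta_features = [list(row) for row in zip(*columns)]

# Step 3: stack a classifier on top of the base models' predictions.
stacker = fit_stacker(meta_features, y_val)
stacked_pred = predict_stacker(stacker, meta_features)
print(thresholds, f1_score(y_val, stacked_pred))
```

In practice the meta-classifier would be fitted on predictions for data that the base models did not see during training; the toy example reuses one small set only to stay short.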
Anthology ID:
2020.osact-1.11
Volume:
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Hend Al-Khalifa, Walid Magdy, Kareem Darwish, Tamer Elsayed, Hamdy Mubarak
Venue:
OSACT
Publisher:
European Language Resources Association
Pages:
71–75
Language:
English
URL:
https://aclanthology.org/2020.osact-1.11
Cite (ACL):
Hafiz Hassaan Saeed, Toon Calders, and Faisal Kamiran. 2020. OSACT4 Shared Tasks: Ensembled Stacked Classification for Offensive and Hate Speech in Arabic Tweets. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 71–75, Marseille, France. European Language Resources Association.
Cite (Informal):
OSACT4 Shared Tasks: Ensembled Stacked Classification for Offensive and Hate Speech in Arabic Tweets (Saeed et al., OSACT 2020)
PDF:
https://aclanthology.org/2020.osact-1.11.pdf