Afaan Oromo Hate Speech Detection and Classification on Social Media

Teshome Mulugeta Ababu; Michael Melese Woldeyohannis

Abstract

Hate and offensive speech on social media targets an individual or a group on the basis of protected characteristics such as gender, ethnicity, and religion. It is a global problem that harms communities, especially speakers of under-resourced languages such as Afaan Oromo, one of the most widely spoken Cushitic languages. Our objective is to develop and test a model that detects and classifies Afaan Oromo hate speech on social media. We built several models using different machine learning algorithms (classical, ensemble, and deep learning) combined with different feature extraction techniques: BOW, TF-IDF, word2vec, and Keras embedding layers. The task required Afaan Oromo datasets, but none were available. Concentrating on four thematic areas (gender, religion, race, and offensive speech), we collected a total of 12,812 posts and comments from Facebook. BiLSTM with pre-trained word2vec features outperformed the other models, achieving accuracies of 0.84 and 0.88 for the eight-class and two-class settings, respectively.
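
The BOW and TF-IDF features named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names `bow` and `tfidf_matrix`, and the placeholder posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def bow(tokens, vocab):
    """Bag-of-words: raw term counts, in vocabulary order."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def tfidf_matrix(docs):
    """Return (vocab, rows) with rows[i][j] = count * idf for term j in doc i.

    Assumption: whitespace tokenization and a smoothed IDF,
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, one common variant.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = bow(doc, vocab)
        rows.append([c * idf[t] for c, t in zip(counts, vocab)])
    return vocab, rows


# Hypothetical placeholder posts, not taken from the paper's dataset.
posts = ["good day friend", "bad bad word", "good friend"]
vocab, rows = tfidf_matrix(posts)
```

Each row of `rows` is a fixed-length feature vector that a classical or ensemble classifier could consume. Terms that appear in fewer posts receive a larger IDF weight than terms shared across many posts.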

Anthology ID: 2022.lrec-1.712
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 6612–6619
URL: https://aclanthology.org/2022.lrec-1.712/
Cite (ACL): Teshome Mulugeta Ababu and Michael Melese Woldeyohannis. 2022. Afaan Oromo Hate Speech Detection and Classification on Social Media. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6612–6619, Marseille, France. European Language Resources Association.
Cite (Informal): Afaan Oromo Hate Speech Detection and Classification on Social Media (Ababu & Woldeyohannis, LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.712.pdf
