Offensive Language Detection in Nepali Social Media

Nobal B. Niraula, Saurab Dulal, Diwa Koirala


Abstract
Social media texts such as blog posts, comments, and tweets often contain offensive language, including racial hate speech, personal attacks, and sexual harassment. Detecting inappropriate use of language is therefore of utmost importance, both for the safety of users and for suppressing hateful conduct and aggression. Existing approaches to this problem mostly target resource-rich languages such as English and German. In this paper, we characterize offensive language in Nepali, a low-resource language, highlighting the challenges that must be addressed when processing Nepali social media text. We also present experiments on detecting offensive language using supervised machine learning. Besides contributing the first baseline approaches for detecting offensive language in Nepali, we release human-annotated datasets to encourage future research on this crucial topic.
Anthology ID:
2021.woah-1.7
Volume:
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)
Month:
August
Year:
2021
Address:
Online
Editors:
Aida Mostafazadeh Davani, Douwe Kiela, Mathias Lambert, Bertie Vidgen, Vinodkumar Prabhakaran, Zeerak Waseem
Venue:
WOAH
Publisher:
Association for Computational Linguistics
Pages:
67–75
URL:
https://aclanthology.org/2021.woah-1.7
DOI:
10.18653/v1/2021.woah-1.7
Cite (ACL):
Nobal B. Niraula, Saurab Dulal, and Diwa Koirala. 2021. Offensive Language Detection in Nepali Social Media. In Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), pages 67–75, Online. Association for Computational Linguistics.
Cite (Informal):
Offensive Language Detection in Nepali Social Media (Niraula et al., WOAH 2021)
PDF:
https://aclanthology.org/2021.woah-1.7.pdf
Video:
https://aclanthology.org/2021.woah-1.7.mp4
Code:
nowalab/offensive-nepali
Data:
Hate Speech and Offensive Language