Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying

Nazia Nafis, Diptesh Kanojia, Naveen Saini, Rudra Murthy


Abstract
Cyberbullying is a serious societal issue that is widespread across online channels and platforms, particularly social networking sites, which have proven exceptionally fertile ground for such behavior. The dearth of high-quality training data for multilingual and low-resource scenarios, data that accurately captures the nuances of social media conversations, often poses a roadblock to this task. This paper attempts to tackle cyberbullying, specifically its two most common manifestations: aggression and offensiveness. We present a novel dataset of 10,000 English and Hindi-English code-mixed tweets, manually annotated for the aggression detection and offensive language detection tasks. Our annotations are supported by inter-annotator agreement scores of 0.67 and 0.74 for the two tasks, indicating substantial agreement. We perform comprehensive fine-tuning of pre-trained language models (PTLMs) on this dataset to assess its efficacy. On our challenging test sets, the best models achieve macro F1-scores of 67.87 and 65.45 on the two tasks, respectively. Further, we perform cross-dataset transfer learning to benchmark our dataset against existing aggression and offensive language datasets. We also present a detailed quantitative and qualitative analysis of prediction errors, and with this paper, we publicly release the novel dataset, code, and models.
Anthology ID:
2023.woah-1.3
Volume:
The 7th Workshop on Online Abuse and Harms (WOAH)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Yi-ling Chung, Paul Röttger, Debora Nozza, Zeerak Talat, Aida Mostafazadeh Davani
Venue:
WOAH
Publisher:
Association for Computational Linguistics
Pages:
29–41
URL:
https://aclanthology.org/2023.woah-1.3
DOI:
10.18653/v1/2023.woah-1.3
Cite (ACL):
Nazia Nafis, Diptesh Kanojia, Naveen Saini, and Rudra Murthy. 2023. Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying. In The 7th Workshop on Online Abuse and Harms (WOAH), pages 29–41, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Towards Safer Communities: Detecting Aggression and Offensive Language in Code-Mixed Tweets to Combat Cyberbullying (Nafis et al., WOAH 2023)
PDF:
https://aclanthology.org/2023.woah-1.3.pdf