2022
AniMOJity: Detecting Hate Comments in Indic Languages and Analysing Bias against Content Creators
Rahul Khurana | Chaitanya Pandey | Priyanshi Gupta | Preeti Nagrath
Proceedings of the 19th International Conference on Natural Language Processing (ICON)
Online platforms have dramatically changed how people communicate with one another, contributing to an increase of 467 million in the number of Indians who actively exchange and distribute social data. This growth has brought an unexpected rise in harmful Internet content, including racially, sexually, and religiously biased material, at a volume that humans cannot control manually. As a result, there is an urgent need to research automated computational strategies for identifying hostile content in academic forums. This paper presents our learning pipeline and a novel model that classifies multilingual text with a test F1-score of 88.6% on the Moj Multilingual Abusive Comment Identification dataset for hate speech detection in thirteen Indian regional languages. Our model, AniMOJity, combines transfer learning with state-of-the-art (SOTA) pre- and post-processing techniques. We also manually annotate 300 samples to investigate bias and to provide insight into the hate directed at content creators.