Rahul Thakur


2024

pdf bib
Generalizable Multilingual Hate Speech Detection on Low Resource Indian Languages using Fair Selection in Federated Learning
Akshay Singh | Rahul Thakur
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Social media, originally meant for peaceful communication, now faces issues with hate speech. Detecting hate speech from social media in Indian languages with linguistic diversity and cultural nuances presents a complex and challenging task. Furthermore, traditional methods involve sharing of users’ sensitive data with a server for model training making it undesirable and involving potential risk to their privacy remained under-studied. In this paper, we combined various low-resource language datasets and propose MultiFED, a federated approach that performs effectively to detect hate speech. MultiFED utilizes continuous adaptation and fine-tuning to aid generalization using subsets of multilingual data overcoming the limitations of data scarcity. Extensive experiments are conducted on 13 Indic datasets across five different pre-trained models. The results show that MultiFED outperforms the state-of-the-art baselines by 8% (approx.) in terms of Accuracy and by 12% (approx.) in terms of F-Score.
Search
Co-authors
Venues