Krishan Chavinda
2025
A Dual Contrastive Learning Framework for Enhanced Hate Speech Detection in Low-Resource Languages
Krishan Chavinda | Uthayasanker Thayasivam
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Hate speech on social media platforms is a critical issue, especially in low-resource languages such as Sinhala and Tamil, where the lack of annotated datasets and linguistic tools hampers the development of effective detection systems. This research introduces a novel framework for detecting hate speech in low-resource languages by integrating Multilingual Large Language Models (MLLMs) with a Dual Contrastive Learning (DCL) strategy. Our approach applies both self-supervised and supervised contrastive learning to capture the nuances of hate speech in low-resource settings. We evaluate the framework on datasets from Facebook and Twitter, where it outperforms traditional deep learning models such as CNN, LSTM, and BiGRU. The results highlight the efficacy of DCL models, particularly when fine-tuned on domain-specific data, with the best performance achieved using the Twitter/twhin-bert-base model. This study underscores the potential of advanced machine learning techniques for improving hate speech detection in under-resourced languages, paving the way for further research in this domain.
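The abstract does not give the loss formulation, so the following is only an illustrative sketch of how a supervised and a self-supervised contrastive term might be combined, not the authors' implementation. It assumes sentence embeddings are already available as plain lists of floats (for example from an encoder such as Twitter/twhin-bert-base), that two augmented views of each post exist, and that the two terms are mixed with a hypothetical weight `alpha`. The function names, `alpha`, and `temperature` are all assumptions introduced for illustration.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss: same-label embeddings are pulled together,
    all other embeddings in the batch act as negatives."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: _dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over every non-anchor sample.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def self_supervised_contrastive_loss(view_a, view_b, temperature=0.1):
    """Label-free term: the two views of one example are its only positives,
    expressed here by giving each example its own pseudo-label."""
    n = len(view_a)
    pseudo_labels = list(range(n)) * 2
    return supervised_contrastive_loss(view_a + view_b, pseudo_labels, temperature)


def dual_contrastive_loss(view_a, view_b, labels, alpha=0.5, temperature=0.1):
    # Hypothetical weighting of the two terms; the paper may combine them differently.
    sup = supervised_contrastive_loss(view_a + view_b, list(labels) * 2, temperature)
    self_sup = self_supervised_contrastive_loss(view_a, view_b, temperature)
    return alpha * sup + (1 - alpha) * self_sup
```

In a real training loop these terms would be computed on encoder outputs with an autodiff framework and typically added to a classification loss; the pure-Python version above only shows the arithmetic.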