Md. Tahfim Juwel Chowdhury


2025

CUET_N317@LT-EDI2025: Detecting Hate Speech Related to Caste and Migration with Transformer Models
Md. Nur Siddik Ruman | Md. Tahfim Juwel Chowdhury | Hasan Murad
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Language that criticizes, threatens, or discriminates against people or groups because of their caste, social rank, or status is known as caste and migration hate speech, and it has grown incredibly common on social media. Such speech not only contributes to social disruption and inequity, but also puts at risk the safety and mental health of the targeted groups. Due to the absence of labeled data, the subtlety of culturally unique insults, and the lack of strong linguistic resources for deep text recognition, it is especially difficult to detect caste and migration hate speech in low-resource Dravidian languages like Tamil. In this work, we address the Caste and Migration Hate Speech Detection task, aiming to automatically classify user-generated content as either hateful or non-hateful. We evaluate a range of approaches, including a traditional TF-IDF-based machine learning pipeline using SVM and Logistic Regression, alongside five transformer-based models: mBERT, XLM-R, MuRIL, Tamil BERT, and Tamilhate-BERT. Among these, the domain-adapted Tamilhate-BERT achieved the highest macro-F1 score of 0.88 on the test data, securing 1st place in the Shared Task on Caste and Migration Hate Speech Detection at DravidianLangTech@LT-EDI 2025. Our findings highlight the strong performance of transformer models, particularly those fine-tuned on domain-specific data, in detecting nuanced hate speech in low-resource, code-mixed languages like Tamil.
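
The abstract reports results as macro-F1, the unweighted mean of the per-class F1 scores, so the hateful and non-hateful classes count equally regardless of how many examples each has. The following is a minimal sketch of that metric for illustration only; the function name `macro_f1` and the toy labels are assumptions, not the authors' evaluation code or shared-task data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 1 = hateful, 0 = non-hateful
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally to the average, a classifier that ignores the minority class is penalized even when its overall accuracy looks high, which is presumably why this metric suits imbalanced hate speech data.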