CUET_N317@LT-EDI2025: Detecting Hate Speech Related to Caste and Migration with Transformer Models

Md. Nur Siddik Ruman; Md. Tahfim Juwel Chowdhury; Hasan Murad

Abstract

Language that criticizes, threatens, or discriminates against people or groups because of their caste, social rank, or status is known as caste and migration hate speech, and it has grown incredibly common on social media. Such speech not only contributes to social disruption and inequity but also puts the safety and mental health of the targeted groups at risk. Detecting caste and migration hate speech is especially difficult in low-resource Dravidian languages such as Tamil, owing to the absence of labeled data, the subtlety of culturally specific insults, and the lack of strong linguistic resources for deep text recognition. In this work, we address the Caste and Migration Hate Speech Detection task, which aims to automatically classify user-generated content as either hateful or non-hateful. We evaluate a range of approaches, including a traditional TF-IDF-based machine learning pipeline using SVM and Logistic Regression, alongside five transformer-based models: mBERT, XLM-R, MuRIL, Tamil BERT, and Tamilhate-BERT. Among these, the domain-adapted Tamilhate-BERT achieved the highest macro-F1 score of 0.88 on the test data, securing 1st place in the Shared Task on Caste and Migration Hate Speech Detection at DravidianLangTech@LT-EDI 2025. Our findings highlight the strong performance of transformer models, particularly those fine-tuned on domain-specific data, in detecting nuanced hate speech in low-resource, code-mixed languages like Tamil.
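The headline result is reported as macro-F1, which computes F1 separately for the hateful and non-hateful classes and then takes their unweighted mean, so errors on a smaller class weigh as much as errors on a larger one. The following is a minimal plain-Python sketch of that metric, assuming binary labels encoded as 0 (non-hateful) and 1 (hateful); the function names `f1_for_class` and `macro_f1` and the toy labels are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 score treating `label` as the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy labels: 0 = non-hateful, 1 = hateful.
    gold = [0, 0, 0, 0, 1, 1]
    pred = [0, 0, 0, 0, 0, 1]
    # Accuracy is 5/6, but macro-F1 is (8/9 + 2/3) / 2 = 7/9, about 0.778,
    # because the missed hateful example hurts the smaller class's F1.
    print(round(macro_f1(gold, pred), 3))
```

Averaging per class rather than pooling counts is what makes the metric sensitive to performance on the hateful class, which is typically the one a moderation system cares about most.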

Anthology ID:: 2025.ltedi-1.18
Volume:: Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:: September
Year:: 2025
Address:: Naples, Italy
Editors:: Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues:: LTEDI | WS
Publisher:: Unior Press
Pages:: 105–110
URL:: https://aclanthology.org/2025.ltedi-1.18/
Cite (ACL):: Md. Nur Siddik Ruman, Md. Tahfim Juwel Chowdhury, and Hasan Murad. 2025. CUET_N317@LT-EDI2025: Detecting Hate Speech Related to Caste and Migration with Transformer Models. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 105–110, Naples, Italy. Unior Press.
Cite (Informal):: CUET_N317@LT-EDI2025: Detecting Hate Speech Related to Caste and Migration with Transformer Models (Ruman et al., LTEDI 2025)
PDF:: https://aclanthology.org/2025.ltedi-1.18.pdf