CUET’s_White_Walkers@LT-EDI 2025: Transformer-Based Model for the Detection of Caste and Migration Hate Speech

Jidan Al Abrar, Md. Mizanur Rahman, Ariful Islam, Md. Mehedi Hasan, Md. Mubasshir Naib, Mohammad Shamsul Arefin


Abstract
Hate speech on social media is an evolving problem, particularly in low-resource languages like Tamil, where traditional hate speech detection approaches remain underdeveloped. In this work, we provide a focused solution for caste- and migration-based hate speech detection using Tamil-BERT, a Tamil-specialized pre-trained transformer model. One of the key challenges in hate speech detection is the severe class imbalance in the dataset, with hate speech being the minority class. We address this with focal loss, a loss function that gives more weight to harder-to-classify examples, improving the model’s performance on the minority class. We train our model on a publicly available dataset of Tamil text labeled as hate or non-hate speech. Under strict evaluation, our approach outperforms baseline models by a considerable margin. The model achieves an F1 score of 0.8634 along with good precision, recall, and accuracy, making it a robust solution for hate speech detection in Tamil. The results show that fine-tuning transformer-based models like Tamil-BERT, coupled with techniques like focal loss, can substantially improve hate speech detection for low-resource languages. This work contributes to the growing body of research in this area and provides insights into tackling class imbalance in NLP tasks.
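The abstract describes focal loss as weighting harder-to-classify examples more heavily. A minimal sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), may help make that concrete. The function name `focal_loss` and the values `gamma=2.0` and `alpha=0.75` are illustrative assumptions, not settings reported in the paper, and this pure-Python version is not the authors' implementation.

```python
import math


def focal_loss(p_hate, label, gamma=2.0, alpha=0.75):
    """Binary focal loss for a single example (illustrative sketch).

    p_hate: predicted probability that the text is hate speech (class 1).
    label:  1 for hate speech, 0 for non-hate speech.
    gamma:  focusing parameter; gamma=0 removes the focusing effect.
    alpha:  weight on the hate class; (1 - alpha) goes to the other class.
    The default gamma and alpha are assumptions chosen for illustration.
    """
    eps = 1e-12
    # p_t is the probability the model assigned to the true class.
    p_t = p_hate if label == 1 else 1.0 - p_hate
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0 - eps)
    # (1 - p_t)^gamma shrinks the loss for confidently correct examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A hate-speech example the model already classifies confidently ...
easy = focal_loss(0.95, 1)
# ... versus one it gets badly wrong.
hard = focal_loss(0.10, 1)
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With `gamma=0.0` and `alpha=1.0` the expression reduces to ordinary cross-entropy on the hate class, so the focusing term is the only thing that changes how strongly easy and hard examples contribute.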
Anthology ID:
2025.ltedi-1.12
Volume:
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
September
Year:
2025
Address:
Naples, Italy
Editors:
Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues:
LTEDI | WS
Publisher:
Unior Press
Pages:
75–79
URL:
https://aclanthology.org/2025.ltedi-1.12/
Cite (ACL):
Jidan Al Abrar, Md. Mizanur Rahman, Ariful Islam, Md. Mehedi Hasan, Md. Mubasshir Naib, and Mohammad Shamsul Arefin. 2025. CUET’s_White_Walkers@LT-EDI 2025: Transformer-Based Model for the Detection of Caste and Migration Hate Speech. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 75–79, Naples, Italy. Unior Press.
Cite (Informal):
CUET’s_White_Walkers@LT-EDI 2025: Transformer-Based Model for the Detection of Caste and Migration Hate Speech (Abrar et al., LTEDI 2025)
PDF:
https://aclanthology.org/2025.ltedi-1.12.pdf