SSN_IT_HATE@LT-EDI-2025: Caste and Migration Hate Speech Detection

Maria Nancy C, Radha N, Swathika R


Abstract
This paper proposes a transformer-based methodology for detecting hate speech in Tamil, developed as part of the shared task on Caste and Migration Hate Speech Detection. We fine-tune the multilingual BERT (mBERT) model to classify Tamil social media content into two categories: caste/migration-related hate speech and non-hate speech. Our approach achieves a macro F1-score of 0.72462 on the development dataset, demonstrating the effectiveness of multilingual pretrained models in low-resource language settings. The code for this work is available on GitHub (Hate-Speech Deduction).
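For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro F1-score can be computed for a binary hate / non-hate labelling. It is not the authors' evaluation code: the function name `macro_f1` and the label encoding (1 for hate speech, 0 for non-hate speech) are assumptions made for illustration only.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores.

    Assumed (hypothetical) label encoding: 1 = hate speech, 0 = non-hate speech.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    # Macro averaging gives each class equal weight regardless of its frequency.
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = [1, 1, 0, 0]
    pred = [1, 0, 0, 0]
    print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

Because each class contributes equally to the average, macro F1 penalises a classifier that performs well only on the majority class, which is why it is a common choice for imbalanced hate speech datasets.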
Anthology ID:
2025.ltedi-1.14
Volume:
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
September
Year:
2025
Address:
Naples, Italy
Editors:
Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues:
LTEDI | WS
Publisher:
Unior Press
Pages:
84–89
URL:
https://aclanthology.org/2025.ltedi-1.14/
Cite (ACL):
Maria Nancy C, Radha N, and Swathika R. 2025. SSN_IT_HATE@LT-EDI-2025: Caste and Migration Hate Speech Detection. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 84–89, Naples, Italy. Unior Press.
Cite (Informal):
SSN_IT_HATE@LT-EDI-2025: Caste and Migration Hate Speech Detection (C et al., LTEDI 2025)
PDF:
https://aclanthology.org/2025.ltedi-1.14.pdf