Hinterwelt@LT-EDI 2025: A Transformer-Based Detection of Caste and Migration Hate Speech in Tamil Social Media

Md. Al Amin, Sabik Aftahee, Md. Abdur Rahman, Md. Sajid Hossain Khan, Md. Ashiqur Rahman


Abstract
This paper presents our system for detecting caste and migration-related hate speech in Tamil social media comments, addressing the challenges in this low-resource language setting. We experimented with multiple approaches on a dataset of 7,875 annotated comments. Our methodology encompasses traditional machine learning classifiers (SVM, Random Forest, KNN), deep learning models (CNN, CNN-BiLSTM), and transformer-based architectures (MuRIL, IndicBERT, XLM-RoBERTa). Comprehensive evaluations demonstrate that transformer-based models substantially outperform traditional approaches, with MuRIL-large achieving the highest performance with a macro F1 score of 0.8092. Error analysis reveals challenges in detecting implicit and culturally-specific hate speech expressions requiring deeper socio-cultural context. Our team ranked 5th in the LT-EDI@LDK 2025 shared task with an F1 score of 0.80916. This work contributes to combating harmful online content in low-resource languages and highlights the effectiveness of large pre-trained multilingual models for nuanced text classification tasks.
Anthology ID:
2025.ltedi-1.24
Volume:
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
September
Year:
2025
Address:
Naples, Italy
Editors:
Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues:
LTEDI | WS
SIG:
Publisher:
Unior Press
Note:
Pages:
140–145
Language:
URL:
https://aclanthology.org/2025.ltedi-1.24/
DOI:
Bibkey:
Cite (ACL):
Md. Al Amin, Sabik Aftahee, Md. Abdur Rahman, Md. Sajid Hossain Khan, and Md. Ashiqur Rahman. 2025. Hinterwelt@LT-EDI 2025: A Transformer-Based Detection of Caste and Migration Hate Speech in Tamil Social Media. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 140–145, Naples, Italy. Unior Press.
Cite (Informal):
Hinterwelt@LT-EDI 2025: A Transformer-Based Detection of Caste and Migration Hate Speech in Tamil Social Media (Amin et al., LTEDI 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.ltedi-1.24.pdf