EM-26@LT-EDI 2025: Caste and Migration Hate Speech Detection in Tamil-English Code-Mixed Social Media Texts

Tewodros Achamaleh, Tolulope Olalekan Abiola, Mikiyas Mebraihtu, Sara Getachew, Grigori Sidorov


Abstract
In this paper, we describe the system developed by Team EM-26 for the Shared Task on Caste and Migration Hate Speech Detection at LTEDI@LDK 2025. The task addresses the challenge of recognizing caste-based and migration related hate speech in Tamil social media text, a language that is both nuanced and under resourced for machine learning. To tackle this, we fine-tuned the multilingual transformer XLM-RoBERTa-Large on the provided training data, leveraging its cross-lingual strengths to detect both explicit and implicit hate speech. To improve performance, we applied social media focused preprocessing techniques, including Tamil text normalization and noise removal. Our model achieved a macro F1-score of 0.6567 on the test set, highlighting the effectiveness of multilingual transformers for low resource hate speech detection. Additionally, we discuss key challenges and errors in Tamil hate speech classification, which may guide future work toward building more ethical and inclusive AI systems. The source code is available on GitHub.1
Anthology ID:
2025.ltedi-1.26
Volume:
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
September
Year:
2025
Address:
Naples, Italy
Editors:
Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues:
LTEDI | WS
SIG:
Publisher:
Unior Press
Note:
Pages:
153–159
Language:
URL:
https://aclanthology.org/2025.ltedi-1.26/
DOI:
Bibkey:
Cite (ACL):
Tewodros Achamaleh, Tolulope Olalekan Abiola, Mikiyas Mebraihtu, Sara Getachew, and Grigori Sidorov. 2025. EM-26@LT-EDI 2025: Caste and Migration Hate Speech Detection in Tamil-English Code-Mixed Social Media Texts. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 153–159, Naples, Italy. Unior Press.
Cite (Informal):
EM-26@LT-EDI 2025: Caste and Migration Hate Speech Detection in Tamil-English Code-Mixed Social Media Texts (Achamaleh et al., LTEDI 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.ltedi-1.26.pdf