ItsAllGoodMan@LT-EDI-2025: Fusing TF-IDF and MuRIL Embeddings for Detecting Caste and Migration Hate Speech

Amritha Nandini K L; Vishal S; Giri Prasath R; Anerud Thiyagarajan; Sachin Kumar S

Abstract

Caste and migration hate speech detection is a critical task in the context of increasingly multilingual and diverse online discourse. In this work, we address the problem of identifying hate speech targeting caste and migrant communities in a multilingual social media dataset containing Tamil, Tamil written in Latin (English) script, and English. We explore and compare different feature representations, including TF-IDF vectors and embeddings from pretrained transformer-based models, and use them to train various machine learning classifiers. Our experiments show that a Soft Voting Classifier that makes use of both TF-IDF vectors and MuRIL embeddings performs best, achieving a macro F1 score of 0.802 on the test set. This approach was evaluated as part of the Shared Task on Caste and Migration Hate Speech Detection at LT-EDI@LDK 2025, where it ranked 6th overall.
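The abstract names two mechanisms without spelling them out: soft voting and the macro F1 metric. The sketch below shows both under stated assumptions. Soft voting averages the class-probability outputs of several base models and picks the class with the highest average probability. Macro F1 is the unweighted mean of the per-class F1 scores. The setup here is an assumption, not the authors' configuration: one hypothetical base model fed TF-IDF vectors and one fed MuRIL embeddings, a binary label, and made-up probability values. The abstract does not state the base learners, weights, or how the two feature types are combined.

```python
from typing import List, Optional, Sequence

def soft_vote(prob_lists: Sequence[Sequence[Sequence[float]]],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Soft voting: weighted average of per-model class probabilities, then argmax.

    prob_lists[m][i][c] is model m's probability that sample i has class c.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weights by default
    total = sum(weights)
    n_samples = len(prob_lists[0])
    n_classes = len(prob_lists[0][0])
    preds = []
    for i in range(n_samples):
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) / total
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=lambda c: avg[c]))
    return preds

def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical predict_proba-style outputs for 3 samples (classes 0 and 1).
tfidf_probs = [[0.70, 0.30], [0.40, 0.60], [0.55, 0.45]]  # TF-IDF-based model (assumed)
muril_probs = [[0.60, 0.40], [0.20, 0.80], [0.30, 0.70]]  # MuRIL-based model (assumed)

preds = soft_vote([tfidf_probs, muril_probs])
print(preds)                                   # → [0, 1, 1]
print(round(macro_f1([0, 1, 0], preds), 3))    # → 0.667
```

On the third sample the two hypothetical models disagree. The MuRIL-based model is more confident, so the averaged probability tips toward class 1. That is the practical difference between soft voting and hard (majority) voting, which counts only each model's top label.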

Anthology ID: 2025.ltedi-1.15
Volume: Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Month: September
Year: 2025
Address: Naples, Italy
Editors: Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues: LTEDI | WS
Publisher: Unior Press
Pages: 90–94
URL: https://aclanthology.org/2025.ltedi-1.15/
Cite (ACL): Amritha Nandini K L, Vishal S, Giri Prasath R, Anerud Thiyagarajan, and Sachin Kumar S. 2025. ItsAllGoodMan@LT-EDI-2025: Fusing TF-IDF and MuRIL Embeddings for Detecting Caste and Migration Hate Speech. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 90–94, Naples, Italy. Unior Press.
Cite (Informal): ItsAllGoodMan@LT-EDI-2025: Fusing TF-IDF and MuRIL Embeddings for Detecting Caste and Migration Hate Speech (L et al., LTEDI 2025)
PDF: https://aclanthology.org/2025.ltedi-1.15.pdf
