Hope_for_best@LT-EDI 2025: Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data using a multi-phase fine-tuning strategy

Abhishek Singh Yadav, Deepawali Sharma, Aakash Singh, Vivek Kumar Singh


Abstract
In the age of digital communication, social media platforms have become a medium for the spread of misinformation, with racial hoaxes posing a particularly insidious threat. These hoaxes falsely associate individuals or communities with crimes or misconduct, perpetuating harmful stereotypes and inflaming societal tensions. This paper describes the submission of team “Hope_for_best”, which addresses the challenge of detecting racial hoaxes in code-mixed Hindi-English (Hinglish) social media content and secured 2nd rank in the shared task (Chakravarthi et al., 2025). The study uses the HoaxMix Plus dataset, developed by LT-EDI 2025, and adopts a multi-phase fine-tuning strategy. First, models are sensitized using the THAR dataset of targeted hate speech against religion (Sharma et al., 2024) to adjust weights toward contextually relevant biases. They are then further fine-tuned on the HoaxMix Plus dataset. Data-balancing sampling strategies are used to mitigate class imbalance. Among the evaluated models, HingBERT achieved the highest macro F1-score of 73%, demonstrating promising capabilities in detecting racially charged misinformation in code-mixed Hindi-English texts.
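The abstract mentions data-balancing sampling and a macro F1-score without giving details. The following is a minimal sketch of what those two steps could look like. It assumes plain random oversampling of the minority class, which the abstract does not confirm. The function names `oversample` and `macro_f1` are hypothetical and are not taken from the paper. The two-phase fine-tuning itself (THAR first, then HoaxMix Plus) is not shown.

```python
import random
from collections import defaultdict


def oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    has as many examples as the largest class (assumed strategy)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        # Draw extra copies with replacement to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((text, label) for text in items + extra)
    rng.shuffle(balanced)
    return balanced


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights each class equally, so a model that ignores the rarer hoax class is penalized even when overall accuracy looks high. Oversampling would be applied to the training split only, leaving the evaluation data untouched.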
Anthology ID:
2025.ltedi-1.7
Volume:
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
September
Year:
2025
Address:
Naples, Italy
Editors:
Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
Venues:
LTEDI | WS
Publisher:
Unior Press
Pages:
39–46
URL:
https://aclanthology.org/2025.ltedi-1.7/
Cite (ACL):
Abhishek Singh Yadav, Deepawali Sharma, Aakash Singh, and Vivek Kumar Singh. 2025. Hope_for_best@LT-EDI 2025: Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data using a multi-phase fine-tuning strategy. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 39–46, Naples, Italy. Unior Press.
Cite (Informal):
Hope_for_best@LT-EDI 2025: Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data using a multi-phase fine-tuning strategy (Yadav et al., LTEDI 2025)
PDF:
https://aclanthology.org/2025.ltedi-1.7.pdf