Aakash Singh
2025
Hope_for_best@LT-EDI 2025: Detecting Racial Hoaxes in Code-Mixed Hindi-English Social Media Data using a multi-phase fine-tuning strategy
Abhishek Singh Yadav
|
Deepawali Sharma
|
Aakash Singh
|
Vivek Kumar Singh
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
In the age of digital communication, social media platforms have become a medium for the spread of misinformation, with racial hoaxes posing a particularly insidious threat. These hoaxes falsely associate individuals or communities with crimes or misconduct, perpetuating harmful stereotypes and inflaming societal tensions. This paper describes the team “Hope_for_best” submission that addresses the challenge of detecting racial hoaxes in codemixed Hindi-English (Hinglish) social media content and secured the 2nd rank in the shared task (Chakravarthi et al., 2025). To address this challenge, the study employs the HoaxMix Plus dataset, developed by LT-EDI 2025, and adopts a multi-phase fine-tuning strategy. Initially, models are sensitized using the THAR dataset—targeted hate speech against religion (Sharma et al., 2024) —to adjust weights toward contextually relevant biases. Further fine-tuning was performed on the HoaxMix Plus dataset. This work employed data balancing sampling strategies to mitigate class imbalance. Among the evaluated models, Hing BERT achieved the highest macro F1-score of 73% demonstrating promising capabilities in detecting racially charged misinformation in code-mixed Hindi-English texts.