Nepali Transformers@NLU of Devanagari Script Languages 2025: Detection of Language, Hate Speech and Targets

Pilot Khadka, Ankit Bk, Ashish Acharya, Bikram K.c., Sandesh Shrestha, Rabin Thapa


Abstract
The Devanagari script, an Indic script used by a diverse range of South Asian languages, presents a significant challenge in Natural Language Processing (NLP) research. Dialect and language variation, complex script features, and limited language-specific tools make development difficult. This shared task aims to address this challenge by bringing together researchers and practitioners to solve three key problems: language identification, hate speech detection, and identification of hate speech targets. The selected languages (Hindi, Nepali, Marathi, Sanskrit, and Bhojpuri) are widely used in South Asia and represent distinct linguistic structures. In this work, we explore the effectiveness of both machine-learning models and transformer-based models on all three sub-tasks. Our results demonstrate strong performance of multilingual transformer models, particularly one pre-trained on domain-specific social media data, across all three tasks. The multilingual RoBERTa model, trained on the Twitter dataset, achieved accuracy and F1-score of 99.5% on language identification (Task A), 88.3% and 72.5%, respectively, on hate speech detection (Task B), and 68.6% and 61.8%, respectively, on hate speech target classification (Task C).
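The gap between accuracy and F1-score reported for Tasks B and C is typical of imbalanced classification. The sketch below illustrates that effect on invented toy labels. It assumes a macro-averaged F1, which the abstract does not specify, and the label names, data, and helper functions are hypothetical and not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy data: 8 "non-hate" and 2 "hate" examples (not from the paper).
y_true = ["non-hate"] * 8 + ["hate"] * 2
# The hypothetical classifier misses one of the two "hate" examples.
y_pred = ["non-hate"] * 8 + ["non-hate", "hate"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

A single error on the minority class barely moves accuracy but lowers that class's F1 sharply. This is one plausible reading of why the F1-scores trail accuracy on the hate speech tasks.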
Anthology ID:
2025.chipsal-1.36
Volume:
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Kengatharaiyer Sarveswaran, Ashwini Vaidya, Bal Krishna Bal, Sana Shams, Surendrabikram Thapa
Venues:
CHiPSAL | WS
Publisher:
International Committee on Computational Linguistics
Pages:
314–319
URL:
https://aclanthology.org/2025.chipsal-1.36/
Cite (ACL):
Pilot Khadka, Ankit Bk, Ashish Acharya, Bikram K.c., Sandesh Shrestha, and Rabin Thapa. 2025. Nepali Transformers@NLU of Devanagari Script Languages 2025: Detection of Language, Hate Speech and Targets. In Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025), pages 314–319, Abu Dhabi, UAE. International Committee on Computational Linguistics.
Cite (Informal):
Nepali Transformers@NLU of Devanagari Script Languages 2025: Detection of Language, Hate Speech and Targets (Khadka et al., CHiPSAL 2025)
PDF:
https://aclanthology.org/2025.chipsal-1.36.pdf