Bikram K.c.


2025

Nepali Transformers@NLU of Devanagari Script Languages 2025: Detection of Language, Hate Speech and Targets
Pilot Khadka | Ankit Bk | Ashish Acharya | Bikram K.c. | Sandesh Shrestha | Rabin Thapa
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

The Devanagari script, an Indic script used by a diverse range of South Asian languages, presents a significant challenge in Natural Language Processing (NLP) research. Dialect and language variation, complex script features, and limited language-specific tools make development difficult. This shared task aims to address this challenge by bringing together researchers and practitioners to solve three key problems: language identification, hate speech detection, and identification of hate speech targets. The selected languages (Hindi, Nepali, Marathi, Sanskrit, and Bhojpuri) are widely used in South Asia and represent distinct linguistic structures. In this work, we explore the effectiveness of both machine-learning models and transformer-based models on all three sub-tasks. Our results demonstrate strong performance of the multilingual transformer model, particularly one pre-trained on domain-specific social media data, across all three tasks. The multilingual RoBERTa model, trained on the Twitter dataset, achieved an accuracy and F1-score of 99.5% on language identification (Task A), an accuracy of 88.3% and an F1-score of 72.5% on hate speech detection (Task B), and an accuracy of 68.6% and an F1-score of 61.8% on hate speech target classification (Task C).
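
The abstract reports accuracy and F1-score side by side, and for Tasks B and C the two differ noticeably. The following minimal sketch shows how such a gap can arise when classes are imbalanced. It is an illustration only: the abstract does not state which F1 averaging was used, so macro averaging is an assumption here, and the labels and predictions are hypothetical toy data, not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 8 "non-hate" and 2 "hate" examples,
# with one of the two "hate" examples misclassified.
y_true = ["non-hate"] * 8 + ["hate"] * 2
y_pred = ["non-hate"] * 8 + ["non-hate", "hate"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

A single error on the minority class barely moves accuracy but pulls down that class's F1, and macro averaging gives that class equal weight, so the macro F1 ends up lower than the accuracy.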