Saral Sainju


2025

SKPD Emergency @ NLU of Devanagari Script Languages 2025: Devanagari Script Classification using CBOW Embeddings with Attention-Enhanced BiLSTM
Shubham Shakya | Saral Sainju | Subham Krishna Shrestha | Prekshya Dawadi | Shreya Khatiwada
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

Devanagari script, used by languages such as Nepali, Marathi, Sanskrit, Bhojpuri and Hindi, poses challenges for language identification because of overlapping character sets and shared lexical characteristics. To address this, we propose a method that combines Continuous Bag of Words (CBOW) embeddings with an attention-enhanced Bidirectional Long Short-Term Memory (BiLSTM) network. Our methodology involves meticulous data preprocessing and the generation of word embeddings to improve the model's performance. The proposed method achieves an overall accuracy of 99%, significantly outperforming character-level identification approaches. The results show high precision across most language pairs, though minor classification confusions persist between closely related languages. Our findings demonstrate the robustness of the CBOW-BiLSTM model for Devanagari script classification and highlight the importance of accurate language identification in preserving linguistic diversity in multilingual environments.
Keywords: Language Identification, Devanagari Script, Natural Language Processing, Neural Networks
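The abstract does not give implementation details, so the following is only a minimal pure-Python sketch, under our own assumptions, of three ingredients it names: how CBOW forms (context, target) training pairs, how forward and backward LSTM states are concatenated per token in a BiLSTM, and how an attention layer can pool the per-token outputs into a single sentence vector for classification. The window size, the dot-product scoring against a query vector, and the function names (`cbow_pairs`, `concat_bidirectional`, `attention_pool`) are hypothetical illustrations, not the authors' code. In a real model the embeddings, LSTM weights, and query vector would be learned by gradient descent in a deep-learning framework.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build CBOW training pairs: the surrounding words (context) predict the middle word (target)."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side, clipped at the sentence boundaries.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def concat_bidirectional(forward: List[Vector], backward: List[Vector]) -> List[Vector]:
    """Concatenate forward and backward LSTM states per timestep: h_t = [f_t ; b_t].

    Assumption: backward[t] has already been re-aligned to position t of the original sequence.
    """
    return [f + b for f, b in zip(forward, backward)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Dot-product attention pooling: score each timestep, normalise, take the weighted sum."""
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # Hypothetical Nepali sentence, used only to show the pair structure.
    for context, target in cbow_pairs(["म", "भोलि", "घर", "जान्छु"], window=1):
        print(context, "->", target)

    # Toy 1-dimensional forward/backward states for a 2-token sentence.
    states = concat_bidirectional([[1.0], [0.0]], [[0.0], [1.0]])
    sentence_vec, attn = attention_pool(states, query=[0.0, 0.0])
    print(sentence_vec, attn)
```

With a zero query every token receives equal weight; a trained query vector would instead concentrate weight on the tokens most indicative of a particular language, and the pooled vector would then feed a softmax classifier over the candidate languages.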