Enhancing Urdu Sentiment Classification through Instruction-Tuned LLMs and Cross-Lingual Transfer

Hasan Faraz Khan, Noor Fatima, Irfan Ahmad


Abstract
Sentiment analysis in low-resource languages such as Urdu poses distinct challenges: annotated data are scarce, the morphology is complex, and most publicly available datasets are heavily class-imbalanced. This study addresses these issues through two experimental strategies. First, we mitigate class imbalance by using instruction-tuned large language models (LLMs) to generate synthetic negative-sentiment samples in Urdu. The resulting, more balanced dataset significantly improves recall and F1-score on the minority class when used to fine-tune a multilingual BERT model. Second, we investigate translating Urdu text into English and classifying it with a pre-trained English language model. In our comparative evaluation, this translation-based pipeline, which uses a RoBERTa model fine-tuned for English sentiment classification, achieves superior performance across the major metrics. Our results suggest that LLM-based augmentation and cross-lingual transfer via translation are both viable ways to overcome data scarcity and performance limitations in sentiment analysis for low-resource languages, and that these approaches may also apply to other under-resourced languages.
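The balancing step in the first strategy, and the minority-class metrics it is evaluated on, can be illustrated with a minimal Python sketch. It assumes string labels such as "positive" and "negative", treats "negative" as the minority class, and tops that class up until it matches the largest class. The `generate` argument is a hypothetical stand-in for a call to an instruction-tuned LLM; the actual models, prompts, and target class ratio used in the study are not described here and are not reproduced by this sketch.

```python
from collections import Counter
from typing import Callable, List, Tuple


def balance_with_synthetic(
    texts: List[str],
    labels: List[str],
    generate: Callable[[int], List[str]],  # hypothetical LLM wrapper: n -> n new texts
    minority: str = "negative",  # assumed minority label
) -> Tuple[List[str], List[str]]:
    """Append synthetic minority-class texts until that class matches the largest class."""
    counts = Counter(labels)
    deficit = max(counts.values()) - counts.get(minority, 0)
    if deficit <= 0:
        return list(texts), list(labels)  # already balanced for this class
    synthetic = generate(deficit)
    if len(synthetic) != deficit:
        raise ValueError("generator returned the wrong number of samples")
    return list(texts) + synthetic, list(labels) + [minority] * deficit


def class_scores(y_true: List[str], y_pred: List[str], target: str) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for a single class (e.g. the minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Stub generator standing in for an LLM; real outputs would be Urdu sentences.
    stub = lambda n: [f"synthetic negative {i}" for i in range(n)]
    texts, labels = balance_with_synthetic(["a", "b", "c", "d"], ["positive"] * 3 + ["negative"], stub)
    print(Counter(labels))
```

In practice the balanced set would then be used to fine-tune the multilingual classifier, and `class_scores` applied to held-out predictions to check whether minority-class recall and F1 improve.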
Anthology ID:
2026.abjadnlp-1.28
Volume:
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
AbjadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
198–207
URL:
https://aclanthology.org/2026.abjadnlp-1.28/
Cite (ACL):
Hasan Faraz Khan, Noor Fatima, and Irfan Ahmad. 2026. Enhancing Urdu Sentiment Classification through Instruction-Tuned LLMs and Cross-Lingual Transfer. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 198–207, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Enhancing Urdu Sentiment Classification through Instruction-Tuned LLMs and Cross-Lingual Transfer (Khan et al., AbjadNLP 2026)
PDF:
https://aclanthology.org/2026.abjadnlp-1.28.pdf