From Classical to Contemporary: Evolutionary Analysis & Classification of Urdu Poetry

Noor Fatima; Hasan Faraz Khan; Irfan Ahmad

Abstract

Automatic classification of literary text by historical era can support literary analysis and reveal stylistic evolution. We study this problem for Urdu poetry across three eras: classical, modern, and contemporary. We introduce a new dataset of 10,026 four-line Urdu poetry segments collected from online archives (Rekhta and UrduPoint) and labeled by era. To handle Urdu’s script and orthographic variability, we apply standard preprocessing, including Unicode normalization and removal of diacritics and non-Urdu characters. We benchmark a range of approaches, from traditional machine learning classifiers to deep learning models, including fine-tuned Urdu BERT-style transformers. To assess generalization, we evaluate under two regimes: (i) a standard stratified random split and (ii) a stricter author-disjoint split in which no poet appears in both the training and test sets. On the random split, the best traditional models reach roughly 70-73% accuracy, suggesting that era-related stylistic cues are learnable. Under the author-disjoint split, however, accuracy drops to roughly 58-60%, highlighting the difficulty of generalizing to unseen poets and the risk that author-specific leakage inflates performance estimates. Notably, fine-tuned transformers do not surpass simpler TF-IDF-based baselines, indicating that era cues may be subtle and that data limitations constrain more complex models.
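The abstract names two concrete pipeline steps: script preprocessing and an author-disjoint evaluation split. The following is a minimal Python sketch of both under stated assumptions, not the authors' implementation. The choice of NFKC normalization, the diacritic ranges (U+064B-U+065F and U+0670), the definition of "non-Urdu" as anything outside the Arabic block U+0600-U+06FF, the hypothetical `poet`/`era` record fields, and the 20% per-era test fraction are all our own illustrative choices.

```python
import random
import re
import unicodedata
from collections import defaultdict

# ASSUMPTION: diacritics = Arabic harakat/tanwin marks (U+064B-U+065F) plus the
# superscript alef (U+0670). The paper's exact list is not stated.
DIACRITICS = re.compile(r"[\u064B-\u065F\u0670]")
# ASSUMPTION: "non-Urdu" = any non-whitespace character outside the Arabic
# block U+0600-U+06FF (so Latin letters and ASCII digits are removed).
NON_URDU = re.compile(r"[^\u0600-\u06FF\s]")


def preprocess(text):
    """Normalize one poetry segment (illustrative sketch only)."""
    # NFKC folds Arabic presentation forms (e.g. lam-alef ligatures) to base letters.
    text = unicodedata.normalize("NFKC", text)
    # Strip diacritics first: they lie inside U+0600-U+06FF and would survive NON_URDU.
    text = DIACRITICS.sub("", text)
    text = NON_URDU.sub("", text)
    # Keep verse-line breaks, collapse spaces, and drop lines left empty.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def author_disjoint_split(records, test_frac=0.2, seed=0):
    """Split records so that no poet appears in both train and test.

    Poets are sampled separately within each era so every era is represented
    in the test set. ASSUMPTION: each record is a dict with hypothetical
    "poet" and "era" keys.
    """
    rng = random.Random(seed)
    poets_by_era = defaultdict(set)
    for rec in records:
        poets_by_era[rec["era"]].add(rec["poet"])

    test_poets = set()
    for era in sorted(poets_by_era):
        poets = sorted(poets_by_era[era])  # sort for reproducibility before shuffling
        rng.shuffle(poets)
        n_test = max(1, round(len(poets) * test_frac))
        test_poets.update(poets[:n_test])

    train = [r for r in records if r["poet"] not in test_poets]
    test = [r for r in records if r["poet"] in test_poets]
    return train, test


if __name__ == "__main__":
    # Toy data: 3 eras x 5 hypothetical poets x 2 segments each.
    records = [
        {"poet": f"{era}_poet{i}", "era": era, "text": "\u062f\u0650\u0644 ABC"}
        for era in ("classical", "modern", "contemporary")
        for i in range(5)
        for _ in range(2)
    ]
    train, test = author_disjoint_split(records, test_frac=0.2, seed=13)
    print(len(train), len(test))  # 24 6
    print(preprocess(records[0]["text"]))
```

Because the split is drawn over poets rather than segments, the share of test segments only approximates `test_frac`, and an era with a single poet would end up with no training data. scikit-learn's `GroupShuffleSplit` (with poets as groups) gives the same disjointness guarantee but does not stratify by era.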

Anthology ID:: 2026.abjadnlp-1.26
Volume:: Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:: March
Year:: 2026
Address:: Rabat, Morocco
Venues:: AbjadNLP | WS
Publisher:: Association for Computational Linguistics
Pages:: 182–191
URL:: https://aclanthology.org/2026.abjadnlp-1.26/
Cite (ACL):: Noor Fatima, Hasan Faraz Khan, and Irfan Ahmad. 2026. From Classical to Contemporary: Evolutionary Analysis & Classification of Urdu Poetry. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 182–191, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):: From Classical to Contemporary: Evolutionary Analysis & Classification of Urdu Poetry (Fatima et al., AbjadNLP 2026)
PDF:: https://aclanthology.org/2026.abjadnlp-1.26.pdf