Morphological Feature Extraction for Fine-Grained Sorani Kurdish Dialect Identification: A Hybrid Transformer-Linguistic Approach

Soumedhik Bharati, Shibam Mandal, Subham Majumdar, Swarup Kr Ghosh, Sayani Mondal


Abstract
Sorani Kurdish, spoken by a reported 6 million people in Iraq and Iran, exhibits substantial regional variation but lacks computational resources for dialect identification. We present the first fine-grained sub-dialect classification system for six Sorani varieties: Sulaymaniyah, Erbil, Iranian Sorani, Ardalani, Babani, and Mukriani. Our approach combines cross-lingual contextual embeddings (XLM-RoBERTa) with morphological features derived from explicit linguistic rules, comprising 24 patterns that capture verb prefixes, pronominal clitics, and definite markers. The proposed morphology-augmented XLM-R model is trained on a unified dataset of 16,409 sentences without manual annotation and achieves 91.91% accuracy, outperforming a pure transformer (91.79%) and traditional machine learning baselines (SVM, 86.41%). Ablation studies indicate that the morphological features act as effective regularizers for geographically proximate dialects.
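The hybrid idea in the abstract, rule-based morphological counts appended to a sentence embedding, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the four patterns are hypothetical surface heuristics standing in for the paper's 24 rules, the function names `morph_features` and `fuse` are invented here, and the embedding is a placeholder list rather than real XLM-RoBERTa output.

```python
import re
from typing import List

# Hypothetical surface forms in Arabic script (assumptions, not the paper's rules).
DE = "\u062f\u06d5"          # "de-", a present-tense verb prefix
E = "\u0626\u06d5"           # "e-", an alternative verb prefix
EKE = "\u06d5\u06a9\u06d5"   # "-eke", a definite marker
MAN = "\u0645\u0627\u0646"   # "-man", a 1st person plural clitic

# Each pattern is a crude string heuristic: it also fires on ordinary
# words that happen to start or end with the same letters.
PATTERNS = {
    "verb_prefix_de": re.compile(r"\b" + DE + r"\w+"),
    "verb_prefix_e": re.compile(r"\b" + E + r"\w+"),
    "definite_eke": re.compile(r"\w+" + EKE + r"\b"),
    "clitic_man": re.compile(r"\w+" + MAN + r"\b"),
}


def morph_features(sentence: str) -> List[float]:
    """Return one length-normalized match count per pattern."""
    n_tokens = max(len(sentence.split()), 1)
    return [len(p.findall(sentence)) / n_tokens for p in PATTERNS.values()]


def fuse(embedding: List[float], sentence: str) -> List[float]:
    """Concatenate a sentence embedding with the morphological features."""
    return list(embedding) + morph_features(sentence)


# Two-word example built from the same code points as the patterns.
sentence = DE + "\u0686\u0645" + " " + "\u0645\u0627\u06b5" + EKE
placeholder_embedding = [0.0] * 768  # stands in for a pooled encoder vector
fused = fuse(placeholder_embedding, sentence)
```

In a setup like this, the fused vector would feed a classification head over the six dialect labels. Normalizing by token count keeps the features comparable across sentences of different length, which is one plausible design choice among several.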
Anthology ID:
2026.abjadnlp-1.24
Volume:
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
AbjadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
172–176
URL:
https://aclanthology.org/2026.abjadnlp-1.24/
Cite (ACL):
Soumedhik Bharati, Shibam Mandal, Subham Majumdar, Swarup Kr Ghosh, and Sayani Mondal. 2026. Morphological Feature Extraction for Fine-Grained Sorani Kurdish Dialect Identification: A Hybrid Transformer-Linguistic Approach. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 172–176, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Morphological Feature Extraction for Fine-Grained Sorani Kurdish Dialect Identification: A Hybrid Transformer-Linguistic Approach (Bharati et al., AbjadNLP 2026)
PDF:
https://aclanthology.org/2026.abjadnlp-1.24.pdf
Optional supplementary material:
2026.abjadnlp-1.24.OptionalSupplementaryMaterial.rar