Rebwar M. Nabi


2026

Authorship identification is a fundamental task in natural language processing and computational stylistics. Despite significant advances in high-resource languages, low-resource languages, particularly those using non-Latin scripts, remain largely underexplored, leaving a critical gap in resources and benchmarks for Kurdish, a linguistically distinct, low-resource language. To address this gap, this paper presents Task 3 of AbjadNLP 2026, the first shared task dedicated to authorship identification for Kurdish. The task introduces a newly constructed dataset designed to capture the unique phonological and orthographic features of Sorani Kurdish and formulates authorship identification as a closed-set multiclass classification problem. To establish a robust baseline, we fine-tune the pretrained XLM-RoBERTa model to capture authorial stylistic patterns. Experimental results on the test set demonstrate the efficacy of transformer-based representations for this domain, with the baseline achieving an accuracy of approximately 75%.