Youssef Zaghloul


2026

Arabic authorship attribution presents unique challenges due to the language’s rich derivational morphology, which fragments word-level frequencies across many surface forms. In this paper, we describe our winning submission to the AbjadAuthorID Shared Task. We propose a hybrid ensemble system that fuses the morphological precision of character n-gram LinearSVCs with the semantic understanding of fine-tuned Transformers (AraBERT and XLM-RoBERTa). Contrary to current trends in NLP, we demonstrate that traditional character n-grams (0.92 F1) significantly outperform deep learning baselines (AraBERT, 0.87 F1) on this task, suggesting that authorial signature in Arabic is encoded more densely in morphological patterns than in semantic content. Our final system employs a novel post-hoc calibration technique, Precision Scalpel, together with selective pseudo-labeling to address class imbalance and genre confounds. The system ranked first, with a macro F1-score of 0.932 and an accuracy of 0.963 on the test set.
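To make the word-level fragmentation argument concrete, the following is a minimal sketch of character n-gram extraction. It is not the submitted system: the n-gram range (2 to 4), the helper names `char_ngrams` and `shared`, and the two example words are all assumptions chosen for illustration. It only shows why two surface forms that share no word-level token can still share several character n-gram features.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count every character n-gram of length n_min..n_max in text.

    The range 2..4 is an illustrative assumption, not the paper's setting.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def shared(a, b):
    """Return the features (keys) that occur in both counters."""
    return set(a) & set(b)


# Two surface forms built on the same stem (hypothetical examples,
# not taken from the shared-task data).
form_a = "الكتاب"  # "the book"
form_b = "كتابهم"  # "their book"

# Word-level view: each form is a single, distinct token.
word_overlap = shared(Counter([form_a]), Counter([form_b]))

# Character-level view: the common stem yields common n-grams.
char_overlap = shared(char_ngrams(form_a), char_ngrams(form_b))

print(len(word_overlap), len(char_overlap))
```

At the word level the two forms have nothing in common, while at the character level the shared stem contributes overlapping features. In a full pipeline, counts like these would typically be weighted and passed to a linear classifier such as a LinearSVC.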