HCMUS_PrisonDilemma at AbjadAuthorID Shared Task: Less is More with Base Models

Trung Kiet Huynh, Duy Minh Dao Sy, Nguyen Chi Tran, Pham Phu Hoa, Nguyen Lam Phu Quy, Truong Bao Tran


Abstract
We present our approach to the AbjadNLP 2026 Arabic Authorship Identification shared task, achieving 4th place. Our key finding is that AraBERT-base (110M) outperforms AraBERT-large (340M) on the test set with macro F1 of 0.8449 versus 0.8096, despite lower validation scores. We handle long passages via sliding window chunking with mean pooling, and use a two-stage classification head with dual dropout for regularization. Per-class analysis reveals that translated works achieve perfect F1 while classical poets remain challenging due to shared formal structures. Our results challenge the "scale is all you need" assumption for stylometric tasks.
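The abstract's sliding-window chunking with mean pooling can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the window size (512) and stride (256) are assumed values, and the per-chunk encoder is stubbed out where AraBERT would produce embeddings.

```python
# Sketch of sliding-window chunking with mean pooling for long passages.
# Window/stride sizes are assumptions, not values reported in the paper.

def chunk_tokens(tokens, window=512, stride=256):
    """Split a token sequence into overlapping fixed-size windows."""
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # last window reaches the end of the sequence
        start += stride
    return chunks

def mean_pool(chunk_embeddings):
    """Average per-chunk embedding vectors into one document vector."""
    dim = len(chunk_embeddings[0])
    n = len(chunk_embeddings)
    return [sum(vec[i] for vec in chunk_embeddings) / n for i in range(dim)]
```

In the paper's pipeline, each window would be encoded by AraBERT-base and the pooled document vector fed to the two-stage classification head; here `mean_pool` simply averages whatever chunk vectors it is given.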
Anthology ID:
2026.abjadnlp-1.54
Volume:
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
AbjadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
448–452
URL:
https://aclanthology.org/2026.abjadnlp-1.54/
Cite (ACL):
Trung Kiet Huynh, Duy Minh Dao Sy, Nguyen Chi Tran, Pham Phu Hoa, Nguyen Lam Phu Quy, and Truong Bao Tran. 2026. HCMUS_PrisonDilemma at AbjadAuthorID Shared Task: Less is More with Base Models. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 448–452, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
HCMUS_PrisonDilemma at AbjadAuthorID Shared Task: Less is More with Base Models (Huynh et al., AbjadNLP 2026)
PDF:
https://aclanthology.org/2026.abjadnlp-1.54.pdf