AbjadMed: Arabic Medical Text Classification at AbjadNLP 2026

Pranav Gupta, Niranjan Kumar M, Balaji Nagarajan, Imed Zitouni, Mo El-Haj


Abstract
We present AbjadMed, a shared task on Arabic medical text classification organised as part of the 2nd AbjadNLP workshop at EACL 2026. The task targets supervised multi-class classification under realistic conditions of severe class imbalance, fine-grained category structure, and naturally occurring label noise. Participants assign each Arabic medical question–answer instance to one of 82 predefined categories derived from real healthcare consultations. The dataset is based on the Arabic Healthcare Dataset (AHD) and is released as curated training and test splits containing 27,951 and 18,634 instances respectively, while preserving the original label distribution. Systems are evaluated using macro-averaged F1 to emphasise performance on minority medical topics. Results show that Arabic medical text classification remains challenging even with modern pretrained models, particularly for low-frequency and semantically overlapping categories. AbjadMed provides a reproducible benchmark for studying robustness and generalisation in Arabic healthcare NLP.
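The choice of macro-averaged F1 can be illustrated with a minimal sketch. It assumes a toy two-class example with hypothetical label names (the actual 82 categories are not listed here), and it is not the official scorer of the shared task. It shows that a system which ignores a minority class can still reach high accuracy while its macro-averaged F1 drops sharply, because every class contributes equally to the average.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return a dict mapping each label to its F1 score."""
    # Assumption: the label set is the union of gold and predicted labels.
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical imbalanced toy data: 8 majority and 2 minority instances.
gold = ["cardiology"] * 8 + ["dermatology"] * 2
# A degenerate system that always predicts the majority class.
pred = ["cardiology"] * 10

print(f"accuracy = {accuracy(gold, pred):.3f}")  # 0.800
print(f"macro F1 = {macro_f1(gold, pred):.3f}")  # 0.444
```

In this toy case the majority class scores an F1 of about 0.889 and the minority class scores 0, so the macro average is about 0.444 even though accuracy is 0.8.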
Anthology ID:
2026.abjadnlp-1.64
Volume:
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
AbjadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
506–514
URL:
https://aclanthology.org/2026.abjadnlp-1.64/
Cite (ACL):
Pranav Gupta, Niranjan Kumar M, Balaji Nagarajan, Imed Zitouni, and Mo El-Haj. 2026. AbjadMed: Arabic Medical Text Classification at AbjadNLP 2026. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 506–514, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
AbjadMed: Arabic Medical Text Classification at AbjadNLP 2026 (Gupta et al., AbjadNLP 2026)
PDF:
https://aclanthology.org/2026.abjadnlp-1.64.pdf