MedArabs at AbjadMed: Arabic Medical Text Classification via Data- and Algorithm-Level Fusion

Amrita Singh


Abstract
In this work, we address two challenges in Arabic medical text classification: class imbalance and the morphological complexity of the language. We propose a multiclass classification pipeline based on data- and algorithm-level fusion, which combines the optimal Back Translation technique for data augmentation (data level) with the Class Balanced (CB) loss function (algorithm level). We fine-tune the domain-specific AraBERT model with this approach. Our pipeline achieves a Macro-F1 score of 0.4219 on the official test set of the AbjadMed task and 0.4068 on the development set.
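The abstract does not spell out the CB loss, so the following is only an illustrative sketch. It assumes that "Class Balanced loss" refers to the common effective-number-of-samples weighting, in which a class with n training examples receives a weight proportional to (1 - beta) / (1 - beta^n). The function name, the value of beta, and the example counts are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def class_balanced_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """Per-class weights from the effective number of samples.

    Assumption: weight_c is proportional to (1 - beta) / (1 - beta ** n_c),
    rescaled so that the weights sum to the number of classes.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    if any(n <= 0 for n in counts):
        raise ValueError("every class needs at least one example")

    # Inverse of the effective number of samples for each class.
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]

    # Rescale so the average weight is 1.
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical class counts for an imbalanced three-class problem.
weights = class_balanced_weights([1000, 100, 10], beta=0.999)
```

Under this assumption, the resulting weights would typically be passed as per-class weights to a cross-entropy loss during fine-tuning, so that rarer classes contribute more to the gradient. Setting beta to 0 recovers uniform weights, while values close to 1 approach inverse-frequency weighting.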
Anthology ID:
2026.abjadnlp-1.12
Volume:
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Month:
March
Year:
2026
Address:
Rabat, Morocco
Venues:
AbjadNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
100–104
URL:
https://aclanthology.org/2026.abjadnlp-1.12/
Cite (ACL):
Amrita Singh. 2026. MedArabs at AbjadMed: Arabic Medical Text Classification via Data- and Algorithm-Level Fusion. In Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script, pages 100–104, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
MedArabs at AbjadMed: Arabic Medical Text Classification via Data- and Algorithm-Level Fusion (Singh, AbjadNLP 2026)
PDF:
https://aclanthology.org/2026.abjadnlp-1.12.pdf