Hoa Pham Phu


2026

Medical text classification is high-stakes work, yet models often falter precisely where they are needed most: on rare, critical conditions buried in the long tail of the data distribution. In the EACL 2026 ABJAD-NLP Shared Task, we confronted this challenge with a dataset of Arabic medical questions heavily skewed towards a few common topics, leaving dozens of categories with fewer than ten examples. We present HybridMed, a system that tames this long tail by marrying the semantic generalization of a fine-tuned Arabic BERT model with the precise, instance-based memory of k-nearest-neighbor retrieval. This combination allowed our system to achieve a macro-F1 score of 0.4902, demonstrating that neural fine-tuning and instance-based retrieval are complementary on diverse, imbalanced medical data.
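The abstract does not specify how the two components are combined, but a common way to realize such a hybrid is to interpolate the fine-tuned classifier's softmax probabilities with a class distribution derived from the k nearest training examples in embedding space. The sketch below illustrates that idea under our own assumptions: the function names (`knn_class_scores`, `hybrid_predict`), the cosine-similarity weighting, and the interpolation weight `alpha` are all hypothetical choices, not details taken from the paper.

```python
import numpy as np

def knn_class_scores(query_emb, train_embs, train_labels, n_classes, k=3):
    """Class scores for a query from its k nearest training embeddings.

    Neighbors vote with their cosine similarity; scores are normalized
    to a probability-like distribution over classes.
    """
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q                      # cosine similarity to every training item
    top = np.argsort(-sims)[:k]      # indices of the k most similar items
    scores = np.zeros(n_classes)
    for i in top:
        scores[train_labels[i]] += sims[i]
    total = scores.sum()
    return scores / total if total > 0 else scores

def hybrid_predict(clf_probs, query_emb, train_embs, train_labels,
                   alpha=0.5, k=3):
    """Interpolate classifier probabilities with kNN class scores.

    alpha=1.0 trusts the classifier alone; alpha=0.0 trusts retrieval alone.
    """
    knn = knn_class_scores(query_emb, train_embs, train_labels,
                           len(clf_probs), k=k)
    return int(np.argmax(alpha * clf_probs + (1 - alpha) * knn))

# Toy example: the classifier favors class 1, but both nearest
# neighbors of the query belong to class 0, so retrieval flips it.
train_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
train_labels = np.array([0, 0, 1])
query = np.array([1.0, 0.05])
clf_probs = np.array([0.2, 0.8])
pred = hybrid_predict(clf_probs, query, train_embs, train_labels, alpha=0.5, k=2)
```

The appeal of this design for long-tail data is that the classifier alone can drift towards head classes, while the retrieval term anchors a prediction to concrete, possibly rare, training instances.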