Hadia Mohmmedosman Ahmed Samil
2026
Sudanese-Flores: Extending FLORES+ to Sudanese Arabic Dialect
Hadia Mohmmedosman Ahmed Samil | David Ifeoluwa Adelani
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
In this work, we introduce Sudanese-Flores, an extension of the popular FLORES+ machine translation (MT) benchmark to the Sudanese Arabic dialect. We translate both the DEV and DEVTEST splits of the Modern Standard Arabic dataset into the corresponding Sudanese dialect, resulting in a total of 2,009 sentences. Although the dialect was recently introduced in Google Translate, no benchmark is available for it, despite it being spoken by over 40 million people. Our evaluation of two leading LLMs, GPT-4.1 and Gemini 2.5 Flash, showed that while their English-to-Arabic performance is impressive (more than 23 BLEU), they struggle on the Sudanese dialect (less than 11 BLEU) in zero-shot settings. In the few-shot scenario, we achieved only a slight boost in performance.