MBZUAI at AMIYA Shared Task 2026: Adapting Open-Source LLMs for Dialectal Arabic

Rana Gaber, Yara Allam, Serag Amin, Ranwa Aly, Bashar Alhafni


Abstract
This paper presents our contribution to the closed data track of the AMIYA Shared Task on Dialectal Arabic text generation. In this track, we train fully open-source Large Language Models (LLMs) on five Arabic dialects (Egyptian, Moroccan, Palestinian, Saudi, and Syrian) using the provided training datasets. We experiment with different base and instruct models using several pretraining and instruction-tuning approaches. In total, five models were submitted, with three variants per dialect. Our best-performing models for the five dialects are ALLaM for Egyptian, LLaMa for Moroccan and Palestinian, and Aya for Saudi and Syrian.
Anthology ID: 2026.vardial-1.31
Volume: Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Month: March
Year: 2026
Address: Rabat, Morocco
Venues: VarDial | WS
Publisher: Association for Computational Linguistics
Pages: 373–384
URL: https://aclanthology.org/2026.vardial-1.31/
Cite (ACL): Rana Gaber, Yara Allam, Serag Amin, Ranwa Aly, and Bashar Alhafni. 2026. MBZUAI at AMIYA Shared Task 2026: Adapting Open-Source LLMs for Dialectal Arabic. In Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 373–384, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): MBZUAI at AMIYA Shared Task 2026: Adapting Open-Source LLMs for Dialectal Arabic (Gaber et al., VarDial 2026)
PDF: https://aclanthology.org/2026.vardial-1.31.pdf