Elisha Ondieki Makori
2026
AfriMMT-EA: Multi-domain Machine Translation for Low-Resource East African Languages
Naome A Etori | Kelechi Ezema | Nathaniel Romney Robinson | Davis David | Alfred Malengo Kondoro | Elisha Ondieki Makori | Michael Samwel Mollel | Maria Gini
Findings of the Association for Computational Linguistics: EACL 2026
Despite remarkable progress in multilingual machine translation (MT), the majority of African languages, especially East African languages, remain significantly underrepresented in both benchmark datasets and state-of-the-art (SOTA) MT models. This persistent exclusion from mainstream technologies not only limits equitable access but also constrains the development of tools that accurately reflect the region’s linguistic and cultural diversity. Recent advances in open-source large language models have demonstrated strong multilingual MT capabilities through data-efficient adaptation strategies. However, little work has explored their potential for low-resource African languages. We introduce AfriMMT-EA, the first highly multilingual benchmark and MT dataset for East African languages. Our datasets comprise 54 local languages across five East African countries. We use these data to fine-tune two multilingual versions of Gemma-3, and we compare the performance of these models on these languages with that of larger off-the-shelf baselines. We release our data and models in the interest of advancing MT for these low-resource languages and their communities.