SALT-31: A Machine Translation Benchmark Dataset for 31 Ugandan Languages

Solomon Nsumba; Benjamin Akera; Evelyn Nafula Ouma; Medadi E. Ssentanda; Deo Kawalya; Engineer Bainomugisha; Ernest Tonny Mwebaze; John Quinn

Abstract

We present SALT-31, a benchmark dataset for evaluating machine translation models across 31 Ugandan languages. Unlike sentence-level evaluation sets, SALT-31 is constructed from short, scenario-driven mini-dialogues designed to preserve discourse context, pragmatics, and culturally grounded communication patterns common in everyday Ugandan settings. The dataset contains 100 English sentences organized into 20 typical communication scenarios, each represented as a five-sentence mini-sequence. It can therefore be used to evaluate both sentence-level and paragraph-level machine translation, and it covers nearly every language spoken in Uganda, a country with high linguistic diversity. It is available at https://huggingface.co/datasets/Sunbird/salt-31
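
The 20 x 5 layout described in the abstract suggests a simple way to derive paragraph-level evaluation units from sentence-level ones. The following is a minimal Python sketch, not the authors' code. It assumes the 100 sentences are stored in scenario order, so that each consecutive run of five belongs to one scenario; the function names and the placeholder sentences are hypothetical, and the actual schema of the published dataset is not described here and may differ.

```python
from typing import List

# From the abstract: each scenario is a five-sentence mini-sequence.
SENTENCES_PER_SCENARIO = 5


def group_into_scenarios(
    sentences: List[str], size: int = SENTENCES_PER_SCENARIO
) -> List[List[str]]:
    """Split a flat, scenario-ordered sentence list into fixed-size groups.

    Assumption: consecutive sentences belong to the same scenario.
    """
    if len(sentences) % size != 0:
        raise ValueError("sentence count must be a multiple of the group size")
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def to_paragraphs(scenarios: List[List[str]]) -> List[str]:
    """Join each scenario's sentences into one paragraph-level unit."""
    return [" ".join(group) for group in scenarios]


# Placeholder data standing in for the 100 English source sentences.
sentences = [f"sentence {i}" for i in range(100)]

scenarios = group_into_scenarios(sentences)   # sentence-level view, grouped
paragraphs = to_paragraphs(scenarios)         # paragraph-level view
```

Sentence-level metrics can then be computed over the flat list and paragraph-level metrics over the joined units, using the same grouping for references and system outputs.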

Anthology ID: 2026.africanlp-main.21
Volume: Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Everlyn Asiko Chimoto, Constantine Lignos, Shamsuddeen Muhammad, Idris Abdulmumin, Clemencia Siro, David Ifeoluwa Adelani
Venues: AfricaNLP | WS
Publisher: Association for Computational Linguistics
Pages: 211–216
URL: https://aclanthology.org/2026.africanlp-main.21/
Cite (ACL): Solomon Nsumba, Benjamin Akera, Evelyn Nafula Ouma, Medadi E. Ssentanda, Deo Kawalya, Engineer Bainomugisha, Ernest Tonny Mwebaze, and John Quinn. 2026. SALT-31: A Machine Translation Benchmark Dataset for 31 Ugandan Languages. In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), pages 211–216, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): SALT-31: A Machine Translation Benchmark Dataset for 31 Ugandan Languages (Nsumba et al., AfricaNLP 2026)
PDF: https://aclanthology.org/2026.africanlp-main.21.pdf
