Real-Time Spoken Instruction Following and Translation in Ugandan Languages

Benjamin Akera; Tim Wenjie Hu; Patrick Walukagga; Evelyn Nafula Ouma; Yiga Gilbert; Ernest Tonny Mwebaze; John Quinn

Abstract

Many languages are predominantly spoken rather than written, so to bring the benefits of LLMs to their speakers, models must cater to the voice modality. The typical approach is to cascade ASR, LLM and TTS models, but the resulting systems have high latency, which makes them unsuitable for natural, real-time interaction. We describe results from taking the encoder of a Whisper-based model trained to recognise ten languages common in Uganda and using the Ultravox architecture to project its output directly into the input embedding space of a text model based on Qwen 3 32B, which was also trained to comprehend those languages. The result is a speech LLM with high accuracy and very low latency. For most spoken prompts, we can begin streaming a text response in as little as 50 ms and a speech audio response within around one second, making real-time spoken interaction with an LLM possible for the first time in these languages. The model is available open source on Hugging Face.
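
As a rough illustration of the projection step described in the abstract, the sketch below maps a sequence of speech-encoder frames into vectors with the same width as a text model's input embeddings, then places them alongside text token embeddings. It is a toy under stated assumptions, not the authors' implementation: the dimensions, the frame-stacking step, the single linear layer, and every function name here are hypothetical, and the random weights stand in for a trained projector.

```python
import random

def stack_frames(frames, factor):
    """Concatenate every `factor` consecutive encoder frames into one longer vector.

    Assumption: frame stacking is used here only to shorten the sequence;
    the abstract does not say how the projector handles sequence length.
    """
    usable = len(frames) - len(frames) % factor  # drop a ragged tail for simplicity
    return [
        [x for frame in frames[i:i + factor] for x in frame]
        for i in range(0, usable, factor)
    ]

def linear(vectors, weight):
    """Apply a dense layer without bias; `weight` has shape (out_dim, in_dim)."""
    return [[sum(w * x for w, x in zip(row, v)) for row in weight] for v in vectors]

def project_audio(frames, weight, factor):
    """Map encoder frames to vectors of the text model's embedding width."""
    return linear(stack_frames(frames, factor), weight)

random.seed(0)
ENC_DIM, LLM_DIM, FACTOR = 4, 6, 2  # toy sizes, far smaller than any real model

# Stand-ins for speech-encoder output (10 frames) and an untrained projector.
frames = [[random.gauss(0, 1) for _ in range(ENC_DIM)] for _ in range(10)]
weight = [[random.gauss(0, 0.1) for _ in range(ENC_DIM * FACTOR)]
          for _ in range(LLM_DIM)]

audio_embeddings = project_audio(frames, weight, FACTOR)

# Placeholder text-prompt embeddings; the audio vectors join the same sequence,
# so the text model consumes speech without an intermediate transcript.
prompt_embeddings = [[0.0] * LLM_DIM for _ in range(3)]
llm_input = prompt_embeddings + audio_embeddings

print(len(audio_embeddings), len(audio_embeddings[0]), len(llm_input))
```

Because the text model reads the projected vectors directly, no ASR transcript has to be completed before generation starts, which is the source of the latency advantage over a cascaded ASR, LLM and TTS pipeline that the abstract describes.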

Anthology ID: 2026.africanlp-main.20
Volume: Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Everlyn Asiko Chimoto, Constantine Lignos, Shamsuddeen Muhammad, Idris Abdulmumin, Clemencia Siro, David Ifeoluwa Adelani
Venues: AfricaNLP | WS
Publisher: Association for Computational Linguistics
Pages: 204–210
URL: https://aclanthology.org/2026.africanlp-main.20/
Cite (ACL): Benjamin Akera, Tim Wenjie Hu, Patrick Walukagga, Evelyn Nafula Ouma, Yiga Gilbert, Ernest Tonny Mwebaze, and John Quinn. 2026. Real-Time Spoken Instruction Following and Translation in Ugandan Languages. In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), pages 204–210, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Real-Time Spoken Instruction Following and Translation in Ugandan Languages (Akera et al., AfricaNLP 2026)
PDF: https://aclanthology.org/2026.africanlp-main.20.pdf
