Eyaa-Tom 26, Yodi-Mantissa and Lom Bench: A Community Benchmark for TTS in Local Languages

Bakoubolo Essowe Justin, Catherine Nana Nyaah Essuman, Messan Agbobli, Ahoefa Kansiwer, Eli Jean Doumeyan, Julie Pato, Notou Your Timibe, Emile KOGBEDJI Agossou, Guedela Bakouya


Abstract
We present an extension of our previous work on multilingual NLP for Togolese languages by introducing new datasets, improved models, and a community-driven evaluation benchmark for Text-To-Speech (TTS). We expand the Eyaa-Tom multilingual corpus with about 26.9k additional speech recordings (30.9 hours) across 10 local languages, and incorporate 64.6k clips (46.6 hours) of Mozilla Common Voice contributions for Adja, Nawdm, Mina, and Tem to strengthen Automatic Speech Recognition (ASR) and speech synthesis. We detail how community contributors, including through collaboration with a national TV journalist, helped collect and validate the Kabyè and French text, with an ethical compensation model in place. We fine-tune state-of-the-art models: OpenAI Whisper and faster-whisper, and Meta’s NLLB-200 model for machine translation across 11 languages (achieving BLEU scores of 19.4 for French→Ewe and 26.1 for Kabyè→French). We also introduce the Lom Bench, a community-based benchmark in which native speakers rate TTS output; preliminary results are promising for Mina and for French, the Togolese lingua franca, although further data is needed. We provide a comparative analysis of our results with recent multilingual systems, including Simba, Meta’s Omnilingual ASR, and UBC Toucan. Our work highlights practical pathways by which FAIR data sourcing and community participation can drive sustainable NLP development for underserved languages.
Anthology ID:
2026.africanlp-main.28
Volume:
Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Everlyn Asiko Chimoto, Constantine Lignos, Shamsuddeen Muhammad, Idris Abdulmumin, Clemencia Siro, David Ifeoluwa Adelani
Venues:
AfricaNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
264–270
URL:
https://aclanthology.org/2026.africanlp-main.28/
Cite (ACL):
Bakoubolo Essowe Justin, Catherine Nana Nyaah Essuman, Messan Agbobli, Ahoefa Kansiwer, Eli Jean Doumeyan, Julie Pato, Notou Your Timibe, Emile KOGBEDJI Agossou, and Guedela Bakouya. 2026. Eyaa-Tom 26, Yodi-Mantissa and Lom Bench: A Community Benchmark for TTS in Local Languages. In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), pages 264–270, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Eyaa-Tom 26, Yodi-Mantissa and Lom Bench: A Community Benchmark for TTS in Local Languages (Justin et al., AfricaNLP 2026)
PDF:
https://aclanthology.org/2026.africanlp-main.28.pdf