ÒWE-Voice: An Evaluation of Monolingual and Multilingual ASR Model Using Yoruba Proverb Speech Dataset

Daud Abolade

Abstract

Given the advancement of various Artificial Intelligence (AI) technologies in the 21st century, Automatic Speech Recognition (ASR) plays a vital role in human-machine interaction and serves as an interface for a wide range of applications. The development of these high-performing, robust and useful technologies continues to focus on high-resource languages, which benefit from abundant language data, greater market profitability, and better access to funding and research initiatives than marginalised low-resource languages. Despite efforts to develop ASR systems for African languages, numerous challenges remain due to limited speech datasets, tonal complexity and dialectal variation. In this study, we curated a domain-specific speech dataset of proverbs, a form of Yoruba oral literature that is deeply rooted in culture. We used the Yoruba recording app developed for the Iroyin-speech project to record 6 hours of Yoruba proverb sentences. We then evaluated two models on the recorded proverbs: NCAIR1/Yoruba-ASR, which was fine-tuned from OpenAI Whisper Small, and Massively Multilingual Speech, a multilingual speech model that covers low-resource languages including Yoruba. Evaluation was based on Word Error Rate (WER) and Tone Error Rate (TER). Our results show that current ASR systems that support Yoruba do not capture cultural nuances. These findings highlight an urgent need to curate more robust, culturally embedded speech datasets for low-resource languages, and for Yoruba in particular, in order to build technological tools that preserve African culture, language and identity.
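The two metrics named in the abstract can be illustrated with a minimal sketch. WER is computed here in the standard way, as the word-level edit distance divided by the number of reference words. The abstract does not define TER, so the tone scoring below is an assumption rather than the paper's method: it assigns one tone label per vowel (grave accent as low, acute accent as high, unmarked as mid) and applies the same edit-distance ratio to the resulting tone sequences. It also ignores tone-bearing syllabic nasals. All function names are hypothetical.

```python
import unicodedata

GRAVE, ACUTE = "\u0300", "\u0301"   # combining grave / acute accents
VOWELS = set("aeiou")               # base vowels after NFD decomposition


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Standard WER: word-level edit distance over reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def tone_sequence(text):
    """ASSUMED scheme: one label per vowel, L (grave), H (acute), M (unmarked)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    tones = []
    for idx, ch in enumerate(decomposed):
        if ch not in VOWELS:
            continue
        tone = "M"
        k = idx + 1
        # Scan the combining marks attached to this vowel (e.g. dot below, accents).
        while k < len(decomposed) and unicodedata.combining(decomposed[k]):
            if decomposed[k] == GRAVE:
                tone = "L"
            elif decomposed[k] == ACUTE:
                tone = "H"
            k += 1
        tones.append(tone)
    return tones


def tone_error_rate(reference, hypothesis):
    """ASSUMED TER: edit distance over tone labels, divided by reference tones."""
    ref_tones = tone_sequence(reference)
    if not ref_tones:
        raise ValueError("reference must contain at least one vowel")
    return edit_distance(ref_tones, tone_sequence(hypothesis)) / len(ref_tones)
```

Under this scheme a transcript can have every word recognisable yet still score poorly on tone: a hypothesis that drops all diacritics keeps the vowels but labels every one of them as mid.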

Anthology ID: 2026.africanlp-main.8
Volume: Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Everlyn Asiko Chimoto, Constantine Lignos, Shamsuddeen Muhammad, Idris Abdulmumin, Clemencia Siro, David Ifeoluwa Adelani
Venues: AfricaNLP | WS
Publisher: Association for Computational Linguistics
Pages: 82–87
URL: https://aclanthology.org/2026.africanlp-main.8/
Cite (ACL): Daud Abolade. 2026. ÒWE-Voice: An Evaluation of Monolingual and Multilingual ASR Model Using Yoruba Proverb Speech Dataset. In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), pages 82–87, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): ÒWE-Voice: An Evaluation of Monolingual and Multilingual ASR Model Using Yoruba Proverb Speech Dataset (Abolade, AfricaNLP 2026)
PDF: https://aclanthology.org/2026.africanlp-main.8.pdf
