Tone in Yoruba ASR: Evaluating the Impact of Tone Recognition on Transformer-Based ASR Models

Joy Olusanya


Abstract
This research investigates the role of tone in Standard Yoruba Automatic Speech Recognition (ASR), focusing on how explicit tone marking (diacritics) influences recognition accuracy and overall system performance. As a low-resource tonal language, Yoruba encodes critical lexical and grammatical contrasts through pitch, which makes tone handling both essential and challenging for ASR. Three pre-trained models (Meta’s MMS-1B-all, OpenAI’s Whisper-small, and AstralZander/Yoruba_ASR) were fine-tuned and evaluated on datasets that differ in tone annotation (fully tone-marked vs. non-tone-marked). Using Word Error Rate (WER) and Tone Error Rate (TER) as the primary metrics, the models consistently performed better on non-tone-marked data, with substantially lower error rates than on their tone-marked counterparts. These outcomes suggest that current architectures struggle with diacritically marked Yoruba, likely because of tokenization behavior, insufficient representation of tonal cues, and limited tone modeling in the underlying pre-training. The study concludes that tone-aware approaches, spanning tokenization, acoustic-text alignment, and model objectives, are needed to improve recognition for Yoruba and other low-resource tonal languages. The findings clarify how linguistic tone systems interact with computational modeling and offer concrete directions for building more robust, tone-sensitive ASR systems.
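The abstract contrasts fully tone-marked and non-tone-marked transcripts and scores models with WER and TER. The Python sketch below shows one plausible way to derive a non-tone-marked copy of a Yoruba transcript and to compute both metrics; it is not the paper's pipeline. It assumes Unicode NFD decomposition, treats the combining acute (U+0301), grave (U+0300), and macron (U+0304) as tone marks while keeping the underdot (U+0323) on ẹ, ọ, ṣ, and uses a hypothetical vowel-level TER. The helper names `strip_tones`, `tone_labels`, and `ter` are illustrative only; the paper's exact TER definition and preprocessing are not stated here.

```python
import unicodedata

# Assumption: Standard Yoruba orthography, where tone is written with combining
# acute (high), grave (low), and optionally macron (mid) after NFD decomposition.
# The underdot (U+0323) on e, o, s is NOT a tone mark and is kept.
ACUTE, GRAVE, MACRON = "\u0301", "\u0300", "\u0304"
TONE_MARKS = {ACUTE, GRAVE, MACRON}
VOWELS = set("aeiou")


def strip_tones(text: str) -> str:
    """Return a non-tone-marked copy of `text`, keeping underdots."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def tone_labels(text: str) -> list[str]:
    """Hypothetical helper: one H/M/L label per vowel; unmarked vowels count as mid.

    Simplification: syllabic nasals (e.g. ń, ǹ) are ignored.
    """
    labels = []
    after_vowel = False
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            # Tone marks attach to the preceding vowel; the underdot is skipped.
            if after_vowel and ch == ACUTE:
                labels[-1] = "H"
            elif after_vowel and ch == GRAVE:
                labels[-1] = "L"
            continue
        after_vowel = ch in VOWELS
        if after_vowel:
            labels.append("M")
    return labels


def ter(reference: str, hypothesis: str) -> float:
    """Hypothetical Tone Error Rate: edit distance over vowel tone labels."""
    ref_tones = tone_labels(reference)
    return edit_distance(ref_tones, tone_labels(hypothesis)) / max(len(ref_tones), 1)


if __name__ == "__main__":
    reference = "\u1ecdj\u1ecd\u0301 d\u00e1ra"  # "ọjọ́ dára", fully tone-marked
    hypothesis = strip_tones(reference)          # "ọjọ dara", tones removed
    print(hypothesis)
    print("WER:", wer(reference, hypothesis))
    print("TER:", ter(reference, hypothesis))
```

Scoring the tone-stripped hypothesis against the tone-marked reference counts both words as errors under WER (1.0), while this vowel-level TER isolates the two missing high tones (0.5). Removing only tone marks, rather than all diacritics, keeps the underdotted letters distinct, so the non-tone-marked text still separates, for example, e from ẹ.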
Anthology ID:
2026.loreslm-1.14
Volume:
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venue:
LoResLM
Publisher:
Association for Computational Linguistics
Pages:
149–156
URL:
https://aclanthology.org/2026.loreslm-1.14/
Cite (ACL):
Joy Olusanya. 2026. Tone in Yoruba ASR: Evaluating the Impact of Tone Recognition on Transformer-Based ASR Models. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 149–156, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Tone in Yoruba ASR: Evaluating the Impact of Tone Recognition on Transformer-Based ASR Models (Olusanya, LoResLM 2026)
PDF:
https://aclanthology.org/2026.loreslm-1.14.pdf