Cheetah: Natural Language Generation for 517 African Languages

Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed


Abstract
Low-resource African languages pose unique challenges for natural language processing (NLP) tasks, including natural language generation (NLG). In this paper, we develop Cheetah, a massively multilingual NLG language model for African languages. Cheetah supports 517 African languages and language varieties, allowing us to address the scarcity of NLG resources and to foster linguistic diversity. We demonstrate the effectiveness of Cheetah through comprehensive evaluations across six downstream generation tasks. In five of the six tasks, Cheetah significantly outperforms other models, demonstrating its ability to generate coherent and contextually appropriate text in a wide range of African languages. We additionally conduct a detailed human evaluation to examine the linguistic capabilities of Cheetah in more depth. The findings of this study contribute to advancing NLP research in low-resource settings, enabling greater accessibility and inclusion for African languages in a rapidly expanding digital landscape. We will publicly release our models for research.
Anthology ID: 2024.acl-long.691
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 12798–12823
URL: https://aclanthology.org/2024.acl-long.691
DOI: 10.18653/v1/2024.acl-long.691
Cite (ACL): Ife Adebara, AbdelRahim Elmadany, and Muhammad Abdul-Mageed. 2024. Cheetah: Natural Language Generation for 517 African Languages. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12798–12823, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Cheetah: Natural Language Generation for 517 African Languages (Adebara et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.691.pdf