@inproceedings{yu-etal-2025-injongo,
title = "{INJONGO}: A Multicultural Intent Detection and Slot-filling Dataset for 16 {A}frican Languages",
author = "Yu, Hao and
Alabi, Jesujoba Oluwadara and
Bukula, Andiswa and
Zhuang, Jian Yun and
Lee, En-Shiun Annie and
Guge, Tadesse Kebede and
Azime, Israel Abebe and
Buzaaba, Happy and
Sibanda, Blessing Kudzaishe and
Kalipe, Godson Koffi and
Mukiibi, Jonathan and
Kabongo Kabenamualu, Salomon and
Setaka, Mmasibidi and
Ndolela, Lolwethu and
Odu, Nkiruka and
Mabuya, Rooweither and
Muhammad, Shamsuddeen Hassan and
Osei, Salomey and
Samb, Sokhar and
Klakow, Dietrich and
Adelani, David Ifeoluwa",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.464/",
doi = "10.18653/v1/2025.acl-long.464",
pages = "9429--9452",
ISBN = "979-8-89176-251-0",
abstract = "Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce ``INJONGO'' - a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuning multilingual transformer models and prompting large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from the English language. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average performance of 26 F1. In contrast, intent detection performance is notably better, with an average accuracy of 70.6{\%}, though it still falls short of fine-tuning baselines. When compared to the English language, GPT-4o and fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81{\%}. Our findings suggest that LLMs performance is still behind for many low-resource African languages, and more work is needed to further improve their downstream performance."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="yu-etal-2025-injongo">
<titleInfo>
<title>INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hao</namePart>
<namePart type="family">Yu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jesujoba</namePart>
<namePart type="given">Oluwadara</namePart>
<namePart type="family">Alabi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andiswa</namePart>
<namePart type="family">Bukula</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jian</namePart>
<namePart type="given">Yun</namePart>
<namePart type="family">Zhuang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">En-Shiun</namePart>
<namePart type="given">Annie</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tadesse</namePart>
<namePart type="given">Kebede</namePart>
<namePart type="family">Guge</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Israel</namePart>
<namePart type="given">Abebe</namePart>
<namePart type="family">Azime</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Happy</namePart>
<namePart type="family">Buzaaba</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Blessing</namePart>
<namePart type="given">Kudzaishe</namePart>
<namePart type="family">Sibanda</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Godson</namePart>
<namePart type="given">Koffi</namePart>
<namePart type="family">Kalipe</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jonathan</namePart>
<namePart type="family">Mukiibi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Salomon</namePart>
<namePart type="family">Kabongo Kabenamualu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mmasibidi</namePart>
<namePart type="family">Setaka</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lolwethu</namePart>
<namePart type="family">Ndolela</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nkiruka</namePart>
<namePart type="family">Odu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rooweither</namePart>
<namePart type="family">Mabuya</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shamsuddeen</namePart>
<namePart type="given">Hassan</namePart>
<namePart type="family">Muhammad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Salomey</namePart>
<namePart type="family">Osei</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sokhar</namePart>
<namePart type="family">Samb</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dietrich</namePart>
<namePart type="family">Klakow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="given">Ifeoluwa</namePart>
<namePart type="family">Adelani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-251-0</identifier>
</relatedItem>
<abstract>Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce “INJONGO” - a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuning multilingual transformer models and prompting large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from the English language. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average performance of 26 F1. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls short of fine-tuning baselines. When compared to the English language, GPT-4o and fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that LLM performance is still behind for many low-resource African languages, and more work is needed to further improve their downstream performance.</abstract>
<identifier type="citekey">yu-etal-2025-injongo</identifier>
<identifier type="doi">10.18653/v1/2025.acl-long.464</identifier>
<location>
<url>https://aclanthology.org/2025.acl-long.464/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>9429</start>
<end>9452</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
%A Yu, Hao
%A Alabi, Jesujoba Oluwadara
%A Bukula, Andiswa
%A Zhuang, Jian Yun
%A Lee, En-Shiun Annie
%A Guge, Tadesse Kebede
%A Azime, Israel Abebe
%A Buzaaba, Happy
%A Sibanda, Blessing Kudzaishe
%A Kalipe, Godson Koffi
%A Mukiibi, Jonathan
%A Kabongo Kabenamualu, Salomon
%A Setaka, Mmasibidi
%A Ndolela, Lolwethu
%A Odu, Nkiruka
%A Mabuya, Rooweither
%A Muhammad, Shamsuddeen Hassan
%A Osei, Salomey
%A Samb, Sokhar
%A Klakow, Dietrich
%A Adelani, David Ifeoluwa
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F yu-etal-2025-injongo
%X Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce “INJONGO” - a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuning multilingual transformer models and prompting large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from the English language. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average performance of 26 F1. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls short of fine-tuning baselines. When compared to the English language, GPT-4o and fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that LLM performance is still behind for many low-resource African languages, and more work is needed to further improve their downstream performance.
%R 10.18653/v1/2025.acl-long.464
%U https://aclanthology.org/2025.acl-long.464/
%U https://doi.org/10.18653/v1/2025.acl-long.464
%P 9429-9452
Markdown (Informal)
[INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages](https://aclanthology.org/2025.acl-long.464/) (Yu et al., ACL 2025)
ACL
Hao Yu, Jesujoba Oluwadara Alabi, Andiswa Bukula, Jian Yun Zhuang, En-Shiun Annie Lee, Tadesse Kebede Guge, Israel Abebe Azime, Happy Buzaaba, Blessing Kudzaishe Sibanda, Godson Koffi Kalipe, Jonathan Mukiibi, Salomon Kabongo Kabenamualu, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Dietrich Klakow, and David Ifeoluwa Adelani. 2025. INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9429–9452, Vienna, Austria. Association for Computational Linguistics.
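For readers less familiar with the two metrics quoted in the abstract (intent-detection accuracy and slot-filling F1), the sketch below shows how they are conventionally computed. It is a minimal illustration, not the authors' evaluation code: the helper names, the toy intent labels, and the (slot_type, start, end) span representation are assumptions for illustration, not the published INJONGO schema.

```python
# Minimal sketch of the two metrics reported in the abstract:
# intent-detection accuracy and micro-averaged slot-span F1.
# NOTE: function names, toy labels, and the span encoding below are
# hypothetical placeholders, not the dataset's published schema.

from collections import Counter

def intent_accuracy(gold_intents, pred_intents):
    """Fraction of utterances whose predicted intent matches the gold label."""
    assert len(gold_intents) == len(pred_intents)
    correct = sum(g == p for g, p in zip(gold_intents, pred_intents))
    return correct / len(gold_intents)

def slot_f1(gold_spans, pred_spans):
    """Micro-averaged F1 over (slot_type, start, end) spans,
    the usual exact-match scoring for slot filling."""
    gold = Counter(span for utt in gold_spans for span in utt)
    pred = Counter(span for utt in pred_spans for span in utt)
    tp = sum((gold & pred).values())  # multiset intersection = true positives
    precision = tp / max(sum(pred.values()), 1)
    recall = tp / max(sum(gold.values()), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage with two utterances:
gold_i = ["book_flight", "transfer_money"]
pred_i = ["book_flight", "check_balance"]
gold_s = [[("city", 3, 5)], [("amount", 1, 2), ("recipient", 4, 6)]]
pred_s = [[("city", 3, 5)], [("amount", 1, 2)]]
print(intent_accuracy(gold_i, pred_i))  # 0.5
print(slot_f1(gold_s, pred_s))          # 0.8
```

Under this scoring, a model can do well on intent detection while missing most slot spans, which matches the gap the abstract reports for GPT-4o (70.6% intent accuracy vs. 26 slot F1).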