Chiamaka Ijeoma Chukwuneke
2025
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Shamsuddeen Hassan Muhammad | Idris Abdulmumin | Abinew Ali Ayele | David Ifeoluwa Adelani | Ibrahim Said Ahmad | Saminu Mohammad Aliyu | Paul Röttger | Abigail Oppong | Andiswa Bukula | Chiamaka Ijeoma Chukwuneke | Ebrahim Chekol Jibril | Elyas Abdi Ismail | Esubalew Alemneh | Hagos Tesfahun Gebremichael | Lukman Jibril Aliyu | Meriem Beloucif | Oumaima Hourrane | Rooweither Mabuya | Salomey Osei | Samuel Rutunda | Tadesse Destaw Belay | Tadesse Kebede Guge | Tesfa Tegegne Asfaw | Lilian Diana Awuor Wanzare | Nelson Odhiambo Onyango | Seid Muhie Yimam | Nedjma Ousidhoum
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Hate speech and abusive language are global phenomena that need socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to the reliance on keyword spotting out of context. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is a tweet annotated by native speakers familiar with the regional culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. We find that model performance highly depends on the language and that multilingual models can help boost performance in low-resource settings.
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
Shamsuddeen Hassan Muhammad | Nedjma Ousidhoum | Idris Abdulmumin | Jan Philip Wahle | Terry Ruas | Meriem Beloucif | Christine de Kock | Nirmal Surange | Daniela Teodorescu | Ibrahim Said Ahmad | David Ifeoluwa Adelani | Alham Fikri Aji | Felermino D. M. A. Ali | Ilseyar Alimova | Vladimir Araujo | Nikolay Babakov | Naomi Baes | Ana-Maria Bucur | Andiswa Bukula | Guanqun Cao | Rodrigo Tufiño | Rendi Chevi | Chiamaka Ijeoma Chukwuneke | Alexandra Ciobotaru | Daryna Dementieva | Murja Sani Gadanya | Robert Geislinger | Bela Gipp | Oumaima Hourrane | Oana Ignat | Falalu Ibrahim Lawan | Rooweither Mabuya | Rahmad Mahendra | Vukosi Marivate | Alexander Panchenko | Andrew Piper | Charles Henrique Porto Ferreira | Vitaly Protasov | Samuel Rutunda | Manish Shrivastava | Aura Cristina Udrea | Lilian Diana Awuor Wanzare | Sophie Wu | Florian Valentin Wunderlich | Hanif Muhammad Zhafran | Tianhui Zhang | Yi Zhou | Saif M. Mohammad
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
People worldwide use language in subtle and complex ways to express emotions. Although emotion recognition, an umbrella term for several NLP tasks, impacts various applications within NLP and beyond, most work in this area has focused on high-resource languages. This has led to significant disparities in research efforts and proposed solutions, particularly for under-resourced languages, which often lack high-quality annotated datasets. In this paper, we present BRIGHTER, a collection of multi-labeled, emotion-annotated datasets in 28 different languages and across several domains. BRIGHTER primarily covers low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances labeled by fluent speakers. We highlight the challenges related to the data collection and annotation processes, and then report experimental results for monolingual and crosslingual multi-label emotion identification, as well as emotion intensity recognition. We analyse the variability in performance across languages and text domains, both with and without the use of LLMs, and show that the BRIGHTER datasets represent a meaningful step towards addressing the gap in text-based emotion recognition.
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
David Ifeoluwa Adelani | Jessica Ojo | Israel Abebe Azime | Jian Yun Zhuang | Jesujoba Oluwadara Alabi | Xuanli He | Millicent Ochieng | Sara Hooker | Andiswa Bukula | En-Shiun Annie Lee | Chiamaka Ijeoma Chukwuneke | Happy Buzaaba | Blessing Kudzaishe Sibanda | Godson Koffi Kalipe | Jonathan Mukiibi | Salomon Kabongo Kabenamualu | Foutse Yuehgoh | Mmasibidi Setaka | Lolwethu Ndolela | Nkiruka Odu | Rooweither Mabuya | Salomey Osei | Shamsuddeen Hassan Muhammad | Sokhar Samb | Tadesse Kebede Guge | Tombekai Vangoni Sherman | Pontus Stenetorp
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 17 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based QA (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We also observe a significant performance gap between open and proprietary models: the highest-performing open model, Gemma 2 27B, reaches only 63% of the performance of the best-performing proprietary model, GPT-4o. Machine-translating the test set into English before evaluation helped to close the gap for larger English-centric models, such as Gemma 2 27B and LLaMa 3.1 70B. These findings suggest that more effort is needed to develop and adapt LLMs for African languages.
Co-authors
- David Ifeoluwa Adelani 3
- Andiswa Bukula 3
- Rooweither Mabuya 3
- Shamsuddeen Hassan Muhammad 3
- Idris Abdulmumin 2
- Ibrahim Said Ahmad 2
- Meriem Beloucif 2
- Tadesse Kebede Guge 2
- Oumaima Hourrane 2
- Salomey Osei 2
- Nedjma Ousidhoum 2
- Samuel Rutunda 2
- Lilian Diana Awuor Wanzare 2
- Alham Fikri Aji 1
- Jesujoba Alabi 1
- Esubalew Alemneh 1
- Felermino D. M. A. Ali 1
- Ilseyar Alimova 1
- Saminu Mohammad Aliyu 1
- Lukman Jibril Aliyu 1
- Vladimir Araujo 1
- Tesfa Tegegne Asfaw 1
- Abinew Ali Ayele 1
- Israel Abebe Azime 1
- Nikolay Babakov 1
- Naomi Baes 1
- Tadesse Destaw Belay 1
- Ana-Maria Bucur 1
- Happy Buzaaba 1
- Guanqun Cao 1
- Rendi Chevi 1
- Alexandra Ciobotaru 1
- Daryna Dementieva 1
- Charles Henrique Porto Ferreira 1
- Murja Sani Gadanya 1
- Hagos Tesfahun Gebremichael 1
- Robert Geislinger 1
- Bela Gipp 1
- Xuanli He 1
- Sara Hooker 1
- Oana Ignat 1
- Elyas Abdi Ismail 1
- Ebrahim Chekol Jibril 1
- Salomon Kabongo Kabenamualu 1
- Godson Koffi Kalipe 1
- Falalu Ibrahim Lawan 1
- En-Shiun Annie Lee 1
- Rahmad Mahendra 1
- Vukosi Marivate 1
- Saif Mohammad 1
- Jonathan Mukiibi 1
- Lolwethu Ndolela 1
- Millicent Ochieng 1
- Nkiruka Odu 1
- Jessica Ojo 1
- Nelson Odhiambo Onyango 1
- Abigail Oppong 1
- Alexander Panchenko 1
- Andrew Piper 1
- Vitaly Protasov 1
- Terry Ruas 1
- Paul Röttger 1
- Sokhar Samb 1
- Mmasibidi Setaka 1
- Tombekai Vangoni Sherman 1
- Manish Shrivastava 1
- Blessing Kudzaishe Sibanda 1
- Pontus Stenetorp 1
- Nirmal Surange 1
- Daniela Teodorescu 1
- Rodrigo Tufiño 1
- Aura Cristina Udrea 1
- Jan Philip Wahle 1
- Sophie Wu 1
- Florian Valentin Wunderlich 1
- Seid Muhie Yimam 1
- Foutse Yuehgoh 1
- Hanif Muhammad Zhafran 1
- Tianhui Zhang 1
- Yi Zhou 1
- Jian Yun Zhuang 1
- Christine de Kock 1