Joyce Nakatumba-Nabende
2022
Gender bias Evaluation in Luganda-English Machine Translation
Eric Peter Wairagala | Jonathan Mukiibi | Jeremy Francis Tusubira | Claire Babirye | Joyce Nakatumba-Nabende | Andrew Katumba | Ivan Ssenkungu
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
We have seen significant growth in the area of building Natural Language Processing (NLP) tools for African languages. However, the evaluation of gender bias in machine translation systems for African languages has not yet been thoroughly investigated. This is due to the lack of explicit text data for addressing gender bias in machine translation. In this paper, we use transfer learning techniques based on a pre-trained Marian MT model to build machine translation models for English-Luganda and Luganda-English. Our work attempts to evaluate and quantify the gender bias within a Luganda-English machine translation system using the Word Embeddings Fairness Evaluation Framework (WEFE). Luganda is one of the world's languages with gender-neutral pronouns; we therefore use a small set of trusted gendered examples as the test set to evaluate gender bias by biasing word embeddings. This approach allows us to focus on Luganda-English translations with gender-specific pronouns, and the results of the gender bias evaluation are confirmed by human evaluation. To compare and contrast the results of the word embeddings evaluation metric, we used a modified version of the existing Translation Gender Bias Index (TGBI) based on the grammatical consideration for Luganda.
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Ifeoluwa Adelani | Graham Neubig | Sebastian Ruder | Shruti Rijhwani | Michael Beukman | Chester Palen-Michel | Constantine Lignos | Jesujoba O. Alabi | Shamsuddeen H. Muhammad | Peter Nabende | Cheikh M. Bamba Dione | Andiswa Bukula | Rooweither Mabuya | Bonaventure F. P. Dossou | Blessing Sibanda | Happy Buzaaba | Jonathan Mukiibi | Godson Kalipe | Derguene Mbaye | Amelia Taylor | Fatoumata Kabore | Chris Chinenye Emezue | Anuoluwapo Aremu | Perez Ogayo | Catherine Gitau | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Allahsera Auguste Tapo | Tebogo Macucwa | Vukosi Marivate | Elvis Mboning | Tajuddeen Gwadabe | Tosin Adewumi | Orevaoghene Ahia | Joyce Nakatumba-Nabende | Neo L. Mokono | Ignatius Ezeani | Chiamaka Chukwuneke | Mofetoluwa Adeyemi | Gilles Q. Hacheme | Idris Abdulmumin | Odunayo Ogundepo | Oreen Yousuf | Tatiana Moteu Ngoli | Dietrich Klakow
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.
The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition
Jonathan Mukiibi | Andrew Katumba | Joyce Nakatumba-Nabende | Ali Hussein | Joshua Meyer
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Building a usable radio-monitoring automatic speech recognition (ASR) system is a challenging task for under-resourced languages, yet it is paramount in societies where radio is the main medium of public communication and discussion. Initial efforts by the United Nations in Uganda have shown how important it is for national planning to understand the perceptions of rural people who are excluded from social media. However, these efforts are being challenged by the absence of transcribed speech datasets. In this paper, the Makerere Artificial Intelligence research lab releases a Luganda radio speech corpus of 155 hours. To our knowledge, this is the first publicly available radio dataset in sub-Saharan Africa. The paper describes the development of the voice corpus and presents baseline Luganda ASR performance results using the Coqui STT toolkit, an open-source speech recognition toolkit.
2021
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
Co-authors
- Jonathan Mukiibi 4
- David Ifeoluwa Adelani 2
- Tosin Adewumi 2
- Mofetoluwa Adeyemi 2
- Orevaoghene Ahia 2
- Jesujoba Alabi 2
- Happy Buzaaba 2
- Chiamaka Chukwuneke 2
- Bonaventure F. P. Dossou 2
- Chris Chinenye Emezue 2
- Ignatius Ezeani 2
- Catherine Gitau 2
- Andrew Katumba 2
- Constantine Lignos 2
- Derguene Mbaye 2
- Shamsuddeen Hassan Muhammad 2
- Graham Neubig 2
- Perez Ogayo 2
- Chester Palen-Michel 2
- Shruti Rijhwani 2
- Sebastian Ruder 2
- Blessing Kudzaishe Sibanda 2
- Eric Peter Wairagala 2
- Jade Abbott 1
- Idris Abdulmumin 1
- Adewale Akinfaderin 1
- Victor Akinode 1
- Emmanuel Anebi 1
- Aremu Anuoluwapo 1
- Anuoluwapo Aremu 1
- Ayodele Awokoya 1
- Israel Abebe Azime 1
- Claire Babirye 1
- Tobius Saul Bateesa 1
- Michael Beukman 1
- Andiswa Bukula 1
- Thierno Ibrahima DIOP 1
- Davis David 1
- Abdoulaye Diallo 1
- Cheikh M. Bamba Dione 1
- Daniel D’souza 1
- Abdoulaye Faye 1
- Dibora Gebreyohannes 1
- Tajuddeen Rabiu Gwadabe 1
- Tajuddeen Gwadabe 1
- Gilles Q. Hacheme 1
- Ali Hussein 1
- Fatoumata Kabore 1
- Godson Kalipe 1
- Maurice Katusiime 1
- Dietrich Klakow 1
- Julia Kreutzer 1
- Mouhamadane MBOUP 1
- Rooweither Mabuya 1
- Tebogo Macucwa 1
- Tendai Marengereke 1
- Vukosi Marivate 1
- Stephen Mayhew 1
- Elvis Mboning 1
- Victoire Memdjokam Koagne 1
- Joshua Meyer 1
- Neo L. Mokono 1
- Tatiana Moteu Ngoli 1
- Edwin Munkoh-Buabeng 1
- Gerald Muriuki 1
- Deborah Nabagereka 1
- Peter Nabende 1
- Samba Ngom 1
- Rubungo Andre Niyongabo 1
- Kelechi Nwaike 1
- Nkiruka Odu 1
- Kelechi Ogueji 1
- Odunayo Ogundepo 1
- Temilola Oloyede 1
- Iroro Orife 1
- Salomey Osei 1
- Verrah Otiende 1
- Samuel Oyerinde 1
- Paul Rayson 1
- Clemencia Siro 1
- Ivan Ssenkungu 1
- Allahsera Auguste Tapo 1
- Amelia Taylor 1
- Henok Tilaye 1
- Jeremy Francis Tusubira 1
- Yvonne Wambui 1
- Degaga Wolde 1
- Seid Muhie Yimam 1
- Oreen Yousuf 1