Tajuddeen Rabiu Gwadabe
2022
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Idris Abdulmumin | Michael Beukman | Jesujoba O. Alabi | Chris Emezue | Everlyn Asiko | Tosin Adewumi | Shamsuddeen Hassan Muhammad | Mofetoluwa Adeyemi | Oreen Yousuf | Sahib Singh | Tajuddeen Rabiu Gwadabe
Proceedings of the Seventh Conference on Machine Translation (WMT)
We participated in the WMT 2022 Large-Scale Machine Translation Evaluation for the African Languages Shared Task. This work describes our approach, which is based on filtering the given noisy data using a sentence-pair classifier that was built by fine-tuning a pre-trained language model. To train the classifier, we obtain positive samples (i.e. high-quality parallel sentences) from a gold-standard curated dataset and extract negative samples (i.e. low-quality parallel sentences) from automatically aligned parallel data by choosing sentences with low alignment scores. Our final machine translation model was then trained on filtered data, instead of the entire noisy dataset. We empirically validate our approach by evaluating on two common datasets and show that data filtering generally improves overall translation quality, in some cases even significantly.
2021
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani | Jade Abbott | Graham Neubig | Daniel D’souza | Julia Kreutzer | Constantine Lignos | Chester Palen-Michel | Happy Buzaaba | Shruti Rijhwani | Sebastian Ruder | Stephen Mayhew | Israel Abebe Azime | Shamsuddeen H. Muhammad | Chris Chinenye Emezue | Joyce Nakatumba-Nabende | Perez Ogayo | Aremu Anuoluwapo | Catherine Gitau | Derguene Mbaye | Jesujoba Alabi | Seid Muhie Yimam | Tajuddeen Rabiu Gwadabe | Ignatius Ezeani | Rubungo Andre Niyongabo | Jonathan Mukiibi | Verrah Otiende | Iroro Orife | Davis David | Samba Ngom | Tosin Adewumi | Paul Rayson | Mofetoluwa Adeyemi | Gerald Muriuki | Emmanuel Anebi | Chiamaka Chukwuneke | Nkiruka Odu | Eric Peter Wairagala | Samuel Oyerinde | Clemencia Siro | Tobius Saul Bateesa | Temilola Oloyede | Yvonne Wambui | Victor Akinode | Deborah Nabagereka | Maurice Katusiime | Ayodele Awokoya | Mouhamadane MBOUP | Dibora Gebreyohannes | Henok Tilaye | Kelechi Nwaike | Degaga Wolde | Abdoulaye Faye | Blessing Sibanda | Orevaoghene Ahia | Bonaventure F. P. Dossou | Kelechi Ogueji | Thierno Ibrahima DIOP | Abdoulaye Diallo | Adewale Akinfaderin | Tendai Marengereke | Salomey Osei
Transactions of the Association for Computational Linguistics, Volume 9
We take a step towards addressing the underrepresentation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
Co-authors
- Tosin Adewumi 2
- Mofetoluwa Adeyemi 2
- Jesujoba Alabi 2
- Chris Chinenye Emezue 2
- Shamsuddeen Hassan Muhammad 2
- Jade Abbott 1
- Idris Abdulmumin 1
- David Ifeoluwa Adelani 1
- Orevaoghene Ahia 1
- Adewale Akinfaderin 1
- Victor Akinode 1
- Emmanuel Anebi 1
- Aremu Anuoluwapo 1
- Everlyn Asiko 1
- Ayodele Awokoya 1
- Israel Abebe Azime 1
- Tobius Saul Bateesa 1
- Michael Beukman 1
- Happy Buzaaba 1
- Chiamaka Chukwuneke 1
- Thierno Ibrahima DIOP 1
- Davis David 1
- Abdoulaye Diallo 1
- Bonaventure F. P. Dossou 1
- Daniel D’souza 1
- Ignatius Ezeani 1
- Abdoulaye Faye 1
- Dibora Gebreyohannes 1
- Catherine Gitau 1
- Maurice Katusiime 1
- Julia Kreutzer 1
- Constantine Lignos 1
- Mouhamadane MBOUP 1
- Tendai Marengereke 1
- Stephen Mayhew 1
- Derguene Mbaye 1
- Jonathan Mukiibi 1
- Gerald Muriuki 1
- Deborah Nabagereka 1
- Joyce Nakatumba-Nabende 1
- Graham Neubig 1
- Samba Ngom 1
- Rubungo Andre Niyongabo 1
- Kelechi Nwaike 1
- Nkiruka Odu 1
- Perez Ogayo 1
- Kelechi Ogueji 1
- Temilola Oloyede 1
- Iroro Orife 1
- Salomey Osei 1
- Verrah Otiende 1
- Samuel Oyerinde 1
- Chester Palen-Michel 1
- Paul Rayson 1
- Shruti Rijhwani 1
- Sebastian Ruder 1
- Blessing Kudzaishe Sibanda 1
- Sahib Singh 1
- Clemencia Siro 1
- Henok Tilaye 1
- Eric Peter Wairagala 1
- Yvonne Wambui 1
- Degaga Wolde 1
- Seid Muhie Yimam 1
- Oreen Yousuf 1