Elvis Mboning


2022

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Ifeoluwa Adelani | Graham Neubig | Sebastian Ruder | Shruti Rijhwani | Michael Beukman | Chester Palen-Michel | Constantine Lignos | Jesujoba O. Alabi | Shamsuddeen H. Muhammad | Peter Nabende | Cheikh M. Bamba Dione | Andiswa Bukula | Rooweither Mabuya | Bonaventure F. P. Dossou | Blessing Sibanda | Happy Buzaaba | Jonathan Mukiibi | Godson Kalipe | Derguene Mbaye | Amelia Taylor | Fatoumata Kabore | Chris Chinenye Emezue | Anuoluwapo Aremu | Perez Ogayo | Catherine Gitau | Edwin Munkoh-Buabeng | Victoire Memdjokam Koagne | Allahsera Auguste Tapo | Tebogo Macucwa | Vukosi Marivate | Elvis Mboning | Tajuddeen Gwadabe | Tosin Adewumi | Orevaoghene Ahia | Joyce Nakatumba-Nabende | Neo L. Mokono | Ignatius Ezeani | Chiamaka Chukwuneke | Mofetoluwa Adeyemi | Gilles Q. Hacheme | Idris Abdulmumim | Odunayo Ogundepo | Oreen Yousuf | Tatiana Moteu Ngoli | Dietrich Klakow
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

African languages are spoken by over a billion people, but they are under-represented in NLP research and development. Multiple challenges exist, including the limited availability of annotated training and evaluation datasets as well as the lack of understanding of which settings, languages, and recently proposed methods like cross-lingual transfer will be effective. In this paper, we aim to move towards solutions for these challenges, focusing on the task of named entity recognition (NER). We present the creation of the largest to-date human-annotated NER dataset for 20 African languages. We study the behaviour of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, empirically demonstrating that the choice of source transfer language significantly affects performance. While much previous work defaults to using English as the source language, our results show that choosing the best transfer language improves zero-shot F1 scores by an average of 14% over 20 languages as compared to using English.

2021

Construire des ressources collaboratives pour les langues peu dotées : une modélisation orientée communauté (Building collaborative resources for poorly endowed languages: community-oriented modeling)
Elvis Mboning | Ornella Wandji
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Natural language processing (NLP) applications today serve a large share of Indo-European languages, thanks to the high-quality linguistic corpora available in great quantity and variety. Since open-source data corpora in African languages are almost nonexistent, how can advances in NLP be brought to these low-resource languages? In this article, we examine the problem of building lexicographic resources for low-resource languages. We aim to introduce a model for building lexicographic resources that draws on the sociolinguistic skills of local language communities. Over the course of the sections, we present the new dictionary encoding model that results from this community-oriented modeling.

2020

NTeALan Dictionaries Platforms: An Example Of Collaboration-Based Model
Elvis Mboning | Daniel Baleba | Jean Marc Bassahak | Ornella Wandji | Jules Assoumou
Proceedings of the 1st International Workshop on Language Technology Platforms

Nowadays, the scarcity and dispersion of open-source NLP resources and tools in and for African languages make it difficult for researchers to truly fit these languages into current algorithms of artificial intelligence, resulting in the stagnation of these numerous languages as far as technological progress is concerned. Created in 2017 with the aim of building communities of voluntary contributors around African native and/or national languages, cultures, NLP technologies and artificial intelligence, the NTeALan association has set up a series of collaborative web platforms intended to allow these communities to create and manage their own lexicographic and linguistic resources. This paper presents the first versions of three lexicographic platforms that we developed in and for African languages: the REST/GraphQL API for saving lexicographic resources, the dictionary management platform and the collaborative dictionary platform. We also describe the data representation format used for these resources. After experimenting with a few dictionaries and looking at user feedback, we are convinced that only collaboration-based approaches and platforms can effectively respond to the challenges of producing quality resources in and for African native and/or national languages.