IgboBERT Models: Building and Training Transformer Models for the Igbo Language

Chiamaka Chukwuneke; Ignatius Ezeani; Paul Rayson; Mahmoud El-Haj

Abstract

This work presents a standard Igbo named entity recognition (IgboNER) dataset as well as the results from training and fine-tuning state-of-the-art transformer IgboNER models. We discuss the process of our dataset creation: data collection, annotation, and quality checking. We also present the experimental processes involved in building an IgboBERT language model from scratch, as well as fine-tuning it along with other non-Igbo pre-trained models for the downstream IgboNER task. Our results show that, although the IgboNER task benefited hugely from fine-tuning a large transformer model, fine-tuning a transformer model built from scratch with comparatively little Igbo text data seems to yield quite decent results for the IgboNER task. This work will contribute immensely to IgboNLP in particular, as well as to the wider African and low-resource NLP efforts.

Keywords: Igbo, named entity recognition, BERT models, under-resourced, dataset

Anthology ID: 2022.lrec-1.547
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 5114–5122
URL: https://aclanthology.org/2022.lrec-1.547
Cite (ACL): Chiamaka Chukwuneke, Ignatius Ezeani, Paul Rayson, and Mahmoud El-Haj. 2022. IgboBERT Models: Building and Training Transformer Models for the Igbo Language. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5114–5122, Marseille, France. European Language Resources Association.
Cite (Informal): IgboBERT Models: Building and Training Transformer Models for the Igbo Language (Chukwuneke et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.547.pdf
Code: chiamakac/igboner-models
Data: MasakhaNER
