Ikechukwu Onyenwe


Igbo Diacritic Restoration using Embedding Models
Ignatius Ezeani | Mark Hepple | Ikechukwu Onyenwe | Enemouh Chioma
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Igbo is a low-resource language spoken by approximately 30 million people worldwide. It is the native language of the Igbo people of south-eastern Nigeria. In the Igbo language, diacritics, both orthographic and tonal, play a crucial role in distinguishing the meaning and pronunciation of words. Omitting diacritics in texts often leads to lexical ambiguity. Diacritic restoration is a pre-processing task that restores the missing diacritics of words from which they have been removed. In this work, we applied embedding models to the diacritic restoration task and compared their performance to that of n-gram models. Although word embedding models have been successfully applied to various NLP tasks, they have not, to our knowledge, been applied to diacritic restoration. Two classes of word embedding models were used: those projected from the English embedding space, and those trained on an Igbo Bible corpus (≈1m). Our best result, 82.49%, is an improvement on the baseline n-gram models.
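A minimal sketch (not the authors' code, and with toy vectors rather than corpus-trained ones) of how embeddings can drive diacritic restoration: each candidate diacritic variant of an ambiguous undiacritized token is scored by cosine similarity between its vector and the mean vector of the surrounding context words. The example words and all vector values below are hypothetical.

```python
import numpy as np

# Hypothetical embeddings; a real system would train these on a corpus
# such as the Igbo Bible text mentioned in the abstract.
EMB = {
    "ákwà":  np.array([0.8, 0.6, 0.1]),   # diacritic variant A
    "àkwá":  np.array([0.1, 0.2, 0.9]),   # diacritic variant B
    "ji":    np.array([0.1, 0.3, 0.8]),   # context word
    "oriri": np.array([0.2, 0.1, 0.9]),   # context word
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def restore(context, candidates):
    """Pick the diacritic variant whose embedding best matches the context."""
    ctx = np.mean([EMB[w] for w in context if w in EMB], axis=0)
    return max(candidates, key=lambda c: cosine(EMB[c], ctx))

# Context vectors here lie closest to the "àkwá" variant.
print(restore(["ji", "oriri"], ["ákwà", "àkwá"]))  # → àkwá
```

The design choice worth noting is that restoration reduces to a ranking over a small, known candidate set per wordform, so even simple context-averaging can be competitive with n-gram baselines.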

Transferred Embeddings for Igbo Similarity, Analogy, and Diacritic Restoration Tasks
Ignatius Ezeani | Ikechukwu Onyenwe | Mark Hepple
Proceedings of the Third Workshop on Semantic Deep Learning

Existing NLP models are mostly trained with data from well-resourced languages. Most minority languages lack the resources, both data and technologies, needed for NLP research. Building these resources from scratch for each minority language would be very expensive and time-consuming, and would largely amount to unnecessarily re-inventing the wheel. In this paper, we applied transfer learning techniques to create Igbo word embeddings from a variety of existing English-trained embeddings. Transfer learning methods were also used to build standard datasets for Igbo word similarity and analogy tasks for intrinsic evaluation of the embeddings. These projected embeddings were also applied to the diacritic restoration task. Our results indicate that the projected models not only outperform the trained ones on the semantic-based tasks of analogy, word similarity, and odd-word identification, but also achieve enhanced performance on diacritic restoration with learned diacritic embeddings.
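One common way to project embeddings from a source language into a target space, sketched here under assumed toy data (this is an illustration of the general linear-mapping technique, not necessarily the paper's exact method), is to learn a linear map W from seed translation pairs by least squares, then apply W to unseen source vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_true = rng.normal(size=(d, d))        # hidden "true" mapping (toy)
eng_seed = rng.normal(size=(20, d))     # source-space vectors for seed pairs
igbo_seed = eng_seed @ W_true.T         # their target-space counterparts (toy)

# Solve min_W ||eng_seed @ W.T - igbo_seed||_F via least squares.
W, *_ = np.linalg.lstsq(eng_seed, igbo_seed, rcond=None)
W = W.T

# Project a previously unseen source vector into the target space.
new_eng = rng.normal(size=d)
projected = W @ new_eng
print(np.allclose(projected, W_true @ new_eng))  # → True
```

With noise-free toy data and more seed pairs than dimensions, the recovered map is exact; with real bilingual seed dictionaries the fit is only approximate, which is why intrinsic tasks like similarity and analogy are used to evaluate the projection quality.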


Lexical Disambiguation of Igbo using Diacritic Restoration
Ignatius Ezeani | Mark Hepple | Ikechukwu Onyenwe
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing distinctions in the pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in diacritic languages are written without diacritics. This makes diacritic restoration a necessary step in corpus building and language processing tasks for languages with diacritics. In our previous work, we built n-gram models with simple smoothing techniques based on a closed-world assumption. However, diacritic restoration is naturally a classification task, one that is well suited to, and more generalisable with, machine learning. This paper therefore presents a more standard approach to the task: the application of machine learning algorithms.
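To make the classification framing concrete, here is a minimal sketch (hypothetical data and labels, not the paper's setup or algorithms): for each ambiguous wordform, a classifier over bag-of-context-words features chooses the correct diacritic variant. A simple co-occurrence-count scorer stands in for the machine learning algorithms the paper evaluates.

```python
from collections import Counter, defaultdict

# Toy training instances for one ambiguous wordform:
# (context words, correct diacritic variant).
train = [
    (["o", "na", "ebe"], "ákwá"),
    (["na", "ebe", "nne"], "ákwá"),
    (["zutara", "ahia"], "ákwà"),
    (["ahia", "mara", "mma"], "ákwà"),
]

# Count how often each context word co-occurs with each variant.
counts = defaultdict(Counter)
for ctx, label in train:
    for w in ctx:
        counts[label][w] += 1

def classify(context):
    """Pick the variant whose training contexts best overlap this context."""
    return max(counts, key=lambda lab: sum(counts[lab][w] for w in context))

print(classify(["ebe", "na"]))  # → ákwá
print(classify(["ahia"]))       # → ákwà
```

Unlike the closed-world n-gram approach, a trained classifier can generalise to context combinations never seen verbatim in the training corpus.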


Use of Transformation-Based Learning in Annotation Pipeline of Igbo, an African Language
Ikechukwu Onyenwe | Mark Hepple | Chinedu Uchechukwu | Ignatius Ezeani
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects


Part-of-speech Tagset and Corpus Development for Igbo, an African Language
Ikechukwu Onyenwe | Chinedu Uchechukwu | Mark Hepple
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop