Lexical Disambiguation of Igbo using Diacritic Restoration

Ignatius Ezeani, Mark Hepple, Ikechukwu Onyenwe


Abstract
Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential for capturing distinctions in the pronunciation and meaning of words, as well as for lexical disambiguation. Unfortunately, most electronic texts in languages that use diacritics are written without them, which makes diacritic restoration a necessary step in corpus building and language processing for such languages. In our previous work, we built n-gram models with simple smoothing techniques based on a closed-world assumption. However, diacritic restoration is a classification task, so it is well suited to machine learning, which should also generalise better. This paper therefore presents a more standard approach to the task: the application of machine learning algorithms.
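To make the task concrete, the sketch below frames diacritic restoration as choosing among the known diacritic variants of a stripped word. It is not the authors' n-gram or machine learning models: it is only a most-frequent-variant baseline, and the function names (`strip_diacritics`, `build_variant_table`, `restore`) and the toy token list are assumptions made for illustration.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word):
    """Remove combining marks (tone marks, dots below) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def build_variant_table(tokens):
    """Map each stripped form to a count of the diacritic variants seen for it."""
    table = defaultdict(Counter)
    for tok in tokens:
        tok = unicodedata.normalize("NFC", tok)
        table[strip_diacritics(tok)][tok] += 1
    return table


def restore(word, table):
    """Return the most frequent known variant, or the word itself if unseen."""
    variants = table.get(strip_diacritics(word))
    if not variants:
        return word
    return variants.most_common(1)[0][0]


# Toy tokens chosen for illustration only; they are not data from the paper.
toy_tokens = ["ákwà", "ákwà", "àkwá", "ọ", "na"]
table = build_variant_table(toy_tokens)

print(restore("akwa", table))
print(restore("o", table))
```

A baseline like this ignores context entirely, so it always returns the same variant for an ambiguous stripped form. A classifier that uses the surrounding words as features is what would let the choice vary by context.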
Anthology ID:
W17-1907
Volume:
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Jose Camacho-Collados, Mohammad Taher Pilehvar
Venue:
SENSE
Publisher:
Association for Computational Linguistics
Pages:
53–60
URL:
https://aclanthology.org/W17-1907
DOI:
10.18653/v1/W17-1907
Cite (ACL):
Ignatius Ezeani, Mark Hepple, and Ikechukwu Onyenwe. 2017. Lexical Disambiguation of Igbo using Diacritic Restoration. In Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, pages 53–60, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Lexical Disambiguation of Igbo using Diacritic Restoration (Ezeani et al., SENSE 2017)
PDF:
https://aclanthology.org/W17-1907.pdf