Navigating Challenges of Multilingual Resource Development for Under-Resourced Languages: The Case of the African Wordnet Project

Marissa Griesel, Sonja Bosch


Abstract
Creating a new wordnet is by no means a trivial task, and when the target language is under-resourced, as is the case for the languages currently included in the multilingual African Wordnet (AfWN), developers need to rely heavily on human expertise. During the different phases of development of the AfWN, we incorporated various fast-tracking methods to ease the tedious and time-consuming work. Some methods have proven effective, while others seem to have had little positive impact on the work rate. As in the case of many other under-resourced languages, the expand model was implemented throughout. This model depends on English source data such as the English Princeton Wordnet (PWN), which is translated into the target language on the assumption that the new language shares an underlying structure with the PWN. The paper discusses some problems encountered along the way and points out various possibilities for (semi-)automated quality assurance measures and further refinement of the AfWN to ensure accelerated growth. We aim to highlight some of the lessons learnt from hands-on experience in order to facilitate similar projects, in particular for languages from other African countries.
Anthology ID:
2020.rail-1.8
Volume:
Proceedings of the first workshop on Resources for African Indigenous Languages
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Rooweither Mabuya, Phathutshedzo Ramukhadi, Mmasibidi Setaka, Valencia Wagner, Menno van Zaanen
Venue:
RAIL
Publisher:
European Language Resources Association (ELRA)
Pages:
45–50
Language:
English
URL:
https://aclanthology.org/2020.rail-1.8
Cite (ACL):
Marissa Griesel and Sonja Bosch. 2020. Navigating Challenges of Multilingual Resource Development for Under-Resourced Languages: The Case of the African Wordnet Project. In Proceedings of the first workshop on Resources for African Indigenous Languages, pages 45–50, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
Navigating Challenges of Multilingual Resource Development for Under-Resourced Languages: The Case of the African Wordnet Project (Griesel & Bosch, RAIL 2020)
PDF:
https://aclanthology.org/2020.rail-1.8.pdf