Practical Approach on Implementation of WordNets for South African Languages

Tshephisho Joseph Sefara, Tumisho Billson Mokgonyane, Vukosi Marivate


Abstract
This paper proposes the implementation of WordNets for five South African languages, namely Sepedi, Setswana, Tshivenda, isiZulu and isiXhosa, to be added to the Open Multilingual WordNet (OMW) in the Natural Language Toolkit (NLTK). The African WordNets are converted from Princeton WordNet (PWN) 2.0 to 3.0 so that their synsets match those in PWN 3.0. After conversion, there were 7157, 11972, 1288, 6380, and 9460 lemmas for Sepedi, Setswana, Tshivenda, isiZulu and isiXhosa respectively. Setswana, isiXhosa and Sepedi each contain more lemmas than 8 languages in OMW, and isiZulu contains more lemmas than 7 languages in OMW. A library has been published for continuous development of African WordNets in OMW using NLTK.
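The conversion step described in the abstract can be pictured as re-keying each wordnet entry from its PWN 2.0 synset identifier to the matching PWN 3.0 identifier. The following is only a minimal sketch of that idea, not the authors' implementation: the mapping table, the offsets, the placeholder lemmas, and the function names `remap` and `to_omw_tab` are all hypothetical, and the tab-separated output is assumed to resemble the layout OMW uses for lemma files. The language code `nso` is the ISO 639-3 code for Sepedi; whether the published library uses that code is an assumption.

```python
from collections import defaultdict

# Hypothetical PWN 2.0 -> 3.0 mapping (toy values, not real PWN offsets).
OFFSET_MAP_20_TO_30 = {
    ("00000001", "n"): ("00000011", "n"),
    ("00000002", "n"): ("00000012", "n"),
    ("00000003", "v"): ("00000013", "v"),
}

# Toy entries keyed by PWN 2.0 synset: (offset, pos) -> placeholder lemmas.
ENTRIES_20 = {
    ("00000001", "n"): ["lemma_a", "lemma_b"],
    ("00000002", "n"): ["lemma_c"],
    ("00000004", "n"): ["lemma_d"],  # deliberately has no 3.0 mapping
}


def remap(entries, offset_map):
    """Re-key entries to PWN 3.0 synsets; collect keys that cannot be mapped."""
    remapped = defaultdict(list)
    unmapped = []
    for key, lemmas in entries.items():
        target = offset_map.get(key)
        if target is None:
            unmapped.append(key)
            continue
        remapped[target].extend(lemmas)
    return dict(remapped), unmapped


def to_omw_tab(remapped, lang):
    """Render one 'offset-pos<TAB>lang:lemma<TAB>lemma' line per lemma (assumed layout)."""
    lines = []
    for (offset, pos), lemmas in sorted(remapped.items()):
        for lemma in lemmas:
            lines.append(f"{offset}-{pos}\t{lang}:lemma\t{lemma}")
    return lines


if __name__ == "__main__":
    remapped, unmapped = remap(ENTRIES_20, OFFSET_MAP_20_TO_30)
    for line in to_omw_tab(remapped, "nso"):
        print(line)
    print("lemmas kept:", sum(len(v) for v in remapped.values()))
    print("synsets without a 3.0 mapping:", unmapped)
```

Entries whose 2.0 synset has no 3.0 counterpart are reported separately rather than dropped silently, so the lemma count after conversion can be checked against the count before it.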
Anthology ID:
2021.gwc-1.3
Volume:
Proceedings of the 11th Global Wordnet Conference
Month:
January
Year:
2021
Address:
University of South Africa (UNISA)
Editors:
Piek Vossen, Christiane Fellbaum
Venue:
GWC
SIG:
SIGLEX
Publisher:
Global Wordnet Association
Pages:
20–25
URL:
https://aclanthology.org/2021.gwc-1.3
Cite (ACL):
Tshephisho Joseph Sefara, Tumisho Billson Mokgonyane, and Vukosi Marivate. 2021. Practical Approach on Implementation of WordNets for South African Languages. In Proceedings of the 11th Global Wordnet Conference, pages 20–25, University of South Africa (UNISA). Global Wordnet Association.
Cite (Informal):
Practical Approach on Implementation of WordNets for South African Languages (Sefara et al., GWC 2021)
PDF:
https://aclanthology.org/2021.gwc-1.3.pdf
Code:
josephsefara/africanwordnet