Initial Experiments for Building a Guarani WordNet

Luis Chiruzzo, Marvin Agüero-Torales, Aldo Alvarez, Yliana Rodríguez


Abstract
This paper presents work in progress on creating a Guarani version of the WordNet database. Guarani is an indigenous South American language and a low-resource language from the NLP perspective. Following the expand approach, we aim to find Guarani lemmas that correspond to the concepts defined in WordNet. We do this through three strategies that try to select the correct lemmas from Guarani-Spanish datasets. We applied these strategies to three different bilingual dictionaries and had native speakers assess the results. This procedure found Guarani lemmas for about 6,500 synsets, including 27% of the base WordNet concepts. However, more work on the quality of the selected words will be needed to create a final version of the dataset.
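To make the general idea of the expand approach concrete, here is a minimal sketch of pivoting through Spanish. It assumes that each synset already has Spanish lemmas, as a Spanish wordnet would provide, and that a Guarani-Spanish dictionary is available as a mapping from Guarani lemmas to Spanish translations. All synset IDs and dictionary entries are hypothetical toy data, the Guarani forms are placeholders, and the selection rule (keep a Guarani candidate if it translates at least a `threshold` fraction of the synset's Spanish lemmas) is an illustrative heuristic. It is not one of the three strategies described in the paper.

```python
from collections import defaultdict

# Hypothetical toy data: synset IDs mapped to Spanish lemmas,
# as a Spanish wordnet might provide them.
SYNSETS_ES = {
    "syn-dog": ["perro", "can"],
    "syn-house": ["casa", "hogar"],
}

# Hypothetical Guarani-Spanish dictionary with placeholder Guarani forms:
# each Guarani lemma maps to the Spanish words it translates to.
GN_ES_DICT = {
    "gn_a": ["perro", "can"],
    "gn_b": ["perro"],
    "gn_c": ["casa", "hogar", "familia"],
}


def invert(dictionary):
    """Build a Spanish -> set of Guarani lemmas index from a Guarani -> Spanish dictionary."""
    index = defaultdict(set)
    for gn, es_words in dictionary.items():
        for es in es_words:
            index[es].add(gn)
    return index


def expand(synsets, dictionary, threshold=0.5):
    """Propose Guarani lemmas for each synset by pivoting through Spanish.

    A Guarani candidate is kept when it translates at least `threshold`
    of the synset's Spanish lemmas (a hypothetical selection heuristic).
    """
    es_to_gn = invert(dictionary)
    result = {}
    for synset_id, es_lemmas in synsets.items():
        # Count how many of the synset's Spanish lemmas each Guarani word covers.
        support = defaultdict(int)
        for es in es_lemmas:
            for gn in es_to_gn.get(es, ()):
                support[gn] += 1
        result[synset_id] = sorted(
            gn for gn, n in support.items() if n / len(es_lemmas) >= threshold
        )
    return result


if __name__ == "__main__":
    print(expand(SYNSETS_ES, GN_ES_DICT, threshold=1.0))
```

With `threshold=1.0` only candidates that cover every Spanish lemma of a synset survive (here `gn_a` for `syn-dog` and `gn_c` for `syn-house`). Lowering the threshold trades precision for coverage, which is why a manual assessment by native speakers is still needed.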
Anthology ID:
2023.gwc-1.24
Volume:
Proceedings of the 12th Global Wordnet Conference
Month:
January
Year:
2023
Address:
University of the Basque Country, Donostia - San Sebastian, Basque Country
Editors:
German Rigau, Francis Bond, Alexandre Rademaker
Venue:
GWC
Publisher:
Global Wordnet Association
Pages:
197–204
URL:
https://aclanthology.org/2023.gwc-1.24
Cite (ACL):
Luis Chiruzzo, Marvin Agüero-Torales, Aldo Alvarez, and Yliana Rodríguez. 2023. Initial Experiments for Building a Guarani WordNet. In Proceedings of the 12th Global Wordnet Conference, pages 197–204, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.
Cite (Informal):
Initial Experiments for Building a Guarani WordNet (Chiruzzo et al., GWC 2023)
PDF:
https://aclanthology.org/2023.gwc-1.24.pdf