Latvian WordNet

Peteris Paikens, Agute Klints, Ilze Lokmane, Lauma Pretkalniņa, Laura Rituma, Madara Stāde, Laine Strankale


Abstract
This paper describes the recently developed Latvian WordNet and the main linguistic principles used in its development. The inventory of words and senses is based on the Te̅zaurs.lv online dictionary, restructuring the senses of the most frequently used words based on corpus evidence. The semantic linking methodology adapts Princeton WordNet principles to fit the Latvian language usage and existing linguistic tradition. The semantic links include hyponymy, meronymy, antonymy, similarity, conceptual connection and gradation. We also measure inter-annotator agreement for different types of semantic links. The dataset consists of 7609 words linked in 6515 synsets. 1266 of these words are considered fully completed as they have all the outgoing semantic links annotated, corpus examples assigned for each sense, as well as links to the English Princeton WordNet formed. The data is available to the public on Te̅zaurs.lv as an addition to the general dictionary data, and is also published as a downloadable dataset.
Anthology ID:
2023.gwc-1.23
Volume:
Proceedings of the 12th Global Wordnet Conference
Month:
January
Year:
2023
Address:
University of the Basque Country, Donostia - San Sebastian, Basque Country
Editors:
German Rigau, Francis Bond, Alexandre Rademaker
Venue:
GWC
SIG:
Publisher:
Global Wordnet Association
Note:
Pages:
187–196
Language:
URL:
https://aclanthology.org/2023.gwc-1.23
DOI:
Bibkey:
Cite (ACL):
Peteris Paikens, Agute Klints, Ilze Lokmane, Lauma Pretkalniņa, Laura Rituma, Madara Stāde, and Laine Strankale. 2023. Latvian WordNet. In Proceedings of the 12th Global Wordnet Conference, pages 187–196, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.
Cite (Informal):
Latvian WordNet (Paikens et al., GWC 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.gwc-1.23.pdf