Identifying and Exploiting Definitions in Wordnet Bahasa

David Moeljadi, Francis Bond


Abstract
This paper describes our attempts to add Indonesian definitions to synsets in Wordnet Bahasa (Nurril Hirfana Mohamed Noor et al., 2011; Bond et al., 2014), to extract semantic relations (synonym, hyponym, hypernym and instance hypernym) between the lemmas and definitions of nouns and verbs, and to improve Wordnet more generally. The original, somewhat noisy, Indonesian definitions came from the Asian Wordnet project (Riza et al., 2010). Our relation-extraction method is based on Bond et al. (2004). Before extracting the relations, we cleaned up and tokenized the definitions. We found that the definitions could not be cleaned up completely because they contain many misspellings and poor translations. Nevertheless, we could identify these semantic relations in 57.10% of the noun and verb definitions. For the remaining 42.90%, we propose adding 149 new Indonesian lemmas and making several improvements to Wordnet Bahasa and to Wordnet in general.
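To make the idea of reading relations off a definition concrete, here is a toy heuristic in Python. It is a minimal sketch under stated assumptions, not the authors' method and not the procedure of Bond et al. (2004). It assumes a definition consisting of a single known lemma signals a synonym, and that otherwise the first known lemma after a few hypothetical "kind of" lead words (`SKIP_WORDS`) is a hypernym (genus) candidate. The latter rests on Indonesian noun phrases being head-initial, so the genus word often comes first. The function names, the `SKIP_WORDS` list and the sample lemma set are illustrative assumptions; hyponym and instance-hypernym detection and the cleanup step are not modelled.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical lead words to skip before the genus term; in Indonesian,
# "sejenis" and "semacam" roughly mean "a kind of" / "a sort of".
SKIP_WORDS = {"sejenis", "semacam"}


def tokenize(definition: str) -> List[str]:
    """Lowercase a definition and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", definition.lower())


def classify(definition: str, lemmas: Set[str]) -> Tuple[str, Optional[str]]:
    """Guess the relation a definition expresses toward the lemma it defines.

    Returns ("synonym", lemma), ("hypernym", lemma) or ("unknown", None).
    This is a toy heuristic for illustration only.
    """
    tokens = tokenize(definition)
    # A definition that is just one known lemma is treated as a synonym.
    if len(tokens) == 1 and tokens[0] in lemmas:
        return ("synonym", tokens[0])
    # Skip leading "kind of" words, then test whether the head word is a known lemma.
    i = 0
    while i < len(tokens) and tokens[i] in SKIP_WORDS:
        i += 1
    if i < len(tokens) and tokens[i] in lemmas:
        return ("hypernym", tokens[i])
    # Anything else is left for manual inspection.
    return ("unknown", None)


if __name__ == "__main__":
    known = {"burung", "kecil", "hewan"}  # hypothetical sample lemma set
    for d in ["kecil", "sejenis burung kecil.", "sesuatu yang tidak jelas"]:
        print(d, "->", classify(d, known))
```

Unmatched definitions fall into an "unknown" bucket rather than being forced into a relation. That bucket loosely mirrors the idea of a residue of definitions that need new lemmas or other manual fixes.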
Anthology ID:
2016.gwc-1.33
Volume:
Proceedings of the 8th Global WordNet Conference (GWC)
Month:
27–30 January
Year:
2016
Address:
Bucharest, Romania
Editors:
Christiane Fellbaum, Piek Vossen, Verginica Barbu Mititelu, Corina Forascu
Venue:
GWC
SIG:
SIGLEX
Publisher:
Global Wordnet Association
Pages:
227–233
URL:
https://aclanthology.org/2016.gwc-1.33
Cite (ACL):
David Moeljadi and Francis Bond. 2016. Identifying and Exploiting Definitions in Wordnet Bahasa. In Proceedings of the 8th Global WordNet Conference (GWC), pages 227–233, Bucharest, Romania. Global Wordnet Association.
Cite (Informal):
Identifying and Exploiting Definitions in Wordnet Bahasa (Moeljadi & Bond, GWC 2016)
PDF:
https://aclanthology.org/2016.gwc-1.33.pdf