A picture is worth a thousand words: Using OpenClipArt library for enriching IndoWordNet

Diptesh Kanojia, Shehzaad Dhuliawala, Pushpak Bhattacharyya


Abstract
WordNet has proved immensely useful for Word Sense Disambiguation and, in turn, for Machine Translation, Information Retrieval and Question Answering. It can also be used as a dictionary for educational purposes. The semantic nature of concepts in a WordNet motivates one to express their meaning in a more visual way. In this paper, we describe our work on enriching IndoWordNet with images acquired from the OpenClipArt library (OCAL). The approach is used to enrich WordNets for eighteen Indian languages. Our contribution is threefold: (1) We develop a system which, given a synset in English, finds an appropriate image for the synset. The system uses OCAL to retrieve images and ranks them. (2) After retrieving the images, we map the results through the linkages between Princeton WordNet and Hindi WordNet to link several synsets to corresponding images. For each synset, we select the top three images according to our ranking heuristic. (3) We develop a tool that allows a lexicographer to manually evaluate these images. The evaluation tool shows the top images to the lexicographer, who chooses the best image representation and also records the number of relevant images. Using our system, we obtain an Average Precision (P@3) score of 0.30.
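The abstract does not spell out how the P@3 score is computed. A minimal sketch of one plausible reading follows, assuming that the lexicographer records, for each synset, how many of the three displayed images are relevant, and that the reported score is the mean of the per-synset precision values. The function names, the synset labels and the judgement counts are hypothetical and are not taken from the paper.

```python
from typing import Dict


def precision_at_k(relevant_count: int, k: int = 3) -> float:
    """Fraction of the k displayed images that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not 0 <= relevant_count <= k:
        raise ValueError("relevant_count must lie between 0 and k")
    return relevant_count / k


def mean_precision_at_k(judgements: Dict[str, int], k: int = 3) -> float:
    """Average the per-synset precision over all judged synsets."""
    if not judgements:
        raise ValueError("no judgements supplied")
    total = sum(precision_at_k(count, k) for count in judgements.values())
    return total / len(judgements)


# Hypothetical judgements (assumption, not data from the paper):
# synset label -> number of relevant images among the top three shown.
example_judgements = {
    "dog.n.01": 2,
    "tree.n.01": 1,
    "justice.n.01": 0,
    "apple.n.01": 3,
}

score = mean_precision_at_k(example_judgements)
print(f"P@3 over {len(example_judgements)} synsets: {score:.2f}")
```

Under this reading, a score of 0.30 would mean that, on average, slightly fewer than one of the three displayed images per synset was judged relevant.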
Anthology ID:
2016.gwc-1.23
Volume:
Proceedings of the 8th Global WordNet Conference (GWC)
Month:
27–30 January
Year:
2016
Address:
Bucharest, Romania
Editors:
Christiane Fellbaum, Piek Vossen, Verginica Barbu Mititelu, Corina Forascu
Venue:
GWC
SIG:
SIGLEX
Publisher:
Global Wordnet Association
Pages:
150–154
URL:
https://aclanthology.org/2016.gwc-1.23
Cite (ACL):
Diptesh Kanojia, Shehzaad Dhuliawala, and Pushpak Bhattacharyya. 2016. A picture is worth a thousand words: Using OpenClipArt library for enriching IndoWordNet. In Proceedings of the 8th Global WordNet Conference (GWC), pages 150–154, Bucharest, Romania. Global Wordnet Association.
Cite (Informal):
A picture is worth a thousand words: Using OpenClipArt library for enriching IndoWordNet (Kanojia et al., GWC 2016)
PDF:
https://aclanthology.org/2016.gwc-1.23.pdf