Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)
Previous studies have shown that the knowledge about attributes and properties in the SUMO ontology and its mapping to WordNet adjectives lacks of an accurate and complete characterization. A proper characterization of this type of knowledge is required to perform formal commonsense reasoning based on the SUMO properties, for instance to distinguish one concept from another based on their properties. In this context, we propose a new semi-automatic approach to model the knowledge about properties and attributes in SUMO by exploiting the information encoded in WordNet adjectives and its mapping to SUMO. To that end, we considered clusters of semantically related groups of WordNet adjectival and nominal synsets. Based on these clusters, we propose a new semi-automatic model for SUMO attributes and their mapping to WordNet, which also includes polarity information. In this paper, as an exploratory approach, we focus on qualities.
Due to rapid urbanization and a homogenized medium of instruction imposed in educational institutions, we have lost much of the golden literary offerings of the diverse languages and dialects that India once possessed. There is an urgent need to mitigate the paucity of online linguistic resources for several Hindi dialects. Given the corpus of a dialect, our system integrates the vocabulary of the dialect to the synsets of IndoWordnet along with their corresponding meta-data. Furthermore, we propose a systematic method for generating exemplary sentences for each newly integrated dialect word. The vocabulary thus integrated follows the schema of the wordnet and generates exemplary sentences to illustrate the meaning and usage of the word. We illustrate our methodology with the integration of words in the Awadhi dialect to the Hindi IndoWordnet to achieve an enrichment of 11.68 % to the existing Hindi synsets. The BLEU metric for evaluating the quality of sentences yielded a 75th percentile score of 0.6351.
WordNet, while one of the most widely used resources for NLP, has not been updated for a long time, and as such a new project English WordNet has arisen to continue the development of the model under an open-source paradigm. In this paper, we detail the second release of this resource entitled “English WordNet 2020”. The work has focused firstly, on the introduction of new synsets and senses and developing guidelines for this and secondly, on the integration of contributions from other projects. We present the changes in this edition, which total over 15,000 changes over the previous release.
Wordnets are lexical databases where the semantic relations of words and concepts are established. These resources are useful for manyNLP tasks, such as automatic text classification, word-sense disambiguation or machine translation. In comparison with other wordnets,the Basque version is smaller and some PoS are underrepresented or missing e.g. adjectives and adverbs. In this work, we explore anovel approach to enrich the Basque WordNet, focusing on the adjectives. We want to prove the use and and effectiveness of sentimentlexicons to enrich the resource without the need of starting from scratch. Using as complementary resources, one dictionary and thesentiment valences of the words, we check if the word of the lexicon matches with the meaning of the synset, and if it matches we addthe word as variant to the Basque WordNet. Following this methodology, we describe the most frequent adjectives with positive andnegative valence, the matches and the possible solutions for the non-matches.
Information systems gathering big amounts of resources growing with time containing distinct modalities (text, audio, video, images, GIS) and aggregating content in various ways (modular e-learning modules, Web systems presenting cultural artefacts) require tools supporting content description. The subject of the description may be the topic and the characteristics of the content expressed by sets of attributes. To describe such resources one can just use some of existing indexing languages like thesauri, classification systems, domain and upper ontologies, terminologies or dictionaries. When appropriate language does not exist, it is necessary to build a new system, which will have to serve both experts who describe resources and non-experts who search through them. The solution presented in this paper used to resource description, allows experts to freely select words and expressions, which are organized in hierarchies of various nature, including that of domain and application character. This is based on the wordnet structure, which introduces a clear order for each of these groups due to its lexical nature. The paper presents two systems where such approach was applied: the E-archaeology.org e-learning content repository in which domain knowledge was integrated to describe content topics and the Hatch system gathering multimodal information about the archaeological site targeted at a wide audience, where application conceptualization was applied to describe the content by a set of attributes.
We extend the Open WordNet for English (OWN-EN) with rock-related and other lithological terms using the authoritative source of GBA’s Thesaurus. Our aim is to improve WordNet to better function within Oil & Gas domain, particularly geoscience texts. We use a three step approach: a proof of concept-level extension of WordNet, a major extension on which we evaluate the impact with positive results and a full extension encompassing all GBA’s lithological terms. We also build a mapping to GBA which also links to several other resources: WikiData, British Geological Survey, Inspire, GeoSciML and DBpedia.
We describe on-going work consisting in adding pronunciation information to wordnets, as such information can indicate specific senses of a word. Many wordnets associate with their senses only a lemma form and a part-of-speech tag. At the same time, we are aware that additional linguistic information can be useful for identifying a specific sense of a wordnet lemma when encountered in a corpus. While work already deals with the addition of grammatical number or grammatical gender information to wordnet lemmas,we are investigating the linking of wordnet lemmas to pronunciation information, adding thus a speech-related modality to wordnets