We present here the results of a morphosemantic analysis of the verb-noun pairs in the Princeton WordNet, as reflected in the standoff file containing pairs annotated with a set of 14 semantic relations. We automatically distinguished between zero-derivation and affixal derivation in the data, identified the affixes, and manually checked the results. The data show that for each semantic relation one affix prevails in creating new words, although no affix can be said to be specific to a given relation. Moreover, certain pairs of verb-noun semantic primes are better represented for each semantic relation, and some semantic clusters (in the form of WordNet subtrees) take shape as a result. We thus employ a large-scale, data-driven, linguistically motivated analysis, afforded by the rich derivational and morphosemantic description in WordNet, in order to capture finer regularities in the process of derivation as represented in the semantic properties of the words involved and as reflected in the structure of the lexicon.
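The distinction between zero-derivation and affixal derivation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `classify_derivation` and the longest-common-prefix heuristic are assumptions made only for illustration, and the heuristic handles suffixation only (prefixal derivation and stem alternations are outside its scope).

```python
def classify_derivation(verb: str, noun: str) -> tuple[str, str]:
    """Hypothetical helper: label a verb-noun pair as zero-derivation or
    affixal derivation and return a candidate suffix.

    Assumption: the two forms share a stem at the start of the word, so
    whatever follows the longest common prefix in the longer form is
    reported as the candidate suffix.
    """
    if verb == noun:
        # Identical forms: no affix is involved (zero-derivation).
        return ("zero", "")

    # Length of the longest common prefix of the two forms.
    i = 0
    limit = min(len(verb), len(noun))
    while i < limit and verb[i] == noun[i]:
        i += 1

    # Take the remainder of the longer form as the candidate suffix.
    longer = noun if len(noun) >= len(verb) else verb
    return ("affixal", "-" + longer[i:])


pairs = [("walk", "walk"), ("employ", "employer"), ("create", "creation")]
results = [classify_derivation(v, n) for v, n in pairs]
```

Candidates produced this way would still need the kind of manual check the abstract describes, since a shared prefix does not guarantee a real affix.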
We present here the enhancement of the Romanian wordnet with a new type of information that is very useful in language processing, namely types of verbal multi-word expressions. Every verb literal made of two or more words is assigned a label specific to the type of verbal multi-word expression it corresponds to. These labels were created in the PARSEME COST Action and were used in edition 1.1 of the shared task it organized. The results of this annotation are compared to those obtained in the annotation of a Romanian news corpus with the same labels. Given the alignment of the Romanian wordnet to the Princeton WordNet, this type of annotation can be further used for drawing comparisons between equivalent verbal literals in various languages, provided that such information is annotated in the wordnets of the respective languages and that those wordnets are aligned to Princeton WordNet, and thus to the Romanian wordnet.
Evaluating the Wordnet and CoRoLa-based Word Embedding Vectors for Romanian as Resources in the Task of Microworlds Lexicon Expansion
Elena Irimia | Maria Mitrofan | Verginica Mititelu
Proceedings of the 10th Global Wordnet Conference
Within the larger framework of facilitating human-robot interaction, we present here the creation of a core vocabulary to be learned by a robot. It is extracted from two tokenized and lemmatized scenarios pertaining to two imagined microworlds in which the robot is supposed to play an assistive role. We also evaluate two resources for their usefulness in expanding this vocabulary so as to better cope with the robot’s communication needs. The language under study is Romanian, and the resources used are the Romanian wordnet and word embedding vectors extracted from the large representative corpus of contemporary Romanian, CoRoLa. The evaluation is made for two situations: one in which the words are not semantically disambiguated before expanding the lexicon, and another in which they are disambiguated with senses from the Romanian wordnet. The appropriateness of each resource is discussed.
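As a rough illustration of expanding a core vocabulary from word embedding vectors, the sketch below ranks candidate words by cosine similarity to each core word. The abstract does not specify the similarity measure or the expansion procedure, so everything here is an assumption: the function names, the `top_k` parameter, and the toy three-dimensional vectors, which are invented and are not CoRoLa embeddings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_lexicon(core: set[str], vectors: dict[str, list[float]],
                   top_k: int = 2) -> dict[str, list[str]]:
    """Hypothetical helper: for each core word that has a vector, return the
    top_k most similar words that are not already in the core vocabulary."""
    expanded = {}
    for word in sorted(core):
        if word not in vectors:
            continue  # no embedding available for this core word
        scored = [
            (cosine(vectors[word], vec), other)
            for other, vec in vectors.items()
            if other not in core
        ]
        scored.sort(reverse=True)  # highest similarity first
        expanded[word] = [other for _, other in scored[:top_k]]
    return expanded


# Invented toy vectors, for illustration only.
toy_vectors = {
    "cană": [1.0, 0.1, 0.0],     # 'mug'
    "pahar": [0.9, 0.2, 0.0],    # 'glass'
    "ceașcă": [0.7, 0.0, 0.5],   # 'cup'
    "ușă": [0.0, 1.0, 0.2],      # 'door'
}

suggestions = expand_lexicon({"cană"}, toy_vectors, top_k=2)
```

In the sense-disambiguated setting described above, the candidates would instead be drawn from wordnet relations of the selected sense; that variant is not sketched here.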
A language resource as rich as Princeton WordNet, containing linguistic information of different types (semantic, lexical, syntactic, derivational, dialectal, etc.), is worth both using in various language-enabled applications and exploring in order to study a language. In this paper we show how we used Princeton WordNet version 3.0 to study English affixes. We extracted pairs of base and derived words and identified the affixes by means of which the derived words were created from their bases. We distinguished four types of derivation, depending on the type of overlap between the senses of the base word and those of the derived word that are linked by derivational relations in Princeton WordNet. We studied the behaviour of affixes with respect to these derivation types. Drawing on these data, we made inferences about their productivity.