Verginica Mititelu

2026

Two Birds with One Stone: Annotating Romanian Multiword Expressions with an Eye to the PARSEME 2.0 Guidelines Applicability
Verginica Mititelu | Mihaela Cristescu | Elena Irimia | Carmen Mîrzea Vasile
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)

This paper presents an enhanced version of the Romanian corpus previously annotated only for verbal multiword expressions. The new release extends the annotation to multiword expressions of other parts of speech, following version 2.0 of the PARSEME guidelines. The corpus has been expanded: its new part was first annotated morpho-syntactically in an automatic manner, following the Universal Dependencies framework, and this was followed by extensive semi-automatic annotation of multiword expressions across all morphological categories. The paper also reports quantitative data on the updated corpus and discusses the distribution and characteristics of Romanian multiword expressions. Finally, we highlight language-specific annotation challenges and issues arising from the PARSEME 2.0 guidelines.

2025

Lexica of MWEs have always been a valuable resource for various NLP tasks. This paper presents the results of a comprehensive survey of multiword lexical resources, which extends a previous survey from 2016 up to the present. We analyze a diverse set of lexica across multiple languages, reporting on aspects such as creation date, intended usage, languages covered and linguality type, content, acquisition method, accessibility, and linkage to other language resources. Our findings highlight trends in MWE lexicon development, focusing on the representation level of languages. By identifying gaps and opportunities, this survey aims to support future efforts in creating MWE lexica for NLP applications.

2023

The Romanian Wordnet in Linked Open Data Format
Elena Irimia | Verginica Mititelu
Proceedings of the 12th Global Wordnet Conference

In this paper we present the standardization of the Romanian Wordnet by means of conversion to the Linked Open Data format. We describe the vocabularies used to encode the data and metadata of this resource. The decisions made are in accordance with the characteristics of the Romanian Wordnet, which are the outcome of the development method, enrichment strategies and resources used for its creation. Through interlinking with other resources, words in the Romanian Wordnet now have an associated pronunciation, as well as syntagmatic information in the form of contexts of occurrence.

Adopting Linguistic Linked Data Principles: Insights on Users’ Experience
Verginica Mititelu | Maria Pia Di Buono | Hugo Gonçalo Oliveira | Blerina Spahiu | Giedrė Valūnaitė-Oleškevičienė
Proceedings of the 4th Conference on Language, Data and Knowledge

2021

Semantic Analysis of Verb-Noun Derivation in Princeton WordNet
Verginica Mititelu | Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the 11th Global Wordnet Conference

We present here the results of a morphosemantic analysis of the verb-noun pairs in the Princeton WordNet, as reflected in the standoff file containing pairs annotated with a set of 14 semantic relations. We automatically distinguished between zero derivation and affixal derivation in the data, identified the affixes, and manually checked the results. The data show that for each semantic relation one affix prevails in creating new words, although we cannot speak of affixes being specific to such a relation. Moreover, certain pairs of verb-noun semantic primes are better represented for each semantic relation, and some semantic clusters (in the form of WordNet subtrees) take shape as a result. We thus employ a large-scale, data-driven, linguistically motivated analysis, afforded by the rich derivational and morphosemantic description in WordNet, to capture finer regularities in the process of derivation as represented in the semantic properties of the words involved and as reflected in the structure of the lexicon.

2019

Evaluating the Wordnet and CoRoLa-based Word Embedding Vectors for Romanian as Resources in the Task of Microworlds Lexicon Expansion
Elena Irimia | Maria Mitrofan | Verginica Mititelu
Proceedings of the 10th Global Wordnet Conference

Within the larger frame of facilitating human-robot interaction, we present here the creation of a core vocabulary to be learned by a robot. It is extracted from two tokenized and lemmatized scenarios pertaining to two imagined microworlds in which the robot is supposed to play an assistive role. We also evaluate two resources for their usefulness in expanding this vocabulary so as to better cope with the robot’s communication needs. The language under study is Romanian and the resources used are the Romanian wordnet and word embedding vectors extracted from the large representative corpus of contemporary Romanian, CoRoLa. The evaluation is made for two situations: one in which the words are not semantically disambiguated before expanding the lexicon, and another in which they are disambiguated with senses from the Romanian wordnet. The appropriateness of each resource is discussed.

Leaving No Stone Unturned When Identifying and Classifying Verbal Multiword Expressions in the Romanian Wordnet
Verginica Mititelu | Maria Mitrofan
Proceedings of the 10th Global Wordnet Conference

We present here the enhancement of the Romanian wordnet with a new type of information that is very useful in language processing, namely types of verbal multi-word expressions. Every verb literal made of two or more words is assigned a label specific to the type of verbal multi-word expression it corresponds to. These labels were created in the PARSEME COST Action and were used in version 1.1 of the shared task it organized. The results of this annotation are compared to those obtained in the annotation of a Romanian news corpus with the same labels. Given the alignment of the Romanian wordnet to the Princeton WordNet, this type of annotation can be further used for drawing comparisons between equivalent verbal literals in various languages, provided that such information is annotated in the wordnets of the respective languages and their wordnets are aligned to Princeton WordNet, and thus to the Romanian wordnet.

2018

Investigating English Affixes and their Productivity with Princeton WordNet
Verginica Mititelu
Proceedings of the 9th Global Wordnet Conference

A language resource as rich as Princeton WordNet, containing linguistic information of different types (semantic, lexical, syntactic, derivational, dialectal, etc.), is a thesaurus worth both using in various language-enabled applications and exploring in order to study a language. In this paper we show how we used Princeton WordNet version 3.0 to study English affixes. We extracted pairs of base and derived words and identified the affixes by means of which the derived words were created from their bases. We distinguished four types of derivation, depending on the type of overlap between the senses of the base word and those of the derived word that are linked by derivational relations in Princeton WordNet. We studied the behaviour of affixes with respect to these derivation types. Drawing on these data, we made inferences about their productivity.