Ivelina Stoyanova


pdf bib
Semantic Analysis of Verb-Noun Derivation in Princeton WordNet
Verginica Mititelu | Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the 11th Global Wordnet Conference

We present here the results of a morphosemantic analysis of the verb-noun pairs in the Princeton WordNet as reflected in the standoff file containing pairs annotated with a set of 14 semantic relations. We have automatically distinguished between zero-derivation and affixal derivation in the data and identified the affixes and manually checked the results. The data show that for each semantic relation an affix prevails in creating new words, although we cannot talk about their specificity with respect to such a relation. Moreover, certain pairs of verb-noun semantic primes are better represented for each semantic relation, and some semantic clusters (in the form of WordNet subtrees) take shape as a result. We thus employ a large-scale data-driven linguistically motivated analysis afforded by the rich derivational and morphosemantic description in WordNet to the end of capturing finer regularities in the process of derivation as represented in the semantic properties of the words involved and as reflected in the structure of the lexicon.


pdf bib
Structural Approach to Enhancing WordNet with Conceptual Frame Semantics
Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper outlines procedures for enhancing WordNet with conceptual information from FrameNet. The mapping of the two resources is non-trivial. We define a number of techniques for the validation of the consistency of the mapping and the extension of its coverage which make use of the structure of both resources and the systematic relations between synsets in WordNet and between frames in FrameNet, as well as between synsets and frames). We present a case study on causativity, a relation which provides enhancement complementary to the one using hierarchical relations, by means of linking in a systematic way large parts of the lexicon. We show how consistency checks and denser relations may be implemented on the basis of this relation. We, then, propose new frames based on causative-inchoative correspondences and in conclusion touch on the possibilities for defining new frames based on the types of specialisation that takes place from parent to child synset.

pdf bib
Enhancing Conceptual Description through Resource Linking and Exploration of Semantic Relations
Ivelina Stoyanova | Svetlozara Leseva
Proceedings of the 10th Global Wordnet Conference

The paper presents current efforts towards linking two large lexical semantic resources – WordNet and FrameNet – to the end of their mutual enrichment and the facilitation of the access, extraction and analysis of various types of semantic and syntactic information. In the second part of the paper, we go on to examine the relation of inheritance and other semantic relations as represented in WordNet and FrameNet and how they correspond to each other when the resources are aligned. We discuss the implications with respect to the enhancement of the two resources through the definition of new relations and the detailisation of conceptual frames.

pdf bib
Hear about Verbal Multiword Expressions in the Bulgarian and the Romanian Wordnets Straight from the Horse’s Mouth
Verginica Barbu Mititelu | Ivelina Stoyanova | Svetlozara Leseva | Maria Mitrofan | Tsvetana Dimitrova | Maria Todorova
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

In this paper we focus on verbal multiword expressions (VMWEs) in Bulgarian and Romanian as reflected in the wordnets of the two languages. The annotation of VMWEs relies on the classification defined within the PARSEME Cost Action. After outlining the properties of various types of VMWEs, a cross-language comparison is drawn, aimed to highlight the similarities and the differences between Bulgarian and Romanian with respect to the lexicalization and distribution of VMWEs. The contribution of this work is in outlining essential features of the description and classification of VMWEs and the cross-language comparison at the lexical level, which is essential for the understanding of the need for uniform annotation guidelines and a viable procedure for validation of the annotation.


pdf bib
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.


pdf bib
The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Multiword expressions (MWEs) are known as a “pain in the neck” for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as “words with spaces”. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.


pdf bib
Automatic Prediction of Morphosemantic Relations
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Tsvetana Dimitrova | Maria Todorova
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents a machine learning method for automatic identification and classification of morphosemantic relations (MSRs) between verb and noun synset pairs in the Bulgarian WordNet (BulNet). The core training data comprise 6,641 morphosemantically related verb–noun literal pairs from BulNet. The core dataset were preprocessed quality-wise by applying validation and reorganisation procedures. Further, the data were supplemented with negative examples of literal pairs not linked by an MSR. The designed supervised machine learning method uses the RandomTree algorithm and is implemented in Java with the Weka package. A set of experiments were performed to test various approaches to the task. Future work on improving the classifier includes adding more training data, employing more features, and fine-tuning. Apart from the language specific information about derivational processes, the proposed method is language independent.


pdf bib
Automatic Classification of WordNet Morphosemantic Relations
Svetlozara Leseva | Ivelina Stoyanova | Maria Todorova | Tsvetana Dimitrova | Borislav Rizov | Svetla Koeva
The 5th Workshop on Balto-Slavic Natural Language Processing


pdf bib
Wordnet-Based Cross-Language Identification of Semantic Relations
Ivelina Stoyanova | Svetla Koeva | Svetlozara Leseva
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing

pdf bib
Text Modification for Bulgarian Sign Language Users
Slavina Lozanova | Ivelina Stoyanova | Svetlozara Leseva | Svetla Koeva | Boian Savtchev
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations


pdf bib
Application of Clause Alignment for Statistical Machine Translation
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Rositsa Dekova | Angel Genov | Borislav Rizov | Tsvetana Dimitrova | Ekaterina Tarpomanova | Hristina Kukova
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf bib
Bulgarian X-language Parallel Corpus
Svetla Koeva | Ivelina Stoyanova | Rositsa Dekova | Borislav Rizov | Angel Genov
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent in terms of compilation methodology, text representation, metadata description and annotation conventions. The approaches implemented in the construction of Bul-X-Cor include using readily available text collections on the web, manual compilation (by means of Internet browsing) and preferably automatic compilation (by means of web crawling ― general and focused). Certain levels of annotation applied to Bul-X-Cor are taken as obligatory (sentence segmentation and sentence alignment), while others depend on the availability of tools for a particular language (morpho-syntactic tagging, lemmatisation, syntactic parsing, named entity recognition, word sense disambiguation, etc.) or for a particular task (word and clause alignment). To achieve uniformity of the annotation we have either annotated raw data from scratch or transformed the already existing annotation to follow the conventions accepted for BulNC. Finally, actual uses of the corpora are presented and conclusions are drawn with respect to future work.