Svetlozara Leseva


2024

A ‘Dipdive’ into Motion: Exploring Lexical Resources towards a Comprehensive Semantic and Syntactic Description
Svetlozara Leseva
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)

In this paper I illustrate the semantic description of verbs provided in three semantic resources (FrameNet, VerbNet and VerbAtlas) in comparative terms with a view to identifying common and distinct components in their representation and obtaining a preliminary idea of the resources’ interoperability. To this end, I provide a comparison of a small sample of motion verbs aligned with semantic frames and classes in the three resources. I also describe the semantic annotation of Bulgarian motion verbs using the framework defined in the Berkeley FrameNet project and its enrichment with information from the other two resources, which has been enabled by the mapping between: (i) their major semantic units – FrameNet frames, VerbNet classes and VerbAtlas frames, and (ii) their ‘building blocks’ – frame elements (FrameNet) and semantic roles (VerbNet, VerbAtlas).

2023

Expanding the Conceptual Description of Verbs in WordNet with Semantic and Syntactic Information
Ivelina Stoyanova | Svetlozara Leseva
Proceedings of the 12th Global Wordnet Conference

This paper describes an ongoing effort towards expanding the semantic and conceptual description of verbs in WordNet by combining information from two other resources, FrameNet and VerbNet, as well as enriching the verbs’ description with syntactic patterns extracted from the three resources. The conceptual description of verb synsets is provided by assigning a FrameNet frame which provides the relevant set of frame elements denoting the predicate’s participants and props. This information is supplemented by assigning a VerbNet class and the set of semantic roles associated with it. The information extracted from FrameNet and VerbNet and assigned to a synset is aligned (semi-automatically with subsequent manual corrections) at the following levels: (i) FrameNet frame: VerbNet class; (ii) FrameNet frame elements: VerbNet semantic roles; (iii) FrameNet semantic types and restrictions: VerbNet selectional restrictions. We then link the syntactic patterns associated with the units in FrameNet, VerbNet and WordNet, by unifying their representation and by matching the corresponding patterns at the level of syntactic groups. The alignment of the semantic components and their syntactic realisations is essential for the better exploitation of the abundance of information across resources, including shedding light on cross-resource similarities, discrepancies and inconsistencies. The syntactic patterns can facilitate the extraction of examples illustrating the use of verb synset literals in corpora and their semantic characterisation through the association of the syntactic groups with the components of semantic description (frame elements or semantic roles) and can be employed in various tasks requiring semantic and syntactic description. The resource is publicly available to the community. The components of the conceptual description are visualised showing the links to the original resources each component is drawn from.

2022

Linked Resources towards Enhancing the Conceptual Description of General Lexis Verbs Using Syntactic Information
Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the Fifth International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

214–224

2021

Semantic Analysis of Verb-Noun Derivation in Princeton WordNet
Verginica Mititelu | Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the 11th Global Wordnet Conference

We present here the results of a morphosemantic analysis of the verb-noun pairs in the Princeton WordNet as reflected in the standoff file containing pairs annotated with a set of 14 semantic relations. We have automatically distinguished between zero-derivation and affixal derivation in the data and identified the affixes and manually checked the results. The data show that for each semantic relation an affix prevails in creating new words, although we cannot talk about their specificity with respect to such a relation. Moreover, certain pairs of verb-noun semantic primes are better represented for each semantic relation, and some semantic clusters (in the form of WordNet subtrees) take shape as a result. We thus employ a large-scale data-driven linguistically motivated analysis afforded by the rich derivational and morphosemantic description in WordNet to the end of capturing finer regularities in the process of derivation as represented in the semantic properties of the words involved and as reflected in the structure of the lexicon.

2020

It Takes Two to Tango – Towards a Multilingual MWE Resource
Svetlozara Leseva | Verginica Barbu Mititelu | Ivelina Stoyanova
Proceedings of the Fourth International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

Mature wordnets offer the opportunity of digging out interesting linguistic information otherwise not explicitly marked in the network. The focus in this paper is on the ways the results already obtained at two levels, derivation and multiword expressions, may be further employed. The parallel recent development of the two resources under discussion, the Bulgarian and the Romanian wordnets, has enabled interlingual analyses that reveal similarities and differences between the linguistic knowledge encoded in the two wordnets. In this paper we show how the resources developed and the knowledge gained are put together towards devising a linked MWE resource that is informed by layered dictionary representation and corpus annotation and analysis. This work is a proof of concept for the adopted method of compiling a multilingual MWE resource on the basis of information extracted from the Bulgarian, the Romanian and the Princeton wordnet, as well as additional language resources and automatic procedures.

Consistency Evaluation towards Enhancing the Conceptual Representation of Verbs in WordNet
Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the Fourth International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

This paper outlines the process of enhancing the conceptual description of verb synsets in WordNet using FrameNet frames. On the one hand, we expand the coverage of the mapping between WordNet and FrameNet; on the other, we improve the quality of the mapping using a set of consistency checks and verification procedures. The procedures include an automatic identification of potential inconsistencies and imbalanced relations, as well as suggestions for a more precise frame assignment followed by manual validation. We perform an evaluation of the procedures in terms of the quality of the suggestions measured as the potential improvement in precision and coverage, the relevance of the result and the efficiency of the procedure.

2019

Enhancing Conceptual Description through Resource Linking and Exploration of Semantic Relations
Ivelina Stoyanova | Svetlozara Leseva
Proceedings of the 10th Global Wordnet Conference

The paper presents current efforts towards linking two large lexical semantic resources – WordNet and FrameNet – to the end of their mutual enrichment and the facilitation of the access, extraction and analysis of various types of semantic and syntactic information. In the second part of the paper, we go on to examine the relation of inheritance and other semantic relations as represented in WordNet and FrameNet and how they correspond to each other when the resources are aligned. We discuss the implications with respect to the enhancement of the two resources through the definition of new relations and the detailisation of conceptual frames.

Hear about Verbal Multiword Expressions in the Bulgarian and the Romanian Wordnets Straight from the Horse’s Mouth
Verginica Barbu Mititelu | Ivelina Stoyanova | Svetlozara Leseva | Maria Mitrofan | Tsvetana Dimitrova | Maria Todorova
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

In this paper we focus on verbal multiword expressions (VMWEs) in Bulgarian and Romanian as reflected in the wordnets of the two languages. The annotation of VMWEs relies on the classification defined within the PARSEME Cost Action. After outlining the properties of various types of VMWEs, a cross-language comparison is drawn, aimed to highlight the similarities and the differences between Bulgarian and Romanian with respect to the lexicalization and distribution of VMWEs. The contribution of this work is in outlining essential features of the description and classification of VMWEs and the cross-language comparison at the lexical level, which is essential for the understanding of the need for uniform annotation guidelines and a viable procedure for validation of the annotation.

Structural Approach to Enhancing WordNet with Conceptual Frame Semantics
Svetlozara Leseva | Ivelina Stoyanova
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper outlines procedures for enhancing WordNet with conceptual information from FrameNet. The mapping between the two resources is non-trivial. We define a number of techniques for the validation of the consistency of the mapping and the extension of its coverage which make use of the structure of both resources and the systematic relations between synsets in WordNet and between frames in FrameNet (as well as between synsets and frames). We present a case study on causativity, a relation which provides enhancement complementary to the one using hierarchical relations, by means of linking in a systematic way large parts of the lexicon. We show how consistency checks and denser relations may be implemented on the basis of this relation. We then propose new frames based on causative-inchoative correspondences and in conclusion touch on the possibilities for defining new frames based on the types of specialisation that take place from parent to child synset.

2018

Classifying Verbs in WordNet by Harnessing Semantic Resources
Svetlozara Leseva | Ivelina Stoyanova | Maria Todorova
Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018)

This paper presents the principles and procedures involved in the construction of a classification of verbs using information from 3 semantic resources – WordNet, FrameNet and VerbNet. We adopt the FrameNet frames as the primary categories of the proposed classification and transfer them to WordNet synsets. The hierarchical relationships between the categories are projected both from the hypernymy relation in WordNet and from the hierarchy of some of the frame-to-frame relations in FrameNet. The semantic classes and their hierarchical organisation in WordNet are thus made explicit and allow for linguistic generalisations on the inheritance of semantic features and structures. We then select the beginners of the separate hierarchies and assign classification categories recursively to their hyponyms using a battery of procedures based on generalisations over the semantic primes and the hierarchical structure of WordNet and FrameNet and correspondences between VerbNet superclasses and FrameNet frames. The so-obtained suggestions are ranked according to probability. As a result, 13,465 out of 14,206 verb synsets are accommodated in the classification hierarchy at least through a general category, which provides a point of departure towards further refinement of categories. The resulting system of classification categories is initially derived from the WordNet hierarchy and is further validated against the hierarchy of frames within FrameNet. A set of procedures is established to address inconsistencies and heterogeneity of categories. The classification is subject to ongoing extensive manual verification, essential for ensuring the quality of the resource.
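The recursive assignment of categories to hyponyms described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `propagate_frames`, the toy hierarchy and the direct frame assignments are all hypothetical, and the ranking of suggestions by probability is not modelled. The sketch only shows the general idea that a synset without a frame of its own inherits the frame of its nearest ancestor that has one.

```python
from typing import Dict, List, Optional


def propagate_frames(
    hyponyms: Dict[str, List[str]],
    direct: Dict[str, str],
    root: str,
    inherited: Optional[str] = None,
    out: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Assign a frame to every synset reachable from `root`.

    A synset keeps its directly assigned frame if it has one;
    otherwise it takes the frame passed down from its hypernym.
    """
    if out is None:
        out = {}
    frame = direct.get(root, inherited)
    if frame is not None:
        out[root] = frame
    for child in hyponyms.get(root, []):
        propagate_frames(hyponyms, direct, child, frame, out)
    return out


# Hypothetical toy hierarchy (hypernym -> hyponyms), not real WordNet data.
hyponyms = {
    "move": ["travel", "fall"],
    "travel": ["walk", "fly"],
    "walk": ["stroll"],
}
# Hypothetical direct assignments for two synsets only.
direct = {"move": "Motion", "walk": "Self_motion"}

assigned = propagate_frames(hyponyms, direct, "move")
```

In this toy run, `stroll` receives `Self_motion` from `walk`, while `travel`, `fly` and `fall` fall back on the more general `Motion`, which mirrors the abstract's point that most synsets are covered at least through a general category.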

2016

Towards the Automatic Identification of Light Verb Constructions in Bulgarian
Ivelina Stoyanova | Svetlozara Leseva | Maria Todorova
Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016)

This paper presents work in progress focused on developing a method for automatic identification of light verb constructions (LVCs) as a subclass of Bulgarian verbal MWEs. The method is based on machine learning and is trained on a set of LVCs extracted from the Bulgarian WordNet (BulNet) and the Bulgarian National Corpus (BulNC). The machine learning uses lexical, morphosyntactic, syntactic and semantic features of LVCs. We trained and tested two separate classifiers using the Java package Weka and two decision tree learning algorithms – J48 and RandomTree. The evaluation of the method includes 10-fold cross-validation on the training data from BulNet (F1 = 0.766 obtained by the J48 decision tree algorithm and F1 = 0.725 by the RandomTree algorithm), as well as evaluation of the performance on new instances from the BulNC (F1 = 0.802 by J48 and F1 = 0.607 by the RandomTree algorithm). Preliminary filtering of the candidates gives a slight improvement (F1 = 0.802 by J48 and F1 = 0.737 by RandomTree).

Automatic Prediction of Morphosemantic Relations
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Tsvetana Dimitrova | Maria Todorova
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents a machine learning method for automatic identification and classification of morphosemantic relations (MSRs) between verb and noun synset pairs in the Bulgarian WordNet (BulNet). The core training data comprise 6,641 morphosemantically related verb–noun literal pairs from BulNet. The core dataset was preprocessed for quality by applying validation and reorganisation procedures. Further, the data were supplemented with negative examples of literal pairs not linked by an MSR. The designed supervised machine learning method uses the RandomTree algorithm and is implemented in Java with the Weka package. A set of experiments was performed to test various approaches to the task. Future work on improving the classifier includes adding more training data, employing more features, and fine-tuning. Apart from the language specific information about derivational processes, the proposed method is language independent.

2015

Automatic Classification of WordNet Morphosemantic Relations
Svetlozara Leseva | Ivelina Stoyanova | Maria Todorova | Tsvetana Dimitrova | Borislav Rizov | Svetla Koeva
The 5th Workshop on Balto-Slavic Natural Language Processing

2014

Automatic Semantic Filtering of Morphosemantic Relations in WordNet
Svetlozara Leseva | Ivelina Stoyanova | Borislav Rizov | Maria Todorova | Ekaterina Tarpomanova
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)

In this paper we present a method for automatic assignment of morphosemantic relations between derivationally related verb–noun pairs of synsets in the Bulgarian WordNet (BulNet) and for semantic filtering of those relations. The filtering process relies on the meaning of noun suffixes and the semantic compatibility of verb and noun taxonomic classes. We use the taxonomic labels assigned to all the synsets in the Princeton WordNet (PWN) – one label per synset – which denote their general semantic class. In the first iteration we employ the pairs <noun suffix : noun label> to filter out part of the relations. In the second iteration, which uses as input the output of the first one, we apply a stronger semantic filter. It makes use of the taxonomic labels of the noun-verb synset pairs observed for a given morphosemantic relation. In this way we manage to reliably filter out impossible or unlikely combinations. The results of the performed experiment may be applied to enrich BulNet with morphosemantic relations and new synsets semi-automatically, while facilitating the manual work and reducing its cost.
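The two filtering iterations described in this abstract can be sketched as two successive passes over candidate relations. This is only an illustration under stated assumptions: the `Candidate` type, the function names, the suffix, the relation names and the sets of allowed combinations are hypothetical, and the label strings merely imitate the style of the PWN taxonomic labels. How the authors derive the allowed combinations is not shown here.

```python
from typing import List, NamedTuple, Set, Tuple


class Candidate(NamedTuple):
    relation: str    # candidate morphosemantic relation
    suffix: str      # suffix of the derived noun
    verb_label: str  # taxonomic label of the verb synset
    noun_label: str  # taxonomic label of the noun synset


def first_pass(
    cands: List[Candidate], allowed: Set[Tuple[str, str]]
) -> List[Candidate]:
    """Keep candidates whose <noun suffix : noun label> pair is allowed."""
    return [c for c in cands if (c.suffix, c.noun_label) in allowed]


def second_pass(
    cands: List[Candidate], observed: Set[Tuple[str, str, str]]
) -> List[Candidate]:
    """Keep candidates whose relation was observed with this label pair."""
    return [
        c for c in cands
        if (c.relation, c.verb_label, c.noun_label) in observed
    ]


# Hypothetical toy data, for illustration only.
cands = [
    Candidate("agent", "-тел", "verb.communication", "noun.person"),
    Candidate("agent", "-тел", "verb.communication", "noun.act"),
    Candidate("event", "-тел", "verb.motion", "noun.person"),
]
allowed_suffix_labels = {("-тел", "noun.person")}
observed_triples = {
    ("agent", "verb.communication", "noun.person"),
    ("event", "verb.motion", "noun.act"),
}

# The second pass takes the output of the first as its input.
kept = second_pass(first_pass(cands, allowed_suffix_labels), observed_triples)
```

With this toy data the second candidate is dropped by the first pass (suffix and noun label are incompatible) and the third by the second pass (the relation was never observed with that verb and noun label pair), leaving only the first candidate.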

Noun-Verb Derivation in the Bulgarian and the Romanian WordNet – A Comparative Approach
Ekaterina Tarpomanova | Svetlozara Leseva | Maria Todorova | Tsvetana Dimitrova | Borislav Rizov | Verginica Barbu Mititelu | Elena Irimia
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)

Romanian and Bulgarian are Balkan languages with rich derivational morphology that, if introduced into their respective wordnets, can help broaden the wordnet content and the range of possible NLP applications. In this paper we present joint work on introducing derivation into the Bulgarian and the Romanian WordNets, BulNet and RoWordNet, respectively, by identifying and subsequently labelling the derivationally and semantically related noun-verb pairs. Our research aims at providing a framework for a comparative study on derivation in the two languages and offering training material for the automatic identification and assignment of derivational and morphosemantic relations needed in various applications.

2013

Wordnet-Based Cross-Language Identification of Semantic Relations
Ivelina Stoyanova | Svetla Koeva | Svetlozara Leseva
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing

Text Modification for Bulgarian Sign Language Users
Slavina Lozanova | Ivelina Stoyanova | Svetlozara Leseva | Svetla Koeva | Boian Savtchev
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations

2012

Application of Clause Alignment for Statistical Machine Translation
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Rositsa Dekova | Angel Genov | Borislav Rizov | Tsvetana Dimitrova | Ekaterina Tarpomanova | Hristina Kukova
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

2008

Chooser: a Multi-Task Annotation Tool
Svetla Koeva | Borislav Rizov | Svetlozara Leseva
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper presents a tool assisting manual annotation of linguistic data developed at the Department of Computational Linguistics, IBL-BAS. Chooser is a general-purpose modular application for corpus annotation based on the principles of commonality and reusability of the created resources, language and theory independence, extendibility and user-friendliness. These features have been achieved through a powerful abstract architecture within the Model-View-Controller paradigm that is easily tailored to task-specific requirements and readily extendable to new applications. The tool is to a considerable extent independent of data format and representation and produces outputs that are largely consistent with existing standards. The annotated data are therefore reusable in tasks requiring different levels of annotation and are accessible to external applications. The tool incorporates edit functions, pass and arrangement strategies that facilitate annotators’ work. The relevant module produces tree-structured and graph-based representations in the respective annotation modes. Another valuable feature of the application is concurrent access by multiple users and centralised storage of lexical resources underlying annotation schemata, as well as of annotations, including frequency of selection, updates in the lexical database, etc. Chooser has been successfully applied to a number of tasks: POS tagging, WS and syntactic annotation.