Tsvetana Dimitrova

2024

The paper reports on the first steps in developing a time-stamped multimodal dataset of reading data from Bulgarian children. Data are being collected, structured and analysed by means of ReadLet, an innovative infrastructure for multimodal language data collection that uses a tablet as a reader’s front-end. The overall goal of the project is to quantitatively analyse the reading skills of a sample of early Bulgarian readers collected over a two-year period, and to compare them with the reading data of early readers of Italian collected using the same protocol. We illustrate design issues of the experimental protocol, as well as the data acquisition process and the post-processing phase of data annotation/augmentation. To evaluate the potential and usefulness of the Bulgarian dataset for reading research, we present some preliminary statistical analyses of our recently collected data. They show robust convergence trends between Bulgarian and Italian early reading development stages.

Multilingual Corpus of Illustrative Examples on Activity Predicates
Ivelina Stoyanova | Hristina Kukova | Maria Todorova | Tsvetana Dimitrova
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)

The paper presents the ongoing compilation of a multilingual corpus of illustrative examples to supplement our work on the syntactic and semantic analysis of predicates representing activities in Bulgarian and other languages. The corpus aims to include over 1,000 illustrative examples of verbs from six semantic classes of predicates (verbs of motion, contact, consumption, creation, competition and bodily functions), which provide a basis for observations on the specificity of their realisation. The corpus of illustrative examples will be used for contrastive studies and further elaboration on the scope and behaviour of activity verbs in general, as well as their semantic subclasses.

Unified Annotation of the Stages of the Bulgarian Language. First Steps
Fabio Maion | Tsvetana Dimitrova | Andrej Bojadziev
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)

The paper reports on ongoing work on a proposal for guidelines for the unified annotation of the stages in the development of the Bulgarian language from the Middle Ages to the early modern period. It discusses the criteria for the selection of texts and their representation, along with some results of trial tagging with an existing tagger that had already been trained on other texts.

2020

On WordNet Semantic Classes: Is the Sum Always Bigger?
Tsvetana Dimitrova
Proceedings of the Fourth International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

The paper offers an approach to validating the data resulting from a previous effort to expand WordNet noun semantic classes by mapping them to the semantic types of the Corpus Pattern Analysis (CPA) ontology employed in the framework of the Pattern Dictionary of English Verbs (PDEV). A case study is presented along with a set of conditions to be checked when validating the combined data.

2019

On Hidden Semantic Relations between Nouns in WordNet
Tsvetana Dimitrova | Valentina Stefanova
Proceedings of the 10th Global Wordnet Conference

The paper presents an effort to transfer noun–verb and noun–adjective derivative and semantic relations to noun–noun relations. The approach relies on information from semantic classes and on existing inter-POS derivative and (morpho)semantic relations between noun and verb synsets and between noun and adjective synsets. We have added semantic relations between nouns in WordNet that are indirectly linked via verbs and adjectives. Observations on the combination of the relations and the semantic classes of the nouns they link may facilitate further efforts in assigning semantic properties to nouns that point to their ability to participate in predicate-argument structures.

Hear about Verbal Multiword Expressions in the Bulgarian and the Romanian Wordnets Straight from the Horse’s Mouth
Verginica Barbu Mititelu | Ivelina Stoyanova | Svetlozara Leseva | Maria Mitrofan | Tsvetana Dimitrova | Maria Todorova
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

In this paper we focus on verbal multiword expressions (VMWEs) in Bulgarian and Romanian as reflected in the wordnets of the two languages. The annotation of VMWEs relies on the classification defined within the PARSEME COST Action. After outlining the properties of various types of VMWEs, we draw a cross-language comparison aimed at highlighting the similarities and differences between Bulgarian and Romanian with respect to the lexicalization and distribution of VMWEs. The contribution of this work lies in outlining essential features of the description and classification of VMWEs and in the cross-language comparison at the lexical level, which is essential for understanding the need for uniform annotation guidelines and a viable procedure for validating the annotation.

2018

Online Editor for WordNets
Borislav Rizov | Tsvetana Dimitrova
Proceedings of the Third International Conference on Computational Linguistics in Bulgaria (CLIB 2018)

The paper presents Hydra for Web, an online editor for lexical-semantic databases whose relational structure is similar to that of WordNet. It supports editing of relational data (including querying, creating, changing and linking relational objects), simultaneous access by multiple user profiles, parallel data visualization, and editing of the data on top of single- and parallel-mode visualization of the language data.

2016

Hydra for Web: A Browser for Easy Access to Wordnets
Borislav Rizov | Tsvetana Dimitrova
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents a web interface for wordnets named Hydra for Web, which is built on top of Hydra (an open source tool for wordnet development) by means of modern web technologies. It is a Single Page Application with a simple but powerful and convenient GUI. It has two modes for visualising the language correspondences of searched (and found) wordnet synsets: single and parallel. Hydra for Web is available at: http://dcl.bas.bg/bulnet/.

Automatic Prediction of Morphosemantic Relations
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Tsvetana Dimitrova | Maria Todorova
Proceedings of the 8th Global WordNet Conference (GWC)

This paper presents a machine learning method for automatic identification and classification of morphosemantic relations (MSRs) between verb and noun synset pairs in the Bulgarian WordNet (BulNet). The core training data comprise 6,641 morphosemantically related verb–noun literal pairs from BulNet. The core dataset was preprocessed for quality by applying validation and reorganisation procedures. The data were then supplemented with negative examples of literal pairs not linked by an MSR. The supervised machine learning method uses the RandomTree algorithm and is implemented in Java with the Weka package. A set of experiments was performed to test various approaches to the task. Future work on improving the classifier includes adding more training data, employing more features, and fine-tuning. Apart from the language-specific information about derivational processes, the proposed method is language-independent.

2015

2014

Coping with Derivation in the Bulgarian Wordnet
Tsvetana Dimitrova | Ekaterina Tarpomanova | Borislav Rizov
Proceedings of the Seventh Global Wordnet Conference

Noun-Verb Derivation in the Bulgarian and the Romanian WordNet – A Comparative Approach
Ekaterina Tarpomanova | Svetlozara Leseva | Maria Todorova | Tsvetana Dimitrova | Borislav Rizov | Verginica Barbu Mititelu | Elena Irimia
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)

Romanian and Bulgarian are Balkan languages with rich derivational morphology that, if introduced into their respective wordnets, can help broaden the wordnet content and the range of possible NLP applications. In this paper we present joint work on introducing derivation into the Bulgarian and the Romanian WordNets, BulNet and RoWordNet respectively, by identifying and subsequently labelling the derivationally and semantically related noun–verb pairs. Our research aims to provide a framework for a comparative study of derivation in the two languages and to offer training material for the automatic identification and assignment of derivational and morphosemantic relations needed in various applications.

Historical Corpora of Bulgarian Language and Second Position Markers
Tsvetana Dimitrova | Andrej Boyadzhiev
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)

This paper demonstrates how historical corpora can be used in researching language phenomena. We exemplify their advantages and disadvantages by exploring three of the available corpora that contain textual sources of Old and Middle Bulgarian, in order to shed light on some aspects of the development of two words of ambiguous class. We discuss their behaviour to outline certain conditions for the diachronic change they have undergone. The three corpora are accessible online (and offline, for downloading search results, XML files, etc.).