Milena Slavcheva


2025

2023

The paper presents a semantic model of protest events, called Semantic Interpretations of Protest Events (SemInPE). The analytical framework used for building the semantic representations is inspired by the object-oriented paradigm in computer science and a cognitive approach to the linguistic analysis. The model is a practical application of the Unified Eventity Representation (UER) formalism, which is based on the Unified Modeling Language (UML). The multi-layered architecture of the model provides flexible means for building the semantic representations of the language objects along a scale of generality and specificity. Thus, it is a suitable environment for creating the elements of ontologies on various topics and for different languages.

2022

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 that consists of four subtasks that are i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Sub-task 1, document classification. The training data from CASE 2021 in English, Portuguese and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese & Subtask 4 English.

2021

The paper describes a system for automatic summarization in English language of online news data that come from different non-English languages. The system is designed to be used in production environment for media monitoring. Automatic summarization can be very helpful in this domain when applied as a helper tool for journalists so that they can review just the important information from the news channels. However, like every software solution, the automatic summarization needs performance monitoring and assured safe environment for the clients. In media monitoring environment the most problematic features to be addressed are: the copyright issues, the factual consistency, the style of the text and the ethical norms in journalism. Thus, the main contribution of our present work is that the above mentioned characteristics are successfully monitored in neural automatic summarization models and improved with the help of validation, fact-preserving and fact-checking procedures.

2013

2011

2009

2006

This paper presents a semantic classification of reflexive verbs in Bulgarian, augmenting the morphosyntactic classes of verbs in the large Bulgarian Lexical Data Base - a language resource utilized in a number of Language Engineering (LE) applications. Thesemantic descriptors conform to the Unified Eventity Representation (UER), developed by Andrea Schalley. The UER is a graphical formalism, introducing the object-oriented system design to linguistic semantics. Reflexive/non-reflexive verb pairs are analyzed where the non-reflexive member of the opposition, a two-place predicate, is considered the initial linguistic entity from which the reflexive correlate is derived. The reflexive verbs are distributed into initial syntactic-semantic classes which serve as the basis for defining the relevant semantic descriptors in the form of EVENTITY FRAME diagrams. The factors that influence the categorization of the reflexives are the lexical paradigmaticapproach to the data, the choice of only one reading for each verb, top level generalization of the semantic descriptors. The language models described in this paper provide the possibility for building linguistic components utilizable in knowledge-driven systems.

2004

2003

2002

1993