Veronika Solopova


2024

Features and Detectability of German Texts Generated with Large Language Models
Verena Irrgang | Veronika Solopova | Steffen Zeiler | Robert M. Nickel | Dorothea Kolossa
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

Check News in One Click: NLP-Empowered Pro-Kremlin Propaganda Detection
Veronika Solopova | Viktoriia Herman | Christoph Benzmüller | Tim Landgraf
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Many European citizens become targets of Kremlin propaganda campaigns that aim to minimise public support for Ukraine, foster a climate of mistrust and disunity, and shape elections (Meister, 2022). To address this challenge, we developed “Check News in 1 Click”, the first NLP-empowered pro-Kremlin propaganda detection application available in 7 languages, which provides the lay user with feedback on their news and explains manipulative linguistic features and keywords. We conducted a user study, analysed user entries and the models’ behaviour paired with questionnaire answers, and investigated the advantages and disadvantages of the proposed interpretative solution.

2023

The Evolution of Pro-Kremlin Propaganda From a Machine Learning and Linguistics Perspective
Veronika Solopova | Christoph Benzmüller | Tim Landgraf
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

In the Russo-Ukrainian war, propaganda is produced by Russian state-run news outlets for both international and domestic audiences. Its content and form evolve as the war continues. This poses a challenge to content moderation tools based on machine learning when the data used for training and the current news start to differ significantly. In this follow-up study, we evaluate our previous BERT and SVM models that classify Pro-Kremlin propaganda from a Pro-Western stance, trained on data from news articles and Telegram posts at the start of 2022, on the new 2023 subset. We examine both classifiers’ errors and perform a comparative analysis of these subsets to investigate which changes in narratives provoke drops in performance.

2021

A German Corpus of Reflective Sentences
Veronika Solopova | Oana-Iuliana Popescu | Margarita Chikobava | Ralf Romeike | Tim Landgraf | Christoph Benzmüller
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Reflection about a learning process is beneficial to students in higher education (Bubnys, 2019). The importance of machine understanding of reflective texts grows as applications supporting students become more widespread. Nevertheless, due to the sensitive content, there is no public corpus available yet for the classification of text reflectiveness. We provide the first open-access corpus of reflective student essays in German. We collected essays from three different disciplines (Software Development, Ethics of Artificial Intelligence, and Teacher Training). We annotated the corpus at sentence level with binary reflective/non-reflective labels, using an iterative annotation process with linguistic and didactic specialists, mapping the reflective components found in the data to existing schemes and complementing them. We propose and evaluate linguistic features of reflectiveness and analyse their distribution within the resulting sentences according to their labels. Our contribution constitutes the first open-access corpus to help the community move towards a unified approach for reflection detection.

2020

Adapting Coreference Resolution to Twitter Conversations
Berfin Aktaş | Veronika Solopova | Annalena Kohnert | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2020

The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the system of Lee et al. (2018), which was originally trained on OntoNotes, by retraining it on manually annotated Twitter conversation data. Further experiments combining different portions of OntoNotes with Twitter data show that selecting text genres for the training data can beat the mere maximization of training data amount. In addition, we inspect several phenomena, such as the role of deictic pronouns in conversational data, and present additional results for variant settings. Our best configuration improves the performance of the “out of the box” system by 21.6%.