Mariana Romanyshyn


2024

pdf bib
Computational Analysis of Dehumanization of Ukrainians on Russian Social Media
Kateryna Burovova | Mariana Romanyshyn
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

Dehumanization is a pernicious process of denying some or all attributes of humanness to the target group. It is frequently cited as a common hallmark of incitement to commit genocide. The international security landscape has seen a dramatic shift following the 2022 Russian invasion of Ukraine. This, coupled with recent developments in the conceptualization of dehumanization, necessitates the creation of new techniques for analyzing and detecting this extreme violence-related phenomenon on a large scale. Our project pioneers the development of a detection system for instances of dehumanization. To achieve this, we collected the entire posting history of the most popular bloggers on Russian Telegram and tested classical machine learning, deep learning, and zero-shot learning approaches to explore and detect the dehumanizing rhetoric. We found that the transformer-based method for entity extraction SpERT shows a promising result of F 1 = 0.85 for binary classification. The proposed methods can be built into the systems of anticipatory governance, contribute to the collection of evidence of genocidal intent in the Russian invasion of Ukraine, and pave the way for large-scale studies of dehumanizing language. This paper contains references to language that some readers may find offensive.

2023

pdf bib
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
Mariana Romanyshyn
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

pdf bib
Contextual Embeddings for Ukrainian: A Large Language Model Approach to Word Sense Disambiguation
Yurii Laba | Volodymyr Mudryi | Dmytro Chaplynskyi | Mariana Romanyshyn | Oles Dobosevych
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Ukrainian language based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on the dataset generated in an unsupervised way to obtain better contextual embeddings for words with multiple senses. The paper presents a method for generating a new dataset for WSD evaluation in the Ukrainian language based on the SUM dictionary. We developed a comprehensive framework that facilitates the generation of WSD evaluation datasets, enables the use of different prediction strategies, LLMs, and pooling strategies, and generates multiple performance reports. Our approach shows 77,9% accuracy for lexical meaning prediction for homonyms.

pdf bib
The UNLP 2023 Shared Task on Grammatical Error Correction for Ukrainian
Oleksiy Syvokon | Mariana Romanyshyn
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

This paper presents the results of the UNLP 2023 shared task, the first Shared Task on Grammatical Error Correction for the Ukrainian language. The task included two tracks: GEC-only and GEC+Fluency. The dataset and evaluation scripts were provided to the participants, and the final results were evaluated on a hidden test set. Six teams submitted their solutions before the deadline, and four teams submitted papers that were accepted to appear in the UNLP workshop proceedings and are referred to in this report. The CodaLab leaderboard is left open for further submissions.