Volodymyr Mudryi


2025

From Benchmark to Better Embeddings: Leveraging Synonym Substitution to Enhance Multimodal Models in Ukrainian
Volodymyr Mudryi | Yurii Laba
Findings of the Association for Computational Linguistics: EMNLP 2025

We study the robustness of text–image retrieval for Ukrainian under synonym-substitution attacks (SSA). On Multi30K with OpenCLIP, we evaluate two SSA methods, dictionary-based and LLM-based, and find that Ukrainian degrades far more than English (e.g., a GPT-4o SSA drops HIT@1 from 32.1 to 10.9 for Ukrainian, versus 41.6 to 30.4 for English). We introduce a Hybrid method that filters dictionary candidates with an LLM to preserve sense and grammar, yielding higher-quality perturbations (Ukrainian HIT@1 16.8 vs. 7.6/10.9). To mitigate this degradation, we propose synonym-augmented fine-tuning, which injects one-word substitutions into training; it boosts robustness under attack (Hybrid 28.1, GPT-4o 25.1) without harming performance on the original data. This is the first systematic SSA evaluation for Ukrainian multimodal retrieval and a practical recipe for improving models in low-resource, morphologically rich languages. We release code, prompts, and trained checkpoints at https://github.com/YuriiLaba/UA-B2BE.
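As an illustration of the HIT@1 metric reported above, the following is a minimal sketch (not the paper's released code) of scoring text-to-image retrieval from precomputed embedding matrices, assuming row i of each matrix corresponds to a paired caption and image:

```python
import numpy as np

def hit_at_1(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """HIT@1 for text->image retrieval: the fraction of captions whose
    nearest image by cosine similarity is the paired one (same row index)."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    i = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = t @ i.T                       # (n_texts, n_images) cosine scores
    top1 = sims.argmax(axis=1)           # best-matching image per caption
    return float((top1 == np.arange(len(t))).mean())

# Toy check: with identical embeddings, retrieval is perfect.
emb = np.eye(4)
print(hit_at_1(emb, emb))  # 1.0
```

An SSA evaluation would compute this once on clean caption embeddings and once on embeddings of the perturbed captions, and report the drop.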

Precision vs. Perturbation: Robustness Analysis of Synonym Attacks in Ukrainian NLP
Volodymyr Mudryi | Oleksii Ignatenko
Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)

Synonym-based adversarial tests reveal fragile word patterns that accuracy metrics overlook, while virtually no such diagnostics exist for Ukrainian, a morphologically rich and low-resource language. We present the first systematic robustness evaluation under synonym substitution in Ukrainian. Adapting TextFooler and BERT-Attack to Ukrainian, we (i) adjust a 15,000-entry synonym dictionary to match proper word forms; (ii) integrate similarity filters; (iii) adapt masked-LM search so it generates only valid inflected words. Across three text classification datasets (reviews, news headlines, social-media manipulation) and three transformer models (Ukr-RoBERTa, XLM-RoBERTa, SBERT), single-word swaps reduce accuracy by up to 12.6, while multi-step attacks degrade performance by as much as 40.27 with around 112 model queries. A few-shot transfer test shows GPT-4o, a state-of-the-art multilingual LLM, still suffers 6.9–15.0 drops on the same adversarial samples. Our results underscore the need for sense-aware, morphology-constrained synonym resources and provide a reproducible benchmark for future robustness research in Ukrainian NLP.
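The single-word swap attack described above can be sketched as a greedy search over dictionary synonyms, keeping any substitution that lowers the classifier's confidence in the original label. The `synonyms` map and `confidence` scorer here are hypothetical stand-ins for the paper's 15,000-entry Ukrainian dictionary and transformer classifiers:

```python
def synonym_swap_attack(tokens, synonyms, confidence, max_queries=112):
    """Greedy synonym substitution: for each position, try each dictionary
    synonym and keep the swap that most reduces the model's confidence in
    the original label. `synonyms` maps a word to candidate replacements;
    `confidence` scores a token list (both are illustrative stand-ins)."""
    best = list(tokens)
    best_score = confidence(best)
    queries = 0
    for i, word in enumerate(tokens):
        for cand in synonyms.get(word, []):
            if queries >= max_queries:
                return best
            trial = best[:i] + [cand] + best[i + 1:]
            score = confidence(trial)
            queries += 1
            if score < best_score:          # swap hurts the model: keep it
                best, best_score = trial, score
    return best
```

In the actual attacks, candidate synonyms would additionally pass the similarity and inflection filters described in the abstract before being scored.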

2023

Contextual Embeddings for Ukrainian: A Large Language Model Approach to Word Sense Disambiguation
Yurii Laba | Volodymyr Mudryi | Dmytro Chaplynskyi | Mariana Romanyshyn | Oles Dobosevych
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Ukrainian language: supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset generated in an unsupervised way, which yields better contextual embeddings for words with multiple senses. The paper presents a method for generating a new WSD evaluation dataset for Ukrainian based on the SUM dictionary. We developed a comprehensive framework that facilitates the generation of WSD evaluation datasets, supports different prediction strategies, LLMs, and pooling strategies, and generates multiple performance reports. Our approach achieves 77.9% accuracy for lexical meaning prediction for homonyms.
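The core prediction step of such an embedding-based WSD approach can be sketched as follows: the sense whose embedding lies closest (by cosine similarity) to the contextual embedding of the target word is chosen. The names and vectors below are illustrative, not the paper's framework:

```python
import numpy as np

def disambiguate(word_emb: np.ndarray, sense_embs: dict) -> str:
    """Return the sense label whose embedding has the highest cosine
    similarity to the contextual embedding of the target word."""
    w = word_emb / np.linalg.norm(word_emb)
    sims = {sense: float(emb @ w / np.linalg.norm(emb))
            for sense, emb in sense_embs.items()}
    return max(sims, key=sims.get)

# Toy example with 2-d stand-in embeddings for two senses of a homonym.
senses = {"river_bank": np.array([1.0, 0.0]),
          "money_bank": np.array([0.0, 1.0])}
print(disambiguate(np.array([0.9, 0.1]), senses))  # river_bank
```

In practice the contextual embedding would come from the fine-tuned LLM via one of the pooling strategies the framework supports, and each sense embedding from its SUM dictionary gloss.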