From Benchmark to Better Embeddings: Leveraging Synonym Substitution to Enhance Multimodal Models in Ukrainian

Volodymyr Mudryi, Yurii Laba


Abstract
We study the robustness of text–image retrieval for Ukrainian under synonym-substitution attacks (SSA). On Multi30K with OpenCLIP, we evaluate two SSA methods, dictionary-based and LLM-based, and find that Ukrainian degrades far more than English (e.g., GPT-4o SSA drops HIT@1 from 32.1 to 10.9 for Ukrainian vs. from 41.6 to 30.4 for English). We introduce a Hybrid method that filters dictionary candidates with an LLM to preserve sense and grammar, yielding higher-quality perturbations (Ukrainian HIT@1 16.8 vs. 7.6/10.9). To mitigate this degradation, we propose synonym-augmented fine-tuning, which injects one-word substitutions into training; it boosts robustness (Hybrid 28.1, GPT-4o 25.1) without harming original performance. This is the first systematic SSA evaluation for Ukrainian multimodal retrieval and a practical recipe for improving models in low-resource, morphologically rich languages. We release code, prompts, and trained checkpoints at https://github.com/YuriiLaba/UA-B2BE.
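As a rough illustration of the two ideas the abstract relies on, the sketch below shows a one-word synonym substitution and a HIT@1 computation. It is not the authors' implementation: the toy dictionary `TOY_SYNONYMS`, the function names, and the similarity values are all hypothetical, and the example uses English words and ignores inflection, which a dictionary lookup for Ukrainian would need to handle.

```python
import random

# Hypothetical toy dictionary for illustration only; not the paper's resource.
TOY_SYNONYMS = {
    "dog": ["hound"],
    "runs": ["sprints"],
}


def substitute_one_word(caption, synonyms, rng):
    """Replace exactly one word that has a known synonym.

    Returns the caption unchanged when no word has an entry.
    """
    words = caption.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in synonyms]
    if not candidates:
        return caption
    i = rng.choice(candidates)
    words[i] = rng.choice(synonyms[words[i].lower()])
    return " ".join(words)


def hit_at_1(similarity):
    """Fraction of captions whose top-scoring image is the paired one.

    similarity[i][j] is the score of caption i against image j;
    the correct image for caption i is assumed to be image i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        best = max(range(len(row)), key=row.__getitem__)
        if best == i:
            hits += 1
    return hits / len(similarity)


rng = random.Random(0)
perturbed = substitute_one_word("a dog runs fast", TOY_SYNONYMS, rng)
score = hit_at_1([[0.9, 0.1], [0.8, 0.2]])  # caption 0 hits, caption 1 misses
```

In an attack setting, HIT@1 would be computed once on the original captions and once on the perturbed ones; in the augmentation setting, perturbed captions would be mixed into the training data instead.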
Anthology ID:
2025.findings-emnlp.1115
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20458–20468
URL:
https://aclanthology.org/2025.findings-emnlp.1115/
Cite (ACL):
Volodymyr Mudryi and Yurii Laba. 2025. From Benchmark to Better Embeddings: Leveraging Synonym Substitution to Enhance Multimodal Models in Ukrainian. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 20458–20468, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
From Benchmark to Better Embeddings: Leveraging Synonym Substitution to Enhance Multimodal Models in Ukrainian (Mudryi & Laba, Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1115.pdf
Checklist:
 2025.findings-emnlp.1115.checklist.pdf