Mining Native Ukrainian Paraphrases: A Multi-Source Comparison

Vladyslav Fesenko; Hanna Dydyk-Meush; Volodymyr Mudryi

Abstract

We introduce a Ukrainian paraphrase dataset mined from event-aligned news headlines and compare it with translated and LLM-generated data sources. Candidate pairs are retrieved from native Ukrainian news titles and filtered using semantic and lexical constraints to form a training corpus in a semi-automatic pipeline. Human evaluation indicates that the sources differ in useful ways: LLM-generated paraphrases are generally stronger in meaning preservation, whereas news-mined pairs offer greater lexical variation while remaining fluent and meaning-preserving. We tune mT5-large and mT0-large and evaluate them on several held-out test sets, including a human-validated subset. Relative to Spivavtor-large, the models achieve comparable semantic preservation with lower copying on the combined and human-validated sets. Overall, the findings highlight the value of naturally mined Ukrainian paraphrases as supervision for low-resource paraphrase generation.

Anthology ID: 2026.unlp-1.17
Volume: Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Month: May
Year: 2026
Address: Lviv, Ukraine
Editor: Mariana Romanyshyn
Venue: UNLP
Publisher: Association for Computational Linguistics
Pages: 199–208
URL: https://aclanthology.org/2026.unlp-1.17/
Cite (ACL): Vladyslav Fesenko, Hanna Dydyk-Meush, and Volodymyr Mudryi. 2026. Mining Native Ukrainian Paraphrases: A Multi-Source Comparison. In Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026), pages 199–208, Lviv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Mining Native Ukrainian Paraphrases: A Multi-Source Comparison (Fesenko et al., UNLP 2026)
PDF: https://aclanthology.org/2026.unlp-1.17.pdf
