Cross-lingual Text Classification Transfer: The Case of Ukrainian

Daryna Dementieva; Valeriia Khylenko; Georg Groh

Abstract

Despite the large number of labeled datasets for NLP text classification, a persistent imbalance in data availability across languages remains evident. To support further fair development of NLP models, exploring the possibilities of effective knowledge transfer to new languages is crucial. Ukrainian, in particular, is a language that can still benefit from the continued refinement of cross-lingual methodologies. To our knowledge, there is a severe lack of Ukrainian corpora for typical text classification tasks, such as different types of style, harmful speech, or relationships between texts. At the same time, collecting such corpora from scratch understandably requires considerable resources. In this work, we leverage state-of-the-art advances in NLP to explore cross-lingual knowledge transfer methods that avoid manual data curation: large multilingual encoders and translation systems, LLMs, and language adapters. We test the approaches on three text classification tasks: toxicity classification, formality classification, and natural language inference (NLI). For each task, we provide a “recipe” for the optimal setup.

Anthology ID: 2025.coling-main.97
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 1451–1464
URL: https://aclanthology.org/2025.coling-main.97/
Cite (ACL): Daryna Dementieva, Valeriia Khylenko, and Georg Groh. 2025. Cross-lingual Text Classification Transfer: The Case of Ukrainian. In Proceedings of the 31st International Conference on Computational Linguistics, pages 1451–1464, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Cross-lingual Text Classification Transfer: The Case of Ukrainian (Dementieva et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.97.pdf
