Olha Kanishcheva
2026
Ukrainian Multiword Expressions Corpus: Creation, Annotation, and Linguistic Analysis
Hanna Sytar | Maria Shvedova | Olha Kanishcheva
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
This paper presents the development of a corpus of annotated multiword expressions (MWEs) for Ukrainian. The resource covers four major categories of MWEs: verbal, nominal, adjectival/adverbial, and functional. We describe the methodology used for data selection, the annotation scheme, and the procedures employed during annotation. In addition, the paper discusses some specific types of MWE constructions, illustrating their usage with numerous examples and addressing complex and borderline cases. The resulting corpus is an important resource for linguistic studies and NLP tasks involving MWEs, and is publicly accessible at https://gitlab.com/parseme/sharedtask-data/-/tree/master/2.0?ref_type=heads.
2023
The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s
Olha Kanishcheva | Tetiana Kovalova | Maria Shvedova | Ruprecht von Waldenfels
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
We describe a Ukrainian-Russian code-switching corpus of Ukrainian Parliamentary Session Transcripts. The corpus includes speeches delivered entirely in Ukrainian or entirely in Russian, as well as various types of mixed speech, and allows us to see how speakers switch between these languages depending on the communicative situation. The paper describes the process of creating this corpus from the official multilingual transcripts using automatic language detection and publicly available metadata on the speakers. On this basis, we consider possible reasons for the change in the number of Ukrainian speakers in the parliament and present the most common patterns of Ukrainian-Russian code-switching in parliamentarians’ speeches.