Exploring Word Sense Distribution in Ukrainian with a Semantic Vector Space Model
Nataliia Cheilytko | Ruprecht von Waldenfels
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

The paper discusses a Semantic Vector Space Model targeted at revealing how Ukrainian word senses vary and relate to each other. One of the benefits of the proposed semantic model is that it considers the second-order context of words and thus has more potential to compare and distinguish word senses observed in a unique concordance line. Combined with visualization techniques, this model makes it possible for a lexicographer to explore the distribution of Ukrainian word senses on a large scale. The paper describes the first results of the research and the next steps of the initiative.

The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s
Olha Kanishcheva | Tetiana Kovalova | Maria Shvedova | Ruprecht von Waldenfels
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)

We describe a Ukrainian-Russian code-switching corpus of Ukrainian Parliamentary Session Transcripts. The corpus includes speeches entirely in Ukrainian, entirely in Russian, or in various types of mixed speech, and allows us to see how speakers switch between these languages depending on the communicative situation. The paper describes the process of creating this corpus from the official multilingual transcripts using automatic language detection and publicly available metadata on the speakers. On this basis, we consider possible reasons for the change in the number of Ukrainian speakers in the parliament and present the most common patterns of bilingual Ukrainian and Russian code-switching in parliamentarians’ speeches.

Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote
Noëmi Aepli | Ruprecht von Waldenfels | Tanja Samardžić
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects