The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s

Olha Kanishcheva, Tetiana Kovalova, Maria Shvedova, Ruprecht von Waldenfels


Abstract
We describe a Ukrainian-Russian code-switching corpus of Ukrainian parliamentary session transcripts. The corpus includes speeches entirely in Ukrainian, speeches entirely in Russian, and various types of mixed speech, and it allows us to see how speakers switch between these languages depending on the communicative situation. The paper describes the process of creating this corpus from the official multilingual transcripts using automatic language detection and publicly available metadata on the speakers. On this basis, we consider possible reasons for the change in the number of Ukrainian speakers in the parliament and present the most common patterns of Ukrainian-Russian code-switching in parliamentarians’ speeches.
Anthology ID:
2023.unlp-1.10
Volume:
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editor:
Mariana Romanyshyn
Venue:
UNLP
Publisher:
Association for Computational Linguistics
Pages:
79–90
URL:
https://aclanthology.org/2023.unlp-1.10
DOI:
10.18653/v1/2023.unlp-1.10
Cite (ACL):
Olha Kanishcheva, Tetiana Kovalova, Maria Shvedova, and Ruprecht von Waldenfels. 2023. The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 79–90, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s (Kanishcheva et al., UNLP 2023)
PDF:
https://aclanthology.org/2023.unlp-1.10.pdf
Video:
https://aclanthology.org/2023.unlp-1.10.mp4