Ivan Šimko


2020

Digital Edition of the Life of St. Petka
Ivan Šimko
Proceedings of the Fourth International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

This paper presents the construction of a digital edition of multiple versions of the hagiography of St. Petka of Tarnovo. Two related versions have been uploaded first: a Church Slavonic print edition and its later damaskini redaction. Both texts are adapted for user-friendly reading with side-by-side facsimiles. Translations and additional data on individual tokens and sentences are displayed on the fly when the cursor hovers over them. Further metadata will be available for searching. The annotation has been adapted to the transitional status of the language of the texts: it allows us to compare similar morphological forms with different functions. The edition has already been published online and can be used for both teaching and studying. The texts have been digitized as part of a larger project on the development of Balkan areal features.

2019

Corpora and Processing Tools for Non-standard Contemporary and Diachronic Balkan Slavic
Teodora Vukovic | Nora Muheim | Olivier Winistörfer | Ivan Šimko | Anastasia Makarova | Sanja Bradjan
Proceedings of the Student Research Workshop Associated with RANLP 2019

The paper describes three corpora of different varieties of Balkan Slavic that are currently being developed with the goal of providing data for the analysis of diatopic and diachronic variation in non-standard Balkan Slavic. The corpora include spoken materials from Torlak and Macedonian dialects, as well as manuscripts of pre-standardized Bulgarian. Apart from the texts, tools for PoS annotation and lemmatization are being created for all varieties, as well as syntactic parsing for the Torlak and Bulgarian varieties. The corpora are built using a unified methodology, relying on best practices and state-of-the-art methods from the field. The uniform methodology allows contrastive analysis of data from the different varieties. The corpora under construction can be considered a crucial contribution to linguistic research on the languages of the Balkans, as they provide the data so far lacking for studies of linguistic variation in Balkan Slavic and enable comparison of these varieties with other neighbouring languages.