Ivan Ubaleht


2025

Siberian Ingrian Finnish: FST and IGTs
Ivan Ubaleht
Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages

This paper presents the current version of the finite-state transducer for Siberian Ingrian Finnish. Our finite-state transducer uses two-level morphology. We use the LexC and TwolC languages together with HFST tools to develop lexicons and phonological rules, as well as to compile the transducer. The paper also provides a description of the morphological system of Siberian Ingrian Finnish. In addition, we present a collection of interlinear glossed texts in Siberian Ingrian Finnish, provided in a machine-readable format.

2022

Development of the Siberian Ingrian Finnish Speech Corpus
Ivan Ubaleht | Taisto-Kalevi Raudalainen
Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages

In this paper we present the speech corpus for the Siberian Ingrian Finnish language. The speech corpus includes audio data, annotations, software tools for data processing, two databases and a web application. We have published part of the audio data and annotations. The software tool for parsing annotation files and loading them into a relational database has been developed and published under a free license. A web application has been developed and is available. At present, about 300 words and 200 phrases can be displayed using this web application.