Valts Ernštreits

2025

Digitization Work at the Finno-Ugrian Society: Livonian Case Study
Niko Partanen | Jack Rueter | Valts Ernštreits
Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages

This article discusses the recent digitization project of the Finno-Ugrian Society, using the work on Livonian publications, especially those from Seppo Suhonen’s Liivin kielen näytteitä (1975), as a case study. We start by contextualizing and motivating these undertakings from the point of view of both the Finno-Ugrian Society and the University of Latvia Livonian Institute, and then describe the workflows we have developed and foresee for the next steps.

2024

Towards the speech recognition for Livonian
Valts Ernštreits
Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages

This article outlines the path toward the development of speech synthesis and speech recognition technologies for Livonian, a critically endangered Uralic language with around 20 contemporary fluent speakers. It presents the rationale behind the creation of these technologies and introduces the hypotheses and planned approaches to achieve this goal. The article discusses the four-stage approach of leveraging existing data and multiplying voice data through speech synthesis and voice cloning to generate the necessary data for building and training speech recognition for Livonian.

2022

Machine Translation for Livonian: Catering to 20 Speakers
Matīss Rikters | Marili Tomingas | Tuuli Tuisk | Valts Ernštreits | Mark Fishel
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Livonian is one of the most endangered languages in Europe, with just a tiny handful of speakers and virtually no publicly available corpora. In this paper we tackle the task of developing neural machine translation (NMT) between Livonian and English, with a two-fold aim: on the one hand, preserving the language and, on the other, enabling access to Livonian folklore, life stories and other textual intangible heritage, as well as making it easier to create further parallel corpora. We rely on Livonian’s linguistic similarity to Estonian and Latvian and collect parallel and monolingual data for the four languages for translation experiments. We combine different low-resource NMT techniques, such as zero-shot translation, cross-lingual transfer and synthetic data creation, to reach the highest possible translation quality and to find which base languages are empirically more helpful for transfer to Livonian. The resulting NMT systems and the collected monolingual and parallel data, including a manually translated and verified translation benchmark, are publicly released via OPUS and Huggingface repositories.

2020

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.

2019

Electronical resources for Livonian
Valts Ernštreits
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages
