Michael Rießler

2025

Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages
Mika Hämäläinen | Michael Rießler | Eiaki V. Morooka | Lev Kharlashkin
Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages

2024

Kola Saami Christian Text Corpus
Michael Rießler
Proceedings of the 9th International Workshop on Computational Linguistics for Uralic Languages

Christian texts have been known to be printed in Kola Saami languages since 1828; the most extensive publication is the Gospel of Matthew, different translations of which have been published three times since 1878, most recently in 2022. The Lord’s Prayer was translated in several more versions in Kildin Saami and Skolt Saami, first in 1828. All of these texts seem to go back to translations from Russian. Such characteristics make these publications just right for parallel text alignment. This paper describes ongoing work with building a Kola Saami Christian Text Corpus, including conceptional and technical decisions. Thus, it describes a resource, rather than a study. However, computational studies based on these data will hopefully take place in the near future, after the Kildin Saami subset of this corpus is finished and published by the end of 2024. In addition to computation, this resource will also allow for comparative linguistic studies on diachronic and synchronic variation and change in Kola Saami languages, which are among the most endangered and least described Uralic languages.

2021

The Relevance of the Source Language in Transfer Learning for ASR
Nils Hjortnaes | Niko Partanen | Michael Rießler | Francis M. Tyers
Proceedings of the 4th Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

Proceedings of the Seventh International Workshop on Computational Linguistics of Uralic Languages
Flammie A Pirinen | Timofey Arhangelskiy | Trond Trosterud | Michael Rießler
Proceedings of the Seventh International Workshop on Computational Linguistics of Uralic Languages

2020

Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages
Tommi A. Pirinen | Francis M. Tyers | Michael Rießler
Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages

A pseudonymisation method for language documentation corpora: An experiment with spoken Komi
Rogier Blokland | Niko Partanen | Michael Rießler
Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages

Towards a Speech Recognizer for Komi, an Endangered and Low-Resource Uralic Language
Nils Hjortnaes | Niko Partanen | Michael Rießler | Francis M. Tyers
Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages

Improving the Language Model for Low-Resource ASR with Online Text Corpora
Nils Hjortnaes | Timofey Arkhangelskiy | Niko Partanen | Michael Rießler | Francis Tyers
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

In this paper, we expand on previous work on automatic speech recognition in a low-resource scenario typical of data collected by field linguists. We train DeepSpeech models on 35 hours of dialectal Komi speech recordings and correct the output using language models constructed from various sources. Previous experiments showed that transfer learning using DeepSpeech can improve the accuracy of a speech recognizer for Komi, though the error rate remained very high. In this paper we present further experiments with language models created using KenLM from text materials available online. These are constructed from two corpora, one containing literary texts and one containing social media content, and from a combination of the two. We then trained the model using each language model to explore the impact of the language model data source on the speech recognition model. Our results show significant improvements of over 25% in character error rate and nearly 20% in word error rate. This offers important methodological insight into how ASR results can be improved under low-resource conditions: transfer learning can be used to compensate for the lack of training data in the target language, and online texts are a very useful resource when developing language models in this context.

2019

An OCR system for the Unified Northern Alphabet
Niko Partanen | Michael Rießler
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages

2018

Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages
Tommi A. Pirinen | Michael Rießler | Jack Rueter | Trond Trosterud | Francis M. Tyers
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages

Dependency Parsing of Code-Switching Data with Cross-Lingual Feature Representations
Niko Partanen | Kyungtae Lim | Michael Rießler | Thierry Poibeau
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages

The First Komi-Zyrian Universal Dependencies Treebanks
Niko Partanen | Rogier Blokland | KyungTae Lim | Thierry Poibeau | Michael Rießler
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

Two Komi-Zyrian treebanks were included in the Universal Dependencies 2.2 release. This article contextualizes the treebanks, discusses the process through which they were created, and outlines the future plans and timeline for the next improvements. Special attention is paid to the possibilities of using UD in the documentation and description of endangered languages.