Linda Wiechetek


Morphological Disambiguation of South Sámi with FSTs and Neural Networks
Mika Hämäläinen | Linda Wiechetek
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The disambiguation is done on the level of morphological tags, ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without the need for a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it usable and applicable in the context of any other endangered language as well.


Is this the end? Two-step tokenization of sentence boundaries
Linda Wiechetek | Sjur Nørstebø Moshagen | Thomas Omma
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages

Seeing more than whitespace — Tokenisation and disambiguation in a North Sámi grammar checker
Linda Wiechetek | Sjur Nørstebø Moshagen | Kevin Brubeck Unhammer
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)


Reusing Grammatical Resources for New Languages
Lene Antonsen | Trond Trosterud | Linda Wiechetek
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Grammatical approaches to language technology are often considered less optimal than statistical approaches in multilingual settings, where large-scale portability becomes an important issue. The present paper argues that there is a notable gain in reusing grammatical resources when porting technology to new languages. The pivot language is North Sámi, and the paper discusses portability with respect to the closely related Lule and South Sámi, and to the unrelated Faroese and Greenlandic languages.


Developing Prototypes for Machine Translation between Two Sami Languages
Francis M. Tyers | Linda Wiechetek | Trond Trosterud
Proceedings of the 13th Annual conference of the European Association for Machine Translation