Liisa Rätsep
2023
Neural Text-to-Speech Synthesis for Võro
Liisa Rätsep, Mark Fishel
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
This paper presents the first high-quality neural text-to-speech (TTS) system for Võro, a minority language spoken in Southern Estonia. By leveraging existing Estonian TTS models and datasets, we analyze whether common low-resource NLP techniques, such as cross-lingual transfer learning from related languages or multi-task learning, can benefit our low-resource use case. Our results show that we can achieve high-quality Võro TTS without transfer learning and that using more diverse training data can even decrease synthesis quality. While these techniques may still be useful in some cases, our work highlights the need for caution when applying them in specific low-resource scenarios and can provide valuable insights for future low-resource research and for efforts to preserve minority languages.
2022
MTee: Open Machine Translation Platform for Estonian Government
Toms Bergmanis, Marcis Pinnis, Roberts Rozis, Jānis Šlapiņš, Valters Šics, Berta Bernāne, Guntars Pužulis, Endijs Titomers, Andre Tättar, Taido Purason, Hele-Andra Kuulmets, Agnes Luhtaru, Liisa Rätsep, Maali Tars, Annika Laumets-Tättar, Mark Fishel
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
We present the MTee project, a research initiative funded through an Estonian public procurement to develop machine translation technology that is open-source and free of charge. The MTee project delivered an open-source platform serving state-of-the-art machine translation systems that support four domains for six language pairs, translating from Estonian into English, German, and Russian and vice versa. The platform also features grammatical error correction and speech translation for Estonian and supports formatted document translation and automatic domain detection. The software, data, and training workflows for the machine translation engines are all made publicly available for further use and research.