Neural Text-to-Speech Synthesis for Võro

Liisa Rätsep, Mark Fishel


Abstract
This paper presents the first high-quality neural text-to-speech (TTS) system for Võro, a minority language spoken in Southern Estonia. By leveraging existing Estonian TTS models and datasets, we analyze whether common low-resource NLP techniques, such as cross-lingual transfer learning from related languages or multi-task learning, can benefit our low-resource use case. Our results show that we can achieve high-quality Võro TTS without transfer learning and that using more diverse training data can even decrease synthesis quality. While these techniques may still be useful in some cases, our work highlights the need for caution when applying them in specific low-resource scenarios, and it can provide valuable insights for future low-resource research and efforts to preserve minority languages.
Anthology ID:
2023.nodalida-1.73
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
723–727
URL:
https://aclanthology.org/2023.nodalida-1.73
Cite (ACL):
Liisa Rätsep and Mark Fishel. 2023. Neural Text-to-Speech Synthesis for Võro. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 723–727, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Neural Text-to-Speech Synthesis for Võro (Rätsep & Fishel, NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.73.pdf