Development and Evaluation of Speech Synthesis Corpora for Latvian

Roberts Darģis, Peteris Paikens, Normunds Gruzitis, Ilze Auzina, Agate Akmane


Abstract
Text to speech (TTS) systems are necessary for any language to ensure accessibility and availability of digital language services. Recent advances in neural speech synthesis have enabled the development of such systems with a data-driven approach that does not require significant development of language-specific tools. However, smaller languages often lack speech corpora that would be sufficient for training current neural TTS models, which require at least 30 hours of good quality audio recordings from a single speaker in a noiseless environment with matching transcriptions. Making such a corpus manually can be cost prohibitive. This paper presents an unsupervised approach to obtain a suitable corpus from unannotated recordings using automated speech recognition for transcription, as well as automated speaker segmentation and identification. The proposed method and software tools are applied and evaluated on a case study for developing a corpus suitable for Latvian speech synthesis based on Latvian public radio archive data.
Anthology ID:
2020.lrec-1.818
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6633–6637
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.818
Cite (ACL):
Roberts Darģis, Peteris Paikens, Normunds Gruzitis, Ilze Auzina, and Agate Akmane. 2020. Development and Evaluation of Speech Synthesis Corpora for Latvian. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6633–6637, Marseille, France. European Language Resources Association.
Cite (Informal):
Development and Evaluation of Speech Synthesis Corpora for Latvian (Darģis et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.818.pdf