Using Audio Books for Training a Text-to-Speech System

Aimilios Chalamandaris, Pirros Tsiakoulis, Sotiris Karabetsos, Spyros Raptis


Abstract
Creating new voices for a TTS system often requires a costly procedure of designing and recording an audio corpus, a time-consuming and effort-intensive task. Using publicly available audiobooks as the raw material of a spoken corpus for such systems opens up new perspectives for creating new synthetic voices quickly and with limited effort. This paper addresses the issue of creating new synthetic voices from audiobook data in an automated manner. As an audiobook includes several types of speech, such as narration, character playing, etc., special care is given to identifying the data subset that leads to a more neutral and general-purpose synthetic voice. The main goal is to identify and address the effect of the audiobook speech diversity on the resulting TTS system. Along with the methodology for coping with this diversity in the speech data, we also describe a set of experiments performed to investigate the efficiency of different approaches to automatic data pruning. Further plans for exploiting the diversity of the speech incorporated in an audiobook are also described in the final section, and conclusions are drawn.
Anthology ID:
L14-1645
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3076–3080
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/838_Paper.pdf
Cite (ACL):
Aimilios Chalamandaris, Pirros Tsiakoulis, Sotiris Karabetsos, and Spyros Raptis. 2014. Using Audio Books for Training a Text-to-Speech System. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3076–3080, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Using Audio Books for Training a Text-to-Speech System (Chalamandaris et al., LREC 2014)