TTS voice corpus reduction for audio-book generation

Meysam Shamsi


Abstract
Nowadays, with emerging new voice corpora, voice corpus reduction in expressive TTS becomes more important. In this study a spitting greedy approach is investigated to remove utterances. In the first step by comparing five objective measures, the TTS global cost has been found as the best available metric for approximation of perceptual quality. The greedy algorithm employs this measure to evaluate the candidates in each step and the synthetic quality resulted by its solution. It turned out that reducing voice corpus size until a certain length (1 hour in our experiment) could not degrade the synthetic quality. By modifying the original greedy algorithm, its computation time is reduced to a reasonable duration. Two perceptual tests have been run to compare this greedy method and the random strategy for voice corpus reduction. They revealed that there is no superiority of using the proposed greedy approach for corpus reduction.
Anthology ID:
2020.jeptalnrecital-recital.15
Volume:
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 3 : Rencontre des Étudiants Chercheurs en Informatique pour le TAL
Month:
6
Year:
2020
Address:
Nancy, France
Editors:
Christophe Benzitoun, Chloé Braud, Laurine Huber, David Langlois, Slim Ouni, Sylvain Pogodalla, Stéphane Schneider
Venue:
JEP/TALN/RECITAL
SIG:
Publisher:
ATALA et AFCP
Note:
Pages:
193–204
Language:
URL:
https://aclanthology.org/2020.jeptalnrecital-recital.15
DOI:
Bibkey:
Cite (ACL):
Meysam Shamsi. 2020. TTS voice corpus reduction for audio-book generation. In Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 3 : Rencontre des Étudiants Chercheurs en Informatique pour le TAL, pages 193–204, Nancy, France. ATALA et AFCP.
Cite (Informal):
TTS voice corpus reduction for audio-book generation (Shamsi, JEP/TALN/RECITAL 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.jeptalnrecital-recital.15.pdf