Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis

Sandrine Brognaux, Thomas François, Marco Saerens


Abstract
Text-to-speech has long been centered on the production of an intelligible message of good quality. More recently, interest has shifted to the generation of more natural and expressive speech. A major issue with existing approaches is that they usually rely on a manual annotation of expressive styles, which tends to be rather subjective. A typical related issue is that the annotation is strongly influenced, and possibly biased, by the semantic content of the text (e.g. a shot or a foul may incite the annotator to tag that sequence as expressing a high degree of excitation, independently of its acoustic realization). This paper investigates the assumption that human annotation of excitation levels in basketball commentaries can be automatically improved on the basis of acoustic features. It presents two techniques for label correction, exploiting a Gaussian mixture and a proportional-odds logistic regression, respectively. The automatically re-annotated corpus is then used to train HMM-based expressive speech synthesizers, the performance of which is assessed through subjective evaluations. The results indicate that the automatic correction of the annotation with the Gaussian mixture helps to synthesize more contrasted excitation levels while preserving naturalness.
Anthology ID:
L16-1613
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3872–3879
URL:
https://aclanthology.org/L16-1613
Cite (ACL):
Sandrine Brognaux, Thomas François, and Marco Saerens. 2016. Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3872–3879, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis (Brognaux et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1613.pdf