2006
A joint intelligibility evaluation of French text-to-speech synthesis systems: the EvaSy SUS/ACR campaign
Philippe Boula de Mareüil | Christophe d’Alessandro | Alexander Raake | Gérard Bailly | Marie-Neige Garcia | Michel Morel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The EVALDA/EvaSy project is dedicated to the evaluation of text-to-speech synthesis systems for the French language. It is subdivided into four components: evaluation of the grapheme-to-phoneme conversion module (Boula de Mareüil et al., 2005), evaluation of prosody (Garcia et al., 2006), evaluation of intelligibility, and global evaluation of the quality of the synthesised speech. This paper reports on the key results of the intelligibility and global evaluation of the synthesised speech. It focuses on intelligibility, assessed on the basis of semantically unpredictable sentences, but a comparison with absolute category rating in terms of e.g. pleasantness and naturalness is also provided. Three diphone systems and three selection systems have been evaluated. It turns out that the most intelligible system (diphone-based) is far from being the one which obtains the best mean opinion score.
TC-STAR: Specifications of Language Resources and Evaluation for Speech Synthesis
A. Bonafonte | H. Höge | I. Kiss | A. Moreno | U. Ziegenhain | H. van den Heuvel | H.-U. Hain | X. S. Wang | M. N. Garcia
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In the framework of the EU-funded project TC-STAR (Technology and Corpora for Speech to Speech Translation), research on TTS aims at providing a synthesized voice that sounds like the source speaker speaking the target language. To progress in this direction, research in the TC-STAR framework focuses on naturalness, intelligibility, expressivity and voice conversion. For this purpose, specifications for large, high-quality TTS databases have been developed, and the data have been recorded for UK English, Spanish and Mandarin. The development of speech technology in TC-STAR is evaluation driven. Assessment of speech synthesis is needed to determine how well a system or technique performs in comparison to previous versions as well as to other approaches (systems and methods). Apart from testing the whole system, all components of the system will be evaluated separately. This approach allows better assessment of each component as well as identification of the best techniques in the different speech synthesis processes. This paper describes the specifications of Language Resources for speech synthesis and the specifications for evaluation of speech synthesis activities.
Evaluation of multimodal components within CHIL: The evaluation packages and results
Djamel Mostefa | Marie-Neige Garcia | Khalid Choukri
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This article describes the first CHIL evaluation campaign, in which 12 technologies were evaluated. The major outcomes of this campaign are the so-called Evaluation Packages. An evaluation package is the full documentation (definition and description of the evaluation methodologies, protocols and metrics), together with the data sets and software scoring tools, that an organisation needs in order to evaluate one or more systems for a given technology. These evaluation packages will be made available to the community through the ELDA General Catalogue.
A joint prosody evaluation of French text-to-speech synthesis systems
Marie-Neige Garcia | Christophe d’Alessandro | Gérard Bailly | Philippe Boula de Mareüil | Michel Morel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper reports on prosodic evaluation in the framework of the EVALDA/EvaSy project for text-to-speech (TTS) evaluation for the French language. Prosody is evaluated using a prosodic transplantation paradigm: intonation contours generated by the synthesis systems are transplanted onto a common segmental content. Both diphone-based synthesis and natural speech are used. Five TTS systems are tested along with natural voice. The test is a paired preference test (with 19 subjects), using 7 sentences. The results indicate that natural speech consistently obtains the first rank (with an average preference rate of 80%), followed by a selection-based system (72%) and a diphone-based system (58%). However, rather large variations in judgements are observed among subjects and sentences, and in some cases synthetic speech is preferred to natural speech. These results show the remarkable improvement achieved by the best selection-based synthesis systems in terms of prosody. In this way, a new paradigm for evaluation of the prosodic component of TTS systems has been successfully demonstrated.