Daniel Luzzati


Human annotation of ASR error regions: Is “gravity” a sharable concept for human annotators?
Daniel Luzzati | Cyril Grouin | Ioana Vasilescu | Martine Adda-Decker | Eric Bilinski | Nathalie Camelin | Juliette Kahn | Carole Lailler | Lori Lamel | Sophie Rosset
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper is concerned with human assessments of the severity of errors in ASR outputs. We did not design any guidelines, so that each annotator involved in the study could judge the “seriousness” of an ASR error using their own scientific background. Eight human annotators took part in an annotation task on three distinct corpora; one of the corpora was annotated twice, without the annotators being told about the duplicate. None of the computed results (inter-annotator agreement, edit distance, majority annotation) shows any strong correlation between the considered criteria and the level of seriousness, which underlines how difficult it is for a human to determine whether an ASR error is serious or not.


Manual vs Assisted Transcription of Prepared and Spontaneous Speech
Thierry Bazillon | Yannick Estève | Daniel Luzzati
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Our paper focuses on the gain that can be achieved in human transcription of spontaneous and prepared speech by using the assistance of an ASR system. The experiment yielded interesting results, first about the duration of the transcription task itself: even with the combination of prepared speech and ASR, an experienced annotator needs approximately 4 hours to transcribe 1 hour of audio data. Second, using an ASR system mostly saves time, although the gain is much more significant on prepared speech: assisted transcriptions are up to 4 times faster than manual ones. This ratio falls to 2 with spontaneous speech, because of the limits of ASR on such data. Detailed results reveal interesting correlations between the transcription task and phenomena such as Word Error Rate, telephonic or non-native speech turns, and the number of fillers or proper nouns. The latter make spelling correction very time-consuming with prepared speech because of their frequency. As a consequence, watching for low averages of proper nouns may be a way to detect spontaneous speech.