Wiebke Johannsen


2010

A Database of Age and Gender Annotated Telephone Speech
Felix Burkhardt | Martin Eckert | Wiebke Johannsen | Joachim Stegmann
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This article describes an age-annotated database of German telephone speech. In total, 47 hours of prompted and free speech were recorded from 954 paid participants, in a style typical of automated voice services. Participants were selected to give an equal distribution of males and females within four age groups: children, youth, adults and seniors. Gender is not distinguished among the children, because at that age it does not affect the voice strongly enough. The textual content was designed to be typical of automated voice services and consists mainly of short commands, single words and numbers. An additional database contains 659 speakers (368 female, 291 male) who called an automated voice portal server and answered one of two questions freely: “What is your favourite dish?” or “What would you take to an island?” (the island set, 422 speakers). This data might be used for out-of-domain testing. The data will be used to tune an age-detecting automated voice service and might be released to research institutes under controlled conditions as part of an open age and gender detection challenge.