Nowadays, portable devices such as smartphones can capture the face of a user simultaneously with the voice input. Server-based or even embedded dialogue systems might utilize this additional information to detect whether the speaking user addresses the system or another party, or whether the listening user is focused on the display. Depending on these findings, the dialogue system might change its interaction strategy, improving the overall communication between human and system. To develop and test methods for On/Off-Focus detection, a multimodal corpus of user-machine interactions was recorded within the German SmartWeb project. The corpus comprises 99 recording sessions of a triadic communication between the user, the system and a human companion. The user can address/watch/listen to the system, but may also talk to the companion, read from the display or simply talk to herself. Facial video is captured with the standard built-in video camera of a smartphone, while voice input is recorded both by a high-quality close-talk microphone and over a realistic transmission line via Bluetooth and WCDMA. The resulting SmartWeb Video Corpus (SVC) can be obtained from the Bavarian Archive for Speech Signals.
SmartWeb UMTS Speech Data Collection: The SmartWeb Handheld Corpus
Hannes Mögele | Moritz Kaiser | Florian Schiel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In this paper we outline the German speech data collection for the SmartWeb project, which is funded by the German Ministry of Science and Education. We focus on the SmartWeb Handheld Corpus (SHC), which has been collected by the Bavarian Archive for Speech Signals (BAS) at the Phonetic Institute (IPSK) of Munich University. Signals of SHC are being recorded in real-life environments (indoor and outdoor) with real background noise as well as real transmission line errors. We developed a new elicitation method and recording technique, called situational prompting, which facilitates collecting realistic dialogue speech data in a cost-efficient way. We can show that almost realistic speech queries to a dialogue system, issued over a mobile PDA or smartphone, can be collected very efficiently using an automatic speech server. We describe the technical and linguistic features of the resulting speech corpus, which will be publicly available at BAS or ELDA.
Three advanced German speech corpora have been collected during the German SmartWeb project. One of them, the SmartWeb Motorbike Corpus (SMC), is described in this paper. As with all SmartWeb speech corpora, SMC is designed for a dialogue system dealing with open domains. The corpus was recorded under the special circumstances of a motorbike ride and contains utterances of the driver related to information retrieval from various sources and on different topics. The audio tracks show characteristic noise from the engine and surrounding traffic as well as dropouts caused by transmission over Bluetooth and the UMTS mobile network. We discuss the problems of the technical setup and the fully automatic evocation of naturally spoken queries by means of dialogue-like sequences.