Hansjörg Hofmann


2016

A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
Patricia Braunger | Hansjörg Hofmann | Steffen Werner | Maria Schmidt
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Recent spoken dialog systems have been able to recognize freely spoken user input in restricted domains thanks to statistical methods in automatic speech recognition. These methods require a large number of natural language utterances to train the speech recognition engine and to assess the quality of the system. Since human speech offers many variants associated with a single intent, a large number of user utterances has to be elicited. Developers are therefore turning to crowdsourcing to collect this data. This paper compares three different methods of eliciting multiple utterances for given semantics via crowdsourcing, namely with pictures, with text, and with semantic entities. Specifically, we compare the methods with regard to the amount of valid data and the linguistic variance, for which a quantitative and a qualitative approach are proposed. In our study, the text-based method led to high variance in the utterances and a relatively low rate of invalid data.

2015

Evaluation of Crowdsourced User Input Data for Spoken Dialog Systems
Maria Schmidt | Markus Müller | Martin Wagner | Sebastian Stüker | Alex Waibel | Hansjörg Hofmann | Steffen Werner
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2013

Evaluation of Speech Dialog Strategies for Internet Applications in the Car
Hansjörg Hofmann | Ute Ehrlich | André Berton | Angela Mahr | Rafael Math | Christian Müller
Proceedings of the SIGDIAL 2013 Conference

2008

Emotion Recognition from Speech: Stress Experiment
Stefan Scherer | Hansjörg Hofmann | Malte Lampmann | Martin Pfeil | Steffen Rhinow | Friedhelm Schwenker | Günther Palm
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The goal of this work is to introduce an architecture that automatically detects the amount of stress in the speech signal in close to real time. To this end, an experimental setup for recording speech that is rich in vocabulary and contains different stress levels is presented. Additionally, an experiment explaining the labeling process is presented, together with a thorough analysis of the labeled data. Fifteen subjects were asked to play an air controller simulation that gradually induced more stress by becoming more difficult to control. During this game the subjects were asked to answer questions; the answers were then labeled by a different set of subjects in order to obtain a subjective target value for each answer. After training, a recurrent neural network was used to measure the amount of stress contained in the utterances. The neural network estimated the amount of stress at a frequency of 25 Hz and outperformed the human baseline.