We investigate the influence of audiovisual features on the perception of speaking style and performance of politicians, utilizing a large publicly available dataset of German parliament recordings. We conduct a human perception experiment involving eye-tracker data to evaluate human ratings as well as behavior in two separate conditions, i.e. audiovisual and video only. The ratings are evaluated on a five dimensional scale comprising measures of insecurity, monotony, expressiveness, persuasiveness, and overall performance. Further, they are statistically analyzed and put into context in a multimodal feature analysis, involving measures of prosody, voice quality and motion energy. The analysis reveals several statistically significant features, such as pause timing, voice quality measures and motion energy, that highly positively or negatively correlate with certain human ratings of speaking style. Additionally, we compare the gaze behavior of the human subjects to evaluate saliency regions in the multimodal and visual only conditions. The eye-tracking analysis reveals significant changes in the gaze behavior of the human subjects; participants reduce their focus of attention in the audiovisual condition mainly to the region of the face of the politician and scan the upper body, including hands and arms, in the video only condition.
This paper presents the evaluation of the PIT Corpus of multi-party dialogues recorded in a Wizard-of-Oz environment. An evaluation has been performed with two different foci: First, a usability evaluation was used to take a look at the overall ratings of the system. A shortened version of the SASSI questionnaire, namely the SASSISV, and the well established AttrakDiff questionnaire assessing the hedonistic and pragmatic dimension of computer systems have been analysed. In a second evaluation, the user's gaze direction was analysed in order to assess the difference in the user's (gazing) behaviour if interacting with the computer versus the other dialogue partner. Recordings have been performed in different setups of the system, e.g. with and without avatar. Thus, the presented evaluation further focuses on the difference in the interaction caused by deploying an avatar. The quantitative analysis of the gazing behaviour has resulted in several encouraging significant differences. As a possible interpretation it could be argued that users are more attentive towards systems with an avatar - the difference a face makes.