Automatic identification of head movements in video-recorded conversations: can words help?

Patrizia Paggio, Costanza Navarretta, Bart Jongejan


Abstract
We present an approach in which an SVM classifier learns to classify head movements based on measurements of velocity, acceleration, and jerk, the third derivative of position with respect to time. The trained classifier is then used to add head movement annotations to new video data. Evaluated against manual annotations of the same data, the automatic annotation achieves an accuracy of 68%; the results also show that including jerk improves accuracy. We then investigate the overlap between temporal sequences classified as either movement or non-movement and the speech stream of the person performing the gesture. The statistics derived from this analysis suggest that word features may help increase the accuracy of the model.
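To illustrate the kind of pipeline the abstract describes, here is a minimal sketch, not the authors' implementation: it derives velocity, acceleration, and jerk from per-frame head positions by finite differences and trains an SVM to label frames as movement or non-movement. The frame rate, the use of derivative magnitudes as features, the RBF kernel, and the toy data are all assumptions for illustration.

```python
# Sketch only: kinematic features (velocity, acceleration, jerk) from a
# head-position track, fed to an SVM classifier, as in the paper's setup.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def kinematic_features(positions, fps=25.0):
    """positions: (n_frames, d) head coordinates; fps is an assumed frame rate."""
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    jerk = np.gradient(acceleration, dt, axis=0)
    # One feature vector per frame: magnitudes of the three derivatives.
    return np.column_stack([
        np.linalg.norm(velocity, axis=1),
        np.linalg.norm(acceleration, axis=1),
        np.linalg.norm(jerk, axis=1),
    ])

# Hypothetical data standing in for a tracked head and manual annotations.
rng = np.random.default_rng(0)
positions = rng.normal(size=(200, 2)).cumsum(axis=0)  # toy 2D head track
labels = rng.integers(0, 2, size=200)                 # toy movement labels

X = kinematic_features(positions)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print(clf.predict(X[:10]))
```

In a real setting the positions would come from a face or head tracker and the labels from the manual annotations the paper evaluates against; the word-feature extension the abstract motivates would add lexical features from the aligned speech stream to X.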
Anthology ID:
W17-2006
Volume:
Proceedings of the Sixth Workshop on Vision and Language
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Anya Belz, Erkut Erdem, Katerina Pastra, Krystian Mikolajczyk
Venue:
VL
Publisher:
Association for Computational Linguistics
Pages:
40–42
URL:
https://aclanthology.org/W17-2006
DOI:
10.18653/v1/W17-2006
Cite (ACL):
Patrizia Paggio, Costanza Navarretta, and Bart Jongejan. 2017. Automatic identification of head movements in video-recorded conversations: can words help?. In Proceedings of the Sixth Workshop on Vision and Language, pages 40–42, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Automatic identification of head movements in video-recorded conversations: can words help? (Paggio et al., VL 2017)
PDF:
https://aclanthology.org/W17-2006.pdf