Thierry Dutoit

2022

Analysis of Co-Laughter Gesture Relationship on RGB Videos in Dyadic Conversation Context
Hugo Bohy | Ahmad Hammoudeh | Antoine Maiorca | Stéphane Dupont | Thierry Dutoit
Proceedings of the Workshop on Smiling and Laughter across Contexts and the Life-span within the 13th Language Resources and Evaluation Conference

The development of virtual agents has enabled human-avatar interactions to become increasingly rich and varied. Moreover, an expressive virtual agent, i.e., one that mimics the natural expression of emotions, enhances social interaction between a user (human) and an agent (intelligent machine). The set of non-verbal behaviors of a virtual character is, therefore, an important component in the context of human-machine interaction. Laughter is not just an audio signal but an intrinsically multimodal form of non-verbal communication: in addition to audio, it includes facial expressions and body movements. Motion analysis often relies on a relevant motion capture dataset, but the main issue is that the acquisition of such a dataset is expensive and time-consuming. This work studies the relationship between laughter and body movements in dyadic conversations. The body movements were extracted from videos using a deep-learning-based pose estimation model. We found that, in the explored NDC-ME dataset, a single statistical feature (i.e., the maximum value, or the maximum of the Fourier transform) of a joint movement correlates only weakly with laughter intensity (a correlation of 30%). However, we did not find a direct correlation between audio features and body movements. We discuss the challenges of using such a dataset for the audio-driven co-laughter motion synthesis task.

Are There Any Body-movement Differences between Women and Men When They Laugh?
Ahmad Hammoudeh | Antoine Maiorca | Stéphane Dupont | Thierry Dutoit
Proceedings of the Workshop on Smiling and Laughter across Contexts and the Life-span within the 13th Language Resources and Evaluation Conference

Smiling differences between men and women have been studied in psychology. Women smile more than men, although women are not universally more expressive across all facial actions. There are also body-movement differences between women and men. For example, more open-body postures were reported for men, but are there any body-movement differences between men and women when they laugh? To investigate this question, we study body-movement signals extracted from recorded laughter videos using a deep learning pose estimation model. Initial results showed a higher Fourier transform amplitude of thorax and shoulder movements for females, while males had a higher Fourier transform amplitude of elbow movement. The differences were not limited to a small frequency range but covered most of the frequency spectrum. However, further investigations are still needed.

2018

ASR-based Features for Emotion Recognition: A Transfer Learning Approach
Noé Tits | Kevin El Haddad | Thierry Dutoit
Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML)

During the last decade, the applications of signal processing have drastically improved with deep learning. However, areas of affective computing such as emotional speech synthesis or emotion recognition from spoken language remain challenging. In this paper, we investigate the use of a neural Automatic Speech Recognition (ASR) system as a feature extractor for emotion recognition. We show that these features outperform the eGeMAPS feature set in predicting the valence and arousal emotional dimensions, which means that the audio-to-text mapping learned by the ASR system contains information related to the emotional dimensions in spontaneous speech. We also examine the relationship between the first layers (closer to speech) and last layers (closer to text) of the ASR system and valence/arousal.

2016

AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
Kevin El Haddad | Hüseyin Çakmak | Stéphane Dupont | Thierry Dutoit
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

It has been shown that adding expressivity and emotional expressions to an agent’s communication systems would improve the interaction quality between this agent and a human user. In this paper we present a multimodal database of affect bursts, which are very short non-verbal expressions with facial, vocal, and gestural components that are highly synchronized and triggered by an identifiable event. This database contains motion capture and audio data of affect bursts representing disgust, startle and surprise recorded at three different levels of arousal each. This database is to be used for synthesis purposes in order to generate affect bursts of these emotions on a continuous arousal level scale.

2014

The AV-LASYN Database : A synchronous corpus of audio and 3D facial marker data for audio-visual laughter synthesis
Hüseyin Çakmak | Jérôme Urbain | Thierry Dutoit | Joëlle Tilmanne
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

A synchronous database of acoustic and 3D facial marker data was built for audio-visual laughter synthesis. Since the aim is to use this database for HMM-based modeling and synthesis, the amount of data collected from a single subject had to be maximized. The corpus contains 251 utterances of laughter from one male participant. Laughter was elicited with the help of humorous videos. The resulting database is synchronous between modalities (audio and 3D facial motion capture data). Visual 3D data is available in common formats such as BVH and C3D, with head motion and facial deformation independently available. Data is segmented and audio has been annotated. Phonetic transcriptions are available in the HTK-compatible format. Principal component analysis has been conducted on visual data and has shown that a dimensionality reduction might be relevant. The corpus may be obtained under a research license upon request to the authors.

2010

This paper presents the large audiovisual laughter database recorded as part of the AVLaughterCycle project held during the eNTERFACE09 Workshop in Genova. 24 subjects participated. The freely available database includes audio signal and video recordings as well as facial motion tracking, thanks to markers placed on the subjects' faces. Annotations of the recordings, focusing on laughter description, are also provided and exhibited in this paper. In total, the corpus contains more than 1000 spontaneous laughs and 27 acted laughs. The laughter utterances are highly variable: the laughter duration ranges from 250 ms to 82 s and the sounds cover voiced vowels, breath-like expirations, hum-, hiccup- or grunt-like sounds, etc. However, as the subjects had no one to interact with, the database contains very few speech-laughs. Acted laughs tend to be longer than spontaneous ones and are more often composed of voiced vowels. The database can be useful for automatic laughter processing or cognitive science work. For the AVLaughterCycle project, it has served to animate a laughing virtual agent with an output laugh linked to the conversational partner's input laugh.

2000

EULER: an Open, Generic, Multilingual and Multi-platform Text-to-Speech System
Thierry Dutoit | Michel Bagein | Fabrice Malfrère | Vincent Pagel | Alain Ruelle | Nawfal Tounsi | Dominique Wynsberghe
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)