Jingjing Jiang


2024

pdf bib
Estimating the Emotional Valence of Interlocutors Using Heterogeneous Sensors in Human-Human Dialogue
Jingjing Jiang | Ao Guo | Ryuichiro Higashinaka
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Dialogue systems need to accurately understand the user’s mental state to generate appropriate responses, but accurately discerning such states solely from text or speech can be challenging. To determine which information is necessary, we first collected human-human multimodal dialogues using heterogeneous sensors, resulting in a dataset containing various types of information including speech, video, physiological signals, gaze, and body movement. Additionally, for each time step of the data, users provided subjective evaluations of their emotional valence while reviewing the dialogue videos. Using this dataset and focusing on physiological signals, we analyzed the relationship between the signals and the subjective evaluations through Granger causality analysis. We also investigated how sensor signals differ depending on the polarity of the valence. Our findings revealed several physiological signals related to the user’s emotional valence.

pdf bib
Towards a Real-Time Multimodal Emotion Estimation Model for Dialogue Systems
Jingjing Jiang
Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems

This position paper presents my research interest in establishing human-like chat-oriented dialogue systems. To this end, my work focuses on two main areas: the construction and utilization of multimodal datasets and real-time multimodal affective computing. I discuss the limitations of current multimodal dialogue corpora and multimodal affective computing models. As a solution, I have constructed a human-human dialogue dataset containing various synchronized multimodal information, and I have conducted preliminary analyses on it. In future work, I will further analyze the collected data and build a real-time multimodal emotion estimation model for dialogue systems.