Integrating Physiological, Speech, and Textual Information Toward Real-Time Recognition of Emotional Valence in Dialogue

Jingjing Jiang, Ao Guo, Ryuichiro Higashinaka


Abstract
Accurately estimating users’ emotional states in real time is crucial for enabling dialogue systems to respond adaptively. While existing approaches primarily rely on verbal information, such as text and speech, these modalities are often unavailable in non-speaking situations. In such cases, non-verbal information, particularly physiological signals, becomes essential for understanding users’ emotional states. In this study, we aimed to develop a model for real-time recognition of users’ binary emotional valence (high-valence vs. low-valence) during conversations. Specifically, we utilized an existing Japanese multimodal dialogue dataset, which includes various physiological signals, namely electrodermal activity (EDA), blood volume pulse (BVP), photoplethysmography (PPG), and pupil diameter, along with speech and textual data. We classified the emotional valence of each 15-second segment of dialogue interaction by integrating these multimodal inputs. To this end, time-series embeddings of physiological signals were extracted using a self-supervised encoder, while speech and textual features were obtained from pre-trained Japanese HuBERT and BERT models, respectively. The modality-specific embeddings were then integrated using a feature fusion mechanism for emotional valence recognition. Experimental results show that while each modality individually contributes to emotion recognition, the inclusion of physiological signals leads to a notable performance improvement, particularly in non-speaking or minimally verbal situations. These findings underscore the importance of physiological information for enhancing real-time valence recognition in dialogue systems, especially when verbal information is limited.
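The pipeline described above lends itself to a late-fusion architecture: one embedding per 15-second segment from each modality, projected into a shared space, concatenated, and classified as high or low valence. Below is a minimal PyTorch sketch of such a concatenation-based fusion classifier. The embedding dimensions, layer sizes, and fusion-by-concatenation choice are assumptions for illustration, not the paper's reported configuration.

```python
# Hypothetical late-fusion sketch of the segment-level valence classifier
# described in the abstract. All dimensions, layer sizes, and the
# concatenation-based fusion are illustrative assumptions, not the
# paper's reported configuration.
import torch
import torch.nn as nn

class ValenceFusionClassifier(nn.Module):
    def __init__(self, phys_dim=128, speech_dim=768, text_dim=768, hidden_dim=256):
        super().__init__()
        # Project each modality-specific embedding into a shared space.
        self.phys_proj = nn.Linear(phys_dim, hidden_dim)      # self-supervised time-series encoder output
        self.speech_proj = nn.Linear(speech_dim, hidden_dim)  # e.g. mean-pooled Japanese HuBERT states
        self.text_proj = nn.Linear(text_dim, hidden_dim)      # e.g. [CLS] vector of a Japanese BERT
        # Binary head: high-valence vs. low-valence.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, phys_emb, speech_emb, text_emb):
        # One embedding per 15-second dialogue segment and per modality.
        fused = torch.cat(
            [self.phys_proj(phys_emb),
             self.speech_proj(speech_emb),
             self.text_proj(text_emb)],
            dim=-1,
        )
        return self.classifier(fused)  # logits over {low, high} valence

# Smoke test with random stand-in embeddings (a batch of 4 segments).
model = ValenceFusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```

In the setup the abstract describes, the three inputs would come from the self-supervised physiological encoder and the pre-trained Japanese HuBERT and BERT models; the random tensors above only stand in so the sketch runs standalone.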
Anthology ID: 2025.sigdial-1.47
Volume: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month: August
Year: 2025
Address: Avignon, France
Editors: Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin
Venue: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 591–600
URL: https://aclanthology.org/2025.sigdial-1.47/
Cite (ACL): Jingjing Jiang, Ao Guo, and Ryuichiro Higashinaka. 2025. Integrating Physiological, Speech, and Textual Information Toward Real-Time Recognition of Emotional Valence in Dialogue. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 591–600, Avignon, France. Association for Computational Linguistics.
Cite (Informal): Integrating Physiological, Speech, and Textual Information Toward Real-Time Recognition of Emotional Valence in Dialogue (Jiang et al., SIGDIAL 2025)
PDF: https://aclanthology.org/2025.sigdial-1.47.pdf