Towards Human-Like Dialogue Systems: Integrating Multimodal Emotion Recognition and Non-Verbal Cue Generation

Jingjing Jiang


Abstract
This position paper outlines my research vision for developing human-like dialogue systems capable of both perceiving and expressing emotions through multimodal communication. My current research focuses on two main areas: multimodal emotion recognition and non-verbal cue generation. For emotion recognition, I constructed a Japanese multimodal dialogue dataset that captures natural, dyadic face-to-face interactions and developed an emotional valence recognition model that integrates textual, speech, and physiological inputs. On the generation side, my research explores non-verbal cue generation for embodied conversational agents (ECAs). Finally, the paper discusses the future of spoken dialogue systems (SDSs), emphasizing the shift from traditional turn-based architectures to full-duplex, real-time, multimodal systems.
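The abstract does not specify how the valence model combines its three input streams. As a purely illustrative sketch (not the paper's architecture), one common approach is late fusion: encode each modality separately, concatenate the embeddings, and regress a valence score. All class names, feature dimensions, and the choice of PyTorch below are assumptions for illustration only.

```python
# Hypothetical late-fusion sketch for multimodal valence recognition.
# Not the architecture from the paper; dimensions are illustrative.
import torch
import torch.nn as nn

class LateFusionValenceModel(nn.Module):
    """Fuses pre-extracted text, speech, and physiological features
    and regresses a continuous valence score in [-1, 1]."""

    def __init__(self, text_dim=768, speech_dim=512, physio_dim=64, hidden=128):
        super().__init__()
        # One small encoder per modality, projecting into a shared space.
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.speech_enc = nn.Sequential(nn.Linear(speech_dim, hidden), nn.ReLU())
        self.physio_enc = nn.Sequential(nn.Linear(physio_dim, hidden), nn.ReLU())
        # Fusion head: concatenated modality embeddings -> valence score.
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # squash to [-1, 1]
        )

    def forward(self, text_feat, speech_feat, physio_feat):
        fused = torch.cat(
            [self.text_enc(text_feat),
             self.speech_enc(speech_feat),
             self.physio_enc(physio_feat)],
            dim=-1,
        )
        return self.head(fused).squeeze(-1)

# Toy usage: random features for a batch of 4 utterances.
model = LateFusionValenceModel()
valence = model(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 64))
print(valence.shape)  # torch.Size([4])
```

Late fusion is only one option; the paper may instead use early fusion or cross-modal attention, which the abstract leaves unspecified.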
Anthology ID: 2025.yrrsds-1.6
Volume: Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
Month: August
Year: 2025
Address: Avignon, France
Editors: Ryan Whetten, Virgile Sucal, Anh Ngo, Kranti Chalamalasetti, Koji Inoue, Gaetano Cimino, Zachary Yang, Yuki Zenimoto, Ricardo Rodriguez
Venue: YRRSDS
Publisher: Association for Computational Linguistics
Pages: 15–17
URL: https://aclanthology.org/2025.yrrsds-1.6/
Cite (ACL): Jingjing Jiang. 2025. Towards Human-Like Dialogue Systems: Integrating Multimodal Emotion Recognition and Non-Verbal Cue Generation. In Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems, pages 15–17, Avignon, France. Association for Computational Linguistics.
Cite (Informal): Towards Human-Like Dialogue Systems: Integrating Multimodal Emotion Recognition and Non-Verbal Cue Generation (Jiang, YRRSDS 2025)
PDF: https://aclanthology.org/2025.yrrsds-1.6.pdf