Enhancing Emotion Recognition in Spoken Dialogue Systems through Multimodal Integration and Personalization

Takumasa Kaneko


Abstract
My research interests focus on multimodal emotion recognition and personalization in emotion recognition tasks. In multimodal emotion recognition, existing studies demonstrate that integrating data types such as speech, text, and video improves accuracy. However, real-time constraints and the high cost of multimodal datasets limit practical application. I propose constructing a multimodal emotion recognition model by combining available unimodal datasets. For personalization, traditional discrete emotion labels often fail to capture the complexity of human emotions. Although recent methods embed speaker characteristics to boost prediction accuracy, they require extensive retraining. I introduce continuous prompt tuning, which updates only the speaker prompts while keeping the speech encoder weights fixed, enabling new speakers to be added without retraining the entire model. This paper discusses these existing research gaps and presents novel approaches to address them, aiming to significantly improve emotion recognition in spoken dialogue systems.
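The prompt-tuning idea in the abstract — trainable per-speaker prompt vectors prepended to the input of a frozen speech encoder — can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the encoder is stood in for by a fixed random projection, and all names (`encode`, `add_speaker`, `speaker_prompts`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained speech encoder: a fixed projection whose
# weights are frozen and never updated during personalization.
FEAT_DIM, OUT_DIM, PROMPT_LEN = 16, 8, 4
ENCODER_W = rng.standard_normal((FEAT_DIM, OUT_DIM))

# One trainable prompt per speaker. Personalizing to a new speaker only
# adds (and tunes) a new entry here; ENCODER_W is untouched.
speaker_prompts = {
    "spk_A": rng.standard_normal((PROMPT_LEN, FEAT_DIM)) * 0.01,
}

def add_speaker(speaker_id: str) -> None:
    """Register a new speaker by allocating a fresh prompt only."""
    speaker_prompts[speaker_id] = (
        rng.standard_normal((PROMPT_LEN, FEAT_DIM)) * 0.01
    )

def encode(features: np.ndarray, speaker_id: str) -> np.ndarray:
    """Prepend the speaker's prompt frames to the acoustic features,
    then pass everything through the frozen encoder."""
    x = np.concatenate([speaker_prompts[speaker_id], features], axis=0)
    return x @ ENCODER_W  # frozen weights: no gradient flows here

add_speaker("spk_B")                      # no retraining needed
utterance = rng.standard_normal((10, FEAT_DIM))  # 10 frames of features
out = encode(utterance, "spk_B")
print(out.shape)  # (14, 8): 4 prompt frames + 10 utterance frames
```

In a real system the prompt vectors would be the only parameters passed to the optimizer (e.g., via `requires_grad` filtering in PyTorch), so tuning for a new speaker leaves the shared encoder intact.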
Anthology ID:
2024.yrrsds-1.2
Volume:
Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
Month:
September
Year:
2024
Address:
Kyoto, Japan
Editors:
Koji Inoue, Yahui Fu, Agnes Axelsson, Atsumoto Ohashi, Brielen Madureira, Yuki Zenimoto, Biswesh Mohapatra, Armand Stricker, Sopan Khosla
Venues:
YRRSDS | WS
Publisher:
Association for Computational Linguistics
Pages:
5–7
URL:
https://aclanthology.org/2024.yrrsds-1.2
Cite (ACL):
Takumasa Kaneko. 2024. Enhancing Emotion Recognition in Spoken Dialogue Systems through Multimodal Integration and Personalization. In Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems, pages 5–7, Kyoto, Japan. Association for Computational Linguistics.
Cite (Informal):
Enhancing Emotion Recognition in Spoken Dialogue Systems through Multimodal Integration and Personalization (Kaneko, YRRSDS-WS 2024)
PDF:
https://aclanthology.org/2024.yrrsds-1.2.pdf