2024
Enhancing Emotion Recognition in Spoken Dialogue Systems through Multimodal Integration and Personalization
Takumasa Kaneko
Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
My research interests focus on multimodal emotion recognition and personalization in emotion recognition tasks. In multimodal emotion recognition, existing studies demonstrate that integrating multiple data types such as speech, text, and video improves accuracy. However, real-time constraints and the high cost of collecting multimodal datasets limit their practical application. I propose constructing a multimodal emotion recognition model by combining available unimodal datasets. In terms of personalization, traditional discrete emotion labels often fail to capture the complexity of human emotions. Although recent methods embed speaker characteristics to boost prediction accuracy, they require extensive retraining. I introduce continuous prompt tuning, which updates only the speaker prompts while keeping the speech encoder weights fixed, so that new speaker data can be added without retraining the entire model. This paper discusses these existing research gaps and presents novel approaches to address them, aiming to significantly improve emotion recognition in spoken dialogue systems.
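The idea of continuous prompt tuning described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class, the stand-in linear "encoder", and all dimensions are assumptions made purely to show the parameter split (frozen encoder, learnable per-speaker prompts).

```python
import torch
import torch.nn as nn

class SpeakerPromptModel(nn.Module):
    """Illustrative sketch: per-speaker continuous prompts prepended
    to the output of a frozen speech encoder. Names and sizes are
    hypothetical, not taken from the paper."""

    def __init__(self, n_speakers, n_prompt=4, dim=16, n_emotions=4):
        super().__init__()
        # Stand-in for a pretrained speech encoder; its weights are frozen.
        self.encoder = nn.Linear(dim, dim)
        for p in self.encoder.parameters():
            p.requires_grad = False
        # One learnable continuous prompt per speaker: the tuned weights.
        self.prompts = nn.Embedding(n_speakers, n_prompt * dim)
        self.n_prompt, self.dim = n_prompt, dim
        self.head = nn.Linear(dim, n_emotions)

    def forward(self, feats, speaker_id):
        # feats: (batch, time, dim) acoustic features
        h = self.encoder(feats)
        p = self.prompts(speaker_id).view(-1, self.n_prompt, self.dim)
        h = torch.cat([p, h], dim=1)      # prepend the speaker prompt
        return self.head(h.mean(dim=1))   # pooled emotion logits

model = SpeakerPromptModel(n_speakers=2)
# Only the speaker prompts are handed to the optimizer; the encoder
# stays fixed, so adding a speaker only means tuning a new prompt row.
optimizer = torch.optim.Adam(model.prompts.parameters(), lr=1e-3)
```

Under this scheme, supporting an unseen speaker amounts to appending one new row of prompt parameters rather than retraining the encoder.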
Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information
Yoshiki Tanaka | Takumasa Kaneko | Hiroki Onozeki | Natsumi Ezure | Ryuichi Uehara | Zhiyang Qi | Tomoya Higuchi | Ryutaro Asahara | Michimasa Inaba
Proceedings of the 2nd International AIWolfDial Workshop
The Werewolf Game is a communication game in which players’ reasoning and discussion skills are essential. In this study, we present a Werewolf AI agent developed for the AIWolfDial 2024 shared task, co-hosted with the 17th INLG. In recent years, large language models like ChatGPT have garnered attention for their exceptional response generation and reasoning capabilities. We thus develop LLM-based agents for the Werewolf Game. This study aims to enhance the consistency of the agent’s utterances by utilizing dialogue summaries generated by LLMs together with manually designed personas and utterance examples. By analyzing self-match game logs, we demonstrate that the agent’s utterances are contextually consistent and that its character, including tone, is maintained throughout the game.
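One way such an agent might combine a persona, utterance examples, and an LLM-generated dialogue summary into a single generation prompt is sketched below. The `build_prompt` helper, the field layout, and the wording are all hypothetical, not the authors' actual implementation.

```python
def build_prompt(persona, examples, summary, latest_turns):
    """Assemble one LLM prompt from persona information, style
    examples, a running game summary, and recent utterances.
    (Hypothetical sketch; structure is an assumption.)"""
    parts = [
        "You are playing the Werewolf Game.",
        f"Persona: {persona}",
        "Speak in the style of these examples:",
        *[f"- {e}" for e in examples],
        f"Summary of the game so far: {summary}",
        "Recent utterances:",
        *[f"{speaker}: {utt}" for speaker, utt in latest_turns],
        "Respond in character.",
    ]
    return "\n".join(parts)
```

Keeping a compact summary in the prompt, rather than the full game log, is one plausible way to stay within the model's context window while preserving earlier commitments the agent has made.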