Zhouan Zhu
2025
From Traits to Empathy: Personality-Aware Multimodal Empathetic Response Generation
Jiaqiang Wu | Xuandong Huang | Zhouan Zhu | Shangfei Wang
Proceedings of the 31st International Conference on Computational Linguistics
Empathetic dialogue systems improve user experience across various domains. Existing approaches mainly focus on acquiring affective and cognitive knowledge from text, but neglect the unique personality traits of individuals and the inherently multimodal nature of human face-to-face conversation. To this end, we enhance the dialogue system with the ability to generate empathetic responses from a multimodal perspective, while accounting for the diverse personality traits of users. We incorporate multimodal data, such as images and text, to understand the user’s emotional state and situation. Concretely, we first identify the user’s personality traits. Then, the dialogue system comprehends the user’s emotions and situation through analysis of the multimodal inputs. Finally, the response generator models the correlations among personality, emotion, and multimodal data to generate empathetic responses. Experiments on the MELD and MEDIC datasets validate the effectiveness of the proposed approach.
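Below is a minimal sketch of how the three-stage pipeline described in the abstract (personality identification, multimodal emotion/situation encoding, and conditioned response generation) could be wired together. It assumes PyTorch; all module names, feature dimensions, and the stand-in linear decoder are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: module names, dimensions, and the linear "decoder"
# are assumptions; a real system would use pretrained text/image encoders and
# a pretrained language-model decoder.
import torch
import torch.nn as nn

class PersonalityClassifier(nn.Module):
    """Predicts a personality-trait distribution (e.g., Big Five) from text features."""
    def __init__(self, text_dim=768, num_traits=5):
        super().__init__()
        self.head = nn.Linear(text_dim, num_traits)

    def forward(self, text_feat):                    # (batch, text_dim)
        return torch.sigmoid(self.head(text_feat))   # (batch, num_traits)

class MultimodalEncoder(nn.Module):
    """Fuses text and image features into an emotion/situation representation."""
    def __init__(self, text_dim=768, image_dim=512, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(text_dim + image_dim, hidden), nn.ReLU())

    def forward(self, text_feat, image_feat):
        return self.fuse(torch.cat([text_feat, image_feat], dim=-1))

class EmpatheticResponseGenerator(nn.Module):
    """Conditions generation on personality traits and the fused multimodal state.
    The decoder here is a stand-in projection to vocabulary logits."""
    def __init__(self, hidden=256, num_traits=5, vocab_size=32000):
        super().__init__()
        self.condition = nn.Linear(hidden + num_traits, hidden)
        self.decoder = nn.Linear(hidden, vocab_size)

    def forward(self, fused_state, traits):
        cond = torch.relu(self.condition(torch.cat([fused_state, traits], dim=-1)))
        return self.decoder(cond)                    # next-token logits (illustrative)

# Forward pass with random tensors standing in for encoder outputs.
text_feat, image_feat = torch.randn(2, 768), torch.randn(2, 512)
traits = PersonalityClassifier()(text_feat)
fused = MultimodalEncoder()(text_feat, image_feat)
logits = EmpatheticResponseGenerator()(fused, traits)
print(logits.shape)  # torch.Size([2, 32000])
```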
Integrating Visual Modalities with Large Language Models for Mental Health Support
Zhouan Zhu | Shangfei Wang | Yuxin Wang | Jiaqiang Wu
Proceedings of the 31st International Conference on Computational Linguistics
Current work on mental health support primarily utilizes unimodal textual data and often fails to understand and respond to users’ emotional states comprehensively. In this study, we introduce a novel framework that enhances Large Language Model (LLM) performance in mental health dialogue systems by integrating multimodal inputs. Our framework uses visual language models to analyze facial expressions and body movements, then combines these visual cues with the dialogue context and counseling strategies. This approach allows LLMs to generate more nuanced and supportive responses. The framework comprises four components: in-context learning via computation of semantic similarity; extraction of facial expression descriptions from visual modality data; integration of external knowledge from a knowledge base; and delivery of strategic guidance through a strategy selection module. Both automatic and human evaluations confirm that our approach outperforms existing models, delivering more empathetic, coherent, and contextually relevant mental health support responses.
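Below is a minimal sketch of how the four components named in the abstract could be assembled into a single LLM prompt: similarity-based exemplar retrieval for in-context learning, a facial-expression description produced by a visual model, retrieved knowledge, and a selected counseling strategy. The function names, the toy embeddings, and the emotion-to-strategy table are hypothetical assumptions, not the paper's implementation.

```python
# Illustrative sketch only: embeddings and the vision/knowledge outputs are stubbed;
# a real system would compute them with sentence encoders and vision-language models.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_exemplars(query_vec, exemplar_bank, k=2):
    """In-context learning: pick the k exemplar dialogues most similar to the context."""
    ranked = sorted(exemplar_bank, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return [e["text"] for e in ranked[:k]]

def select_strategy(emotion_label):
    """Strategy selection: map the inferred user emotion to a counseling strategy."""
    table = {"sad": "reflection of feelings", "anxious": "reassurance", "angry": "validation"}
    return table.get(emotion_label, "open question")

def build_prompt(context, query_vec, exemplar_bank, face_description, knowledge, emotion):
    """Combine exemplars, visual description, external knowledge, and strategy guidance."""
    exemplars = "\n".join(retrieve_exemplars(query_vec, exemplar_bank))
    strategy = select_strategy(emotion)
    return (
        f"Example supportive dialogues:\n{exemplars}\n\n"
        f"Client's facial expression and body language: {face_description}\n"
        f"Relevant knowledge: {knowledge}\n"
        f"Counseling strategy to apply: {strategy}\n\n"
        f"Dialogue so far:\n{context}\nCounselor:"
    )

# Usage with toy 3-d vectors standing in for sentence embeddings.
bank = [
    {"text": "Client: I feel alone. Counselor: That sounds really hard.", "vec": np.array([1.0, 0.1, 0.0])},
    {"text": "Client: Work stresses me out. Counselor: What part feels heaviest?", "vec": np.array([0.0, 1.0, 0.2])},
]
prompt = build_prompt(
    "Client: I can't sleep lately.", np.array([0.9, 0.2, 0.1]), bank,
    "furrowed brow, slumped posture", "Sleep problems often accompany anxiety.", "anxious",
)
print(prompt)
```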