Tenggan Zhang

2024

pdf bib abs
ESCoT: Towards Interpretable Emotional Support Dialogue Systems
Tenggan Zhang | Xinjie Zhang | Jinming Zhao | Li Zhou | Qin Jin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Understanding the reason for emotional support response is crucial for establishing connections between users and emotional support dialogue systems. Previous works mostly focus on generating better responses but ignore interpretability, which is extremely important for constructing reliable dialogue systems. To empower the system with better interpretability, we propose an emotional support response generation scheme, named Emotion-Focused and Strategy-Driven Chain-of-Thought (ESCoT), mimicking the process of identifying, understanding, and regulating emotions. Specially, we construct a new dataset with ESCoT in two steps: (1) Dialogue Generation where we first generate diverse conversation situations, then enhance dialogue generation using richer emotional support strategies based on these situations; (2) Chain Supplement where we focus on supplementing selected dialogues with elements such as emotion, stimuli, appraisal, and strategy reason, forming the manually verified chains. Additionally, we further develop a model to generate dialogue responses with better interpretability. We also conduct extensive experiments and human evaluations to validate the effectiveness of the proposed ESCoT and generated dialogue responses. Our dataset, code, and model will be released.

2022

The emotional state of a speaker can be influenced by many different factors in dialogues, such as dialogue scene, dialogue topic, and interlocutor stimulus. The currently available data resources to support such multimodal affective analysis in dialogues are however limited in scale and diversity. In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M³ED, which contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances. M³ED is annotated with 7 emotion categories (happy, surprise, sad, disgust, anger, fear, and neutral) at utterance level, and encompasses acoustic, visual, and textual modalities. To the best of our knowledge, M³ED is the first multimodal emotional dialogue dataset in Chinese.It is valuable for cross-culture emotion analysis and recognition. We apply several state-of-the-art methods on the M³ED dataset to verify the validity and quality of the dataset. We also propose a general Multimodal Dialogue-aware Interaction framework, MDI, to model the dialogue context for emotion recognition, which achieves comparable performance to the state-of-the-art methods on the M³ED. The full dataset and codes are available.

Co-authors

Xinchao Wang 1

Xinjie Zhang 1

Li Zhou 1

Venues

acl2

Fix data