Liang Dandan
2024
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models
Haiquan Zhao
|
Lingyu Li
|
Shisong Chen
|
Shuqi Kong
|
Jiaan Wang
|
Kexin Huang
|
Tianle Gu
|
Yixu Wang
|
Jian Wang
|
Liang Dandan
|
Zhixu Li
|
Yan Teng
|
Yanghua Xiao
|
Yingchun Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role which behaves more like a confused person than GPT-4. Third, through ESC-Role and organized role cards, we systematically conduct experiments using 14 LLMs as the ESC models, including general AI-assistant LLMs (e.g., ChatGPT) and ESC-oriented LLMs (e.g., ExTES-Llama). We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models. The results show that ESC-oriented LLMs exhibit superior ESC abilities compared to general AI-assistant LLMs, but there is still a gap behind human performance. Moreover, to automate the scoring process for future ESC models, we developed ESC-RANK, which trained on the annotated data, achieving a scoring performance surpassing 35 points of GPT-4.
Search
Co-authors
- Haiquan Zhao 1
- Lingyu Li 1
- Shisong Chen 1
- Shuqi Kong 1
- Jiaan Wang 1
- show all...