Flexible Thinking for Multimodal Emotional Support Conversation via Reinforcement Learning

Fanfan Wang, Xiangqing Shen, Jianfei Yu, Rui Xia


Abstract
Emotional Support Conversation (ESC) systems aim to alleviate user distress. However, current Chain-of-Thought based ESC methods often employ rigid, text-only reasoning, limiting adaptability in dynamic, multimodal interactions and introducing reasoning noise that degrades support quality. To address this, we introduce “Flexible Thinking” for multimodal ESC, enabling models to adaptively select contextually relevant thinking aspects: Visual Scene, Emotion, Situation, and Response Strategy. We first construct training data by manually curating flexible thinking demonstrations on the MESC dataset, then using a Multimodal Large Language Model to synthesize these processes for the full training set. Then, we propose FIRES, a framework integrating Supervised Fine-Tuning (SFT) for initial learning with Reinforcement Learning for refinement. This two-stage approach helps FIRES transcend SFT’s generalization limits and, crucially, directly links thinking processes to response quality via tailored rewards, moving beyond imitating potentially imperfect synthetic data. Experiments on MESC and EMOTyDA datasets demonstrate FIRES’s effectiveness and generalizability in fostering higher-quality emotional support responses through adaptive reasoning.
Anthology ID:
2025.findings-emnlp.70
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1341–1356
URL:
https://aclanthology.org/2025.findings-emnlp.70/
Cite (ACL):
Fanfan Wang, Xiangqing Shen, Jianfei Yu, and Rui Xia. 2025. Flexible Thinking for Multimodal Emotional Support Conversation via Reinforcement Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 1341–1356, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Flexible Thinking for Multimodal Emotional Support Conversation via Reinforcement Learning (Wang et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.70.pdf
Checklist:
 2025.findings-emnlp.70.checklist.pdf