@inproceedings{zhu-etal-2025-integrating,
    title = "Integrating Visual Modalities with Large Language Models for Mental Health Support",
    author = "Zhu, Zhouan and
      Wang, Shangfei and
      Wang, Yuxin and
      Wu, Jiaqiang",
    editor = "Rambow, Owen and
      Wanner, Leo and
      Apidianaki, Marianna and
      Al-Khalifa, Hend and
      Eugenio, Barbara Di and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.599/",
    pages = "8939--8954",
    abstract = "Current work on mental health support primarily utilizes unimodal textual data and often fails to understand and respond to users' emotional states comprehensively. In this study, we introduce a novel framework that enhances Large Language Model (LLM) performance in mental health dialogue systems by integrating multimodal inputs. Our framework uses visual language models to analyze facial expressions and body movements, and then combines these visual elements with dialogue context and counseling strategies. This approach allows LLMs to generate more nuanced and supportive responses. The framework comprises four components: in-context learning via computation of semantic similarity; extraction of facial expression descriptions through visual modality data; integration of external knowledge from a knowledge base; and delivery of strategic guidance through a strategy selection module. Both automatic and human evaluations confirm that our approach outperforms existing models, delivering more empathetic, coherent, and contextually relevant mental health support responses."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="zhu-etal-2025-integrating">
<titleInfo>
<title>Integrating Visual Modalities with Large Language Models for Mental Health Support</title>
</titleInfo>
<name type="personal">
<namePart type="given">Zhouan</namePart>
<namePart type="family">Zhu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shangfei</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuxin</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiaqiang</namePart>
<namePart type="family">Wu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-01</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 31st International Conference on Computational Linguistics</title>
</titleInfo>
<name type="personal">
<namePart type="given">Owen</namePart>
<namePart type="family">Rambow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Leo</namePart>
<namePart type="family">Wanner</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marianna</namePart>
<namePart type="family">Apidianaki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hend</namePart>
<namePart type="family">Al-Khalifa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Barbara</namePart>
<namePart type="given">Di</namePart>
<namePart type="family">Eugenio</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Steven</namePart>
<namePart type="family">Schockaert</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, UAE</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Current work on mental health support primarily utilizes unimodal textual data and often fails to understand and respond to users’ emotional states comprehensively. In this study, we introduce a novel framework that enhances Large Language Model (LLM) performance in mental health dialogue systems by integrating multimodal inputs. Our framework uses visual language models to analyze facial expressions and body movements, and then combines these visual elements with dialogue context and counseling strategies. This approach allows LLMs to generate more nuanced and supportive responses. The framework comprises four components: in-context learning via computation of semantic similarity; extraction of facial expression descriptions through visual modality data; integration of external knowledge from a knowledge base; and delivery of strategic guidance through a strategy selection module. Both automatic and human evaluations confirm that our approach outperforms existing models, delivering more empathetic, coherent, and contextually relevant mental health support responses.</abstract>
<identifier type="citekey">zhu-etal-2025-integrating</identifier>
<location>
<url>https://aclanthology.org/2025.coling-main.599/</url>
</location>
<part>
<date>2025-01</date>
<extent unit="page">
<start>8939</start>
<end>8954</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Integrating Visual Modalities with Large Language Models for Mental Health Support
%A Zhu, Zhouan
%A Wang, Shangfei
%A Wang, Yuxin
%A Wu, Jiaqiang
%Y Rambow, Owen
%Y Wanner, Leo
%Y Apidianaki, Marianna
%Y Al-Khalifa, Hend
%Y Eugenio, Barbara Di
%Y Schockaert, Steven
%S Proceedings of the 31st International Conference on Computational Linguistics
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, UAE
%F zhu-etal-2025-integrating
%X Current work on mental health support primarily utilizes unimodal textual data and often fails to understand and respond to users’ emotional states comprehensively. In this study, we introduce a novel framework that enhances Large Language Model (LLM) performance in mental health dialogue systems by integrating multimodal inputs. Our framework uses visual language models to analyze facial expressions and body movements, and then combines these visual elements with dialogue context and counseling strategies. This approach allows LLMs to generate more nuanced and supportive responses. The framework comprises four components: in-context learning via computation of semantic similarity; extraction of facial expression descriptions through visual modality data; integration of external knowledge from a knowledge base; and delivery of strategic guidance through a strategy selection module. Both automatic and human evaluations confirm that our approach outperforms existing models, delivering more empathetic, coherent, and contextually relevant mental health support responses.
%U https://aclanthology.org/2025.coling-main.599/
%P 8939-8954
Markdown (Informal)
[Integrating Visual Modalities with Large Language Models for Mental Health Support](https://aclanthology.org/2025.coling-main.599/) (Zhu et al., COLING 2025)
ACL
Zhouan Zhu, Shangfei Wang, Yuxin Wang, and Jiaqiang Wu. 2025. Integrating Visual Modalities with Large Language Models for Mental Health Support. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8939–8954, Abu Dhabi, UAE. Association for Computational Linguistics.