A Dual Contrastive Learning Framework for Enhanced Multimodal Conversational Emotion Recognition

Yunhe Xie, Chengjie Sun, Ziyi Cao, Bingquan Liu, Zhenzhou Ji, Yuanchao Liu, Lili Shan


Abstract
Multimodal Emotion Recognition in Conversations (MERC) identifies utterance emotions by integrating both contextual and multimodal information from dialogue videos. Existing methods struggle to capture emotion shifts due to label replication and fail to preserve positive independent modality contributions during fusion. To address these issues, we propose a Dual Contrastive Learning Framework (DCLF) that enhances current MERC models without additional data. Specifically, to mitigate label replication effects, we construct context-aware contrastive pairs. Additionally, we assign pseudo-labels to distinguish modality-specific contributions. DCLF works alongside basic models to introduce semantic constraints at the utterance, context, and modality levels. Our experiments on two MERC benchmark datasets demonstrate performance gains of 4.67%-4.98% on IEMOCAP and 5.52%-5.89% on MELD, outperforming state-of-the-art approaches. Perturbation tests further validate DCLF’s ability to reduce label dependence. Additionally, DCLF incorporates emotion-sensitive independent modality features and multimodal fusion representations into final decisions, unlocking the potential contributions of individual modalities.
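The abstract describes contrastive objectives built over context-aware utterance pairs and pseudo-labeled modality representations. As a rough illustration only (the paper's exact losses are not reproduced here), the sketch below shows a generic supervised contrastive loss in PyTorch of the kind such multi-level objectives typically build on; the function name, tensor shapes, and temperature value are assumptions for the example, not details from the paper.

```python
# Minimal sketch (not the authors' implementation): a supervised contrastive
# loss. The labels could be gold emotion labels (utterance/context level) or
# pseudo-labels marking modality identity; both uses are illustrative here.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """features: (N, d) utterance/modality embeddings; labels: (N,) int class ids."""
    z = F.normalize(features, dim=1)                  # compare in cosine-similarity space
    sim = z @ z.t() / temperature                     # (N, N) similarity logits
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))   # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)      # avoid division by zero
    loss = -(log_prob * pos_mask.float()).sum(dim=1) / pos_count
    return loss[pos_mask.any(dim=1)].mean()           # average over anchors with >=1 positive

# Toy usage with random embeddings and emotion labels
if __name__ == "__main__":
    feats = torch.randn(8, 128)
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
    print(supervised_contrastive_loss(feats, labels))
```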
Anthology ID: 2025.coling-main.272
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 4055–4065
URL: https://aclanthology.org/2025.coling-main.272/
Cite (ACL): Yunhe Xie, Chengjie Sun, Ziyi Cao, Bingquan Liu, Zhenzhou Ji, Yuanchao Liu, and Lili Shan. 2025. A Dual Contrastive Learning Framework for Enhanced Multimodal Conversational Emotion Recognition. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4055–4065, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): A Dual Contrastive Learning Framework for Enhanced Multimodal Conversational Emotion Recognition (Xie et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.272.pdf