Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura


Abstract
Multimodal emotion recognition aims to recognize the emotion of each utterance from multiple modalities, and it has received increasing attention for its applications in human-machine interaction. Current graph-based methods fail to simultaneously capture global contextual features and local, diverse uni-modal features in a dialogue. Furthermore, as the number of graph layers increases, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), in which multimodality fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that provides deep interaction and fusion between global contextual and uni-modal specific features. We then introduce a graph contrastive learning framework with inter- and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets show that Joyful achieves state-of-the-art (SOTA) performance compared with all baselines. Code is released on GitHub (https://anonymous.4open.science/r/MERC-7F88).
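The abstract mentions inter- and intra-view contrastive losses computed over augmented graph views. As a minimal, illustrative sketch of the kind of cross-view objective such frameworks typically build on, the PyTorch snippet below implements a symmetric InfoNCE-style loss; the function name `info_nce`, the temperature of 0.5, and the random toy embeddings are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Symmetric InfoNCE-style cross-view contrastive loss.

    z1, z2: (N, d) embeddings of the same N utterances under two graph
    views. Matching rows are positive pairs; every other row in the
    opposite view serves as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Average the loss over both view directions (view1->view2 and back).
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage: two views of 8 utterance embeddings of dimension 64.
z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
print(info_nce(z1, z2).item())
```

In this setup, pulling matched rows together while pushing the rest apart is what makes representations of samples with different sentiments more distinguishable; an intra-view term would contrast pairs within a single view in the same spirit.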
Anthology ID:
2023.emnlp-main.996
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
16051–16069
URL:
https://aclanthology.org/2023.emnlp-main.996
DOI:
10.18653/v1/2023.emnlp-main.996
Cite (ACL):
Dongyuan Li, Yusong Wang, Kotaro Funakoshi, and Manabu Okumura. 2023. Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16051–16069, Singapore. Association for Computational Linguistics.
Cite (Informal):
Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.996.pdf
Video:
https://aclanthology.org/2023.emnlp-main.996.mp4