CLMLF:A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection

Zhen Li, Bing Xu, Conghui Zhu, Tiejun Zhao


Abstract
Compared with unimodal data, multimodal data can provide more features to help the model analyze the sentiment of data. Previous work rarely considers token-level feature fusion, and few studies explore learning the common sentiment-related features in multimodal data to help the model fuse multimodal features. In this paper, we propose a Contrastive Learning and Multi-Layer Fusion (CLMLF) method for multimodal sentiment detection. Specifically, we first encode text and image to obtain hidden representations, and then use a multi-layer fusion module to align and fuse the token-level features of text and image. In addition to the sentiment analysis task, we also design two contrastive learning tasks, label-based contrastive learning and data-based contrastive learning, which help the model learn common sentiment-related features in multimodal data. Extensive experiments conducted on three publicly available multimodal datasets demonstrate the effectiveness of our approach for multimodal sentiment detection compared with existing methods. The code is available at https://github.com/Link-Li/CLMLF
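To make the label-based contrastive idea concrete, the following is a minimal, framework-free sketch of a generic supervised contrastive loss, in which samples sharing a sentiment label are pulled together and all others are pushed apart. It is not taken from the authors' repository: the function name, the temperature value, and the plain-list feature format are assumptions made for illustration, and the paper's actual loss may differ in its details.

```python
import math


def label_contrastive_loss(features, labels, temperature=0.5):
    """Generic supervised contrastive loss over one batch (illustrative only).

    features: list of equal-length float vectors; here they stand in for
              hypothetical fused text-image representations.
    labels:   list of sentiment labels, one per vector.
    """
    # L2-normalise each vector so the dot product is a cosine similarity.
    normed = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives are the other samples that share the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # no same-label partner in this batch

        # Temperature-scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))

        # Average negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1

    return total / anchors if anchors else 0.0
```

Under these assumptions, a batch whose same-label features already lie close together yields a smaller loss than the same features paired with mismatched labels, which is the behaviour such an auxiliary objective is meant to encourage alongside the main sentiment classification task.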
Anthology ID:
2022.findings-naacl.175
Volume:
Findings of the Association for Computational Linguistics: NAACL 2022
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2282–2294
URL:
https://aclanthology.org/2022.findings-naacl.175
DOI:
10.18653/v1/2022.findings-naacl.175
Cite (ACL):
Zhen Li, Bing Xu, Conghui Zhu, and Tiejun Zhao. 2022. CLMLF:A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2282–2294, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
CLMLF:A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection (Li et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-naacl.175.pdf
Video:
https://aclanthology.org/2022.findings-naacl.175.mp4
Code:
link-li/clmlf