Self-Supervised Unimodal Label Generation Strategy Using Recalibrated Modality Representations for Multimodal Sentiment Analysis
Yewon Hwang | Jong-Hwan Kim
Findings of the Association for Computational Linguistics: EACL 2023
While multimodal sentiment analysis (MSA) has gained much attention over the last few years, the main focus of most work on MSA has been limited to constructing multimodal representations that capture interactions between different modalities in a single task. This was largely due to a lack of unimodal annotations in MSA benchmark datasets. However, training a model using only multimodal representations can lead to suboptimal performance due to insufficient learning of each uni-modal representation. In this work, to fully optimize learning representations from multimodal data, we propose SUGRM which jointly trains multimodal and unimodal tasks using recalibrated features. The features are recalibrated such that the model learns to weight the features differently based on the features of other modalities. Further, to leverage unimodal tasks, we auto-generate unimodal annotations via a unimodal label generation module (ULGM). The experiment results on two benchmark datasets demonstrate the efficacy of our framework.