MPID: A Modality-Preserving and Interaction-Driven Fusion Network for Multimodal Sentiment Analysis

Tianyi Li; Daming Liu

Abstract

The growth of social media has intensified research interest in Multimodal Sentiment Analysis (MSA). However, current methods have notable limitations, particularly in their fusion mechanisms, which overlook nuanced differences and similarities across modalities and can therefore bias MSA results. In addition, indiscriminate fusion across all modalities can introduce unnecessary complexity and noise, undermining the effectiveness of the analysis. To address these challenges, this paper introduces a Modality-Preserving and Interaction-Driven Fusion Network (MPID). A Token Refinement Module first obtains a compressed representation of each modality. A Dual Perception Fusion Module then integrates text with audio, while a separate Adaptive Graded Fusion Module integrates text with visual data. Finally, the text representation is used to enhance the composite representation. Experiments on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets demonstrate that the model achieves state-of-the-art performance.
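
The abstract describes a four-stage data flow: per-modality compression, a text-audio fusion branch, a text-visual fusion branch, and a final text-guided enhancement. The module internals are not specified here, so the following is only a minimal NumPy sketch of that data flow, assuming all three modalities have already been projected to a shared feature dimension d. Every function name (`token_refinement`, `dual_perception_fusion`, `adaptive_graded_fusion`, `mpid_forward`) and every mechanism inside them (learned-query cross-attention for compression, bidirectional cross-attention, a sigmoid gate, a residual text-attention step, a linear output head) is a hypothetical stand-in, not the authors' design; random arrays take the place of trained parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_attention(query, context):
    # Scaled dot-product attention: each query row attends over context rows.
    d = query.shape[-1]
    weights = softmax(query @ context.T / np.sqrt(d), axis=-1)
    return weights @ context


def token_refinement(seq, queries):
    # Hypothetical stand-in for the Token Refinement Module: compress a
    # variable-length sequence (T, d) into k tokens (k, d) via k query vectors.
    return cross_attention(queries, seq)


def dual_perception_fusion(text, audio):
    # Hypothetical stand-in: bidirectional cross-attention between text and
    # audio tokens (both (k, d)), summed into one (k, d) representation.
    return cross_attention(text, audio) + cross_attention(audio, text)


def adaptive_graded_fusion(text, visual):
    # Hypothetical stand-in: a per-token gate in [0, 1] controls how much
    # text-attended visual information replaces each text token.
    attended = cross_attention(text, visual)
    score = (text * attended).sum(axis=-1, keepdims=True) / np.sqrt(text.shape[-1])
    gate = sigmoid(score)  # shape (k, 1)
    return gate * attended + (1.0 - gate) * text, gate


def mpid_forward(text_seq, audio_seq, visual_seq, queries, w_out):
    # 1) Compress each modality to k tokens (queries shared here for brevity).
    t = token_refinement(text_seq, queries)
    a = token_refinement(audio_seq, queries)
    v = token_refinement(visual_seq, queries)
    # 2) Separate text-audio and text-visual fusion branches.
    ta = dual_perception_fusion(t, a)
    tv, gate = adaptive_graded_fusion(t, v)
    # 3) Use the text tokens to enhance the composite representation (residual).
    composite = ta + tv
    enhanced = composite + cross_attention(composite, t)
    # Mean-pool over tokens and project to a scalar sentiment score.
    return float(enhanced.mean(axis=0) @ w_out), gate


# Usage with random inputs; sequence lengths and sizes are arbitrary.
rng = np.random.default_rng(0)
d, k = 16, 4
text_seq = rng.standard_normal((20, d))
audio_seq = rng.standard_normal((30, d))
visual_seq = rng.standard_normal((40, d))
queries = rng.standard_normal((k, d))
w_out = rng.standard_normal(d)
score, gate = mpid_forward(text_seq, audio_seq, visual_seq, queries, w_out)
print(score, gate.ravel())
```

The point of the sketch is the structure the abstract emphasizes: text is fused with audio and with visual data in two separate branches rather than all three modalities being merged at once, and text is reused at the end as a guiding signal.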

Anthology ID: 2025.coling-main.291
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 4313–4322
URL: https://aclanthology.org/2025.coling-main.291/
Cite (ACL): Tianyi Li and Daming Liu. 2025. MPID: A Modality-Preserving and Interaction-Driven Fusion Network for Multimodal Sentiment Analysis. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4313–4322, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): MPID: A Modality-Preserving and Interaction-Driven Fusion Network for Multimodal Sentiment Analysis (Li & Liu, COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.291.pdf