t-HNE: A Text-guided Hierarchical Noise Eliminator for Multimodal Sentiment Analysis

Zuocheng Li, Lishuang Li


Abstract
In multimodal sentiment analysis, most existing approaches extract modality-consistent information from raw unimodal data and integrate it into multimodal representations for sentiment classification. However, these methods often assume that all modalities contribute equally to model performance; they prioritize extracting and enhancing consistent information while overlooking the adverse effects of noise caused by modality inconsistency. In contrast, this paper introduces the text-guided Hierarchical Noise Eliminator (t-HNE), a model that consists of a two-stage denoising phase and a feature recovery phase. First, textual information is injected into the visual and acoustic modalities through an attention mechanism to reduce intra-modality noise in the visual and acoustic representations. Second, the model further mitigates inter-modality noise by maximizing the mutual information between the textual representation and each of the visual and acoustic representations. Finally, to address the potential loss of modality-invariant information during denoising, the fused multimodal representation is refined through contrastive learning with each unimodal representation except the textual one. Extensive experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate the efficacy of our approach.
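The two denoising stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the attention layout or the mutual-information estimator, so the sketch assumes that visual or acoustic frames act as queries over text tokens (with no learned projections and a residual connection) and that InfoNCE, a common lower bound on mutual information, stands in for the second stage. The function names `text_guided_attention` and `info_nce` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def text_guided_attention(target, text):
    """Stage 1 sketch: inject text into a visual or acoustic sequence.

    target: list of frame vectors, text: list of token vectors, same width d.
    Assumption (not from the paper): frames are queries, text tokens are
    keys and values, and the attended text is added back residually.
    """
    d = len(text[0])
    out = []
    for q in target:
        # Scaled dot-product attention weights of this frame over text tokens.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in text])
        # Weighted sum of text token vectors.
        context = [sum(w * v[j] for w, v in zip(weights, text)) for j in range(d)]
        out.append([qj + cj for qj, cj in zip(q, context)])
    return out


def info_nce(text_vecs, other_vecs, temperature=0.1):
    """Stage 2 sketch: InfoNCE loss over paired (text, other-modality) vectors.

    Row i of each batch is assumed to come from the same utterance.
    Minimising this loss maximises a lower bound on mutual information.
    """
    n = len(text_vecs)
    loss = 0.0
    for i in range(n):
        logits = [dot(text_vecs[i], other_vecs[j]) / temperature for j in range(n)]
        loss -= math.log(softmax(logits)[i])
    return loss / n
```

In this sketch, a batch whose text and visual vectors are aligned row by row yields a lower `info_nce` value than the same batch with the rows shuffled, which is the behaviour a mutual-information objective is meant to reward.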
Anthology ID:
2025.coling-main.192
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
2834–2844
URL:
https://aclanthology.org/2025.coling-main.192/
Cite (ACL):
Zuocheng Li and Lishuang Li. 2025. t-HNE: A Text-guided Hierarchical Noise Eliminator for Multimodal Sentiment Analysis. In Proceedings of the 31st International Conference on Computational Linguistics, pages 2834–2844, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
t-HNE: A Text-guided Hierarchical Noise Eliminator for Multimodal Sentiment Analysis (Li & Li, COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.192.pdf