Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

Yan Ling; Jianfei Yu; Rui Xia

doi:10.18653/v1/2022.acl-long.152

Abstract

As an important task in sentiment analysis, Multimodal Aspect-Based Sentiment Analysis (MABSA) has attracted increasing attention in recent years. However, previous approaches either (i) use separately pre-trained visual and textual models, which ignore the cross-modal alignment, or (ii) use vision-language models pre-trained with general pre-training tasks, which are inadequate to identify fine-grained aspects, opinions, and their alignments across modalities. To tackle these limitations, we propose a task-specific Vision-Language Pre-training framework for MABSA (VLP-MABSA), which is a unified multimodal encoder-decoder architecture for all the pre-training and downstream tasks. We further design three types of task-specific pre-training tasks from the language, vision, and multimodal modalities, respectively. Experimental results show that our approach generally outperforms the state-of-the-art approaches on three MABSA subtasks. Further analysis demonstrates the effectiveness of each pre-training task. The source code is publicly released at https://github.com/NUSTM/VLP-MABSA.

Anthology ID: 2022.acl-long.152
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2149–2159
URL: https://aclanthology.org/2022.acl-long.152/
DOI: 10.18653/v1/2022.acl-long.152
Cite (ACL): Yan Ling, Jianfei Yu, and Rui Xia. 2022. Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2149–2159, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis (Ling et al., ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.152.pdf
