Autonomous Aspect-Image Instruction a2II: Q-Former Guided Multimodal Sentiment Classification

Junjia Feng, Mingqian Lin, Lin Shang, Xiaoying Gao


Abstract
The multimodal aspect-based sentiment classification (MABSC) task, which aims to identify the sentiment polarities of aspects by combining both language and vision information, has garnered significant attention. However, the limited multimodal data available for this task has become a major obstacle to vision-language multimodal fusion. While large-scale vision-language pretrained models have been adapted to many tasks, their use for the MABSC task is still in a nascent stage. In this work, we apply the instruction-tuning paradigm to the MABSC task and leverage the abilities of large vision-language models to alleviate the limitations in fusing the textual and image modalities. To tackle the problem of potential irrelevance between aspects and images, we propose a plug-and-play selector that autonomously chooses the most appropriate instruction from the instruction pool, thereby reducing the impact of irrelevant image noise on the final sentiment classification results. We conduct extensive experiments in various scenarios, and our model achieves state-of-the-art performance on benchmark datasets as well as in few-shot settings.
Anthology ID:
2024.lrec-main.180
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
1996–2005
URL:
https://aclanthology.org/2024.lrec-main.180
Cite (ACL):
Junjia Feng, Mingqian Lin, Lin Shang, and Xiaoying Gao. 2024. Autonomous Aspect-Image Instruction a2II: Q-Former Guided Multimodal Sentiment Classification. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1996–2005, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Autonomous Aspect-Image Instruction a2II: Q-Former Guided Multimodal Sentiment Classification (Feng et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.180.pdf