Yumin Tian


2024

TMFN: A Target-oriented Multi-grained Fusion Network for End-to-end Aspect-based Multimodal Sentiment Analysis
Di Wang | Yuzheng He | Xiao Liang | Yumin Tian | Shaofeng Li | Lin Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

End-to-end multimodal aspect-based sentiment analysis (MABSA) combines multimodal aspect terms extraction (MATE) with multimodal aspect sentiment classification (MASC), aiming to simultaneously extract aspect words and classify the sentiment polarity of each aspect. However, existing MABSA methods overlook two issues. (i) They fuse only regional image information with textual words for both subtasks of MABSA, whereas the MATE subtask relies more on global image information to help determine the number and attributes of aspects. Ignoring global information may therefore hurt the performance of MABSA methods. (ii) They fail to exploit target information, even though the fine-grained details of targets are important for classifying the sentiment of aspects. To solve these problems, we propose a Target-oriented Multi-grained Fusion Network (TMFN). It fuses text information with global coarse-grained image information for the MATE subtask and with fine-grained image information for the MASC subtask. In addition, a target-oriented feature alignment (TOFA) module is designed to enhance target-related information in image features with target details. In this way, image features contain more target-related emotional information, which benefits sentiment classification. Extensive experiments show that our method outperforms state-of-the-art methods on two benchmark datasets.