Kuntao Li
2025
Ambiguity-aware Multi-level Incongruity Fusion Network for Multi-Modal Sarcasm Detection
Kuntao Li | Yifan Chen | Qiaofeng Wu | Weixing Mai | Fenghuan Li | Yun Xue
Proceedings of the 31st International Conference on Computational Linguistics
Multi-modal sarcasm detection aims to identify whether a given image-text pair is sarcastic. The key to the task lies in accurately capturing incongruities across modalities. Although existing studies have achieved impressive success, they have primarily focused on fusing textual and visual information to establish cross-modal correlations, overlooking the significance of the original unimodal incongruity information at the text level and image level. Furthermore, the cross-modal fusion strategies they employ neglect the effect of the inherent ambiguity within the text and image modalities on multimodal fusion. To overcome these limitations, we propose a novel Ambiguity-aware Multi-level Incongruity Fusion Network (AMIF) for multi-modal sarcasm detection. Our method includes a multi-level incongruity learning module that captures incongruity information simultaneously at the text level, image level, and cross-modal level. Additionally, an ambiguity-based fusion module is developed to dynamically learn reasonable weights and interpretably aggregate incongruity features from the different levels. Comprehensive experiments on a publicly available dataset demonstrate the superiority of our proposed model over state-of-the-art methods.
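To make the idea of ambiguity-based fusion concrete, here is a minimal sketch under stated assumptions; it is not the AMIF module itself, whose details are not given here. It assumes that each level (text, image, cross-modal) produces both a feature vector and logits over {non-sarcastic, sarcastic}, that a level's "ambiguity" is the entropy of its softmax distribution, and that the fusion weights are a softmax over negative ambiguities, so less ambiguous levels contribute more. The function names `entropy` and `ambiguity_weighted_fusion` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats); used here as an assumed ambiguity score."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def ambiguity_weighted_fusion(
    level_logits: Sequence[Sequence[float]],
    level_features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical fusion: weight each level's features by its confidence.

    level_logits[k]   -- logits of level k over the two classes (assumption)
    level_features[k] -- incongruity feature vector of level k (same dim for all)
    Returns (weights, fused_feature).
    """
    # Higher entropy means a more ambiguous level.
    ambiguities = [entropy(softmax(logits)) for logits in level_logits]
    # Softmax over negative ambiguity: less ambiguous levels get larger weights.
    weights = softmax([-a for a in ambiguities])
    dim = len(level_features[0])
    # Weighted sum of the per-level feature vectors, element by element.
    fused = [
        sum(w * feats[i] for w, feats in zip(weights, level_features))
        for i in range(dim)
    ]
    return weights, fused


# Usage with made-up numbers: text is confident, image is ambiguous.
text_logits, image_logits, cross_logits = [2.0, -1.0], [0.1, 0.0], [0.5, 1.5]
features = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.5]]
weights, fused = ambiguity_weighted_fusion(
    [text_logits, image_logits, cross_logits], features
)
```

Because the weights are derived from each level's own prediction uncertainty, they can be inspected directly, which is one simple way to make such a fusion step interpretable; a learned network could replace the fixed entropy-to-weight mapping.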
2024
D2R: Dual-Branch Dynamic Routing Network for Multimodal Sentiment Detection
Yifan Chen | Kuntao Li | Weixing Mai | Qiaofeng Wu | Yun Xue | Fenghuan Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing