Ambiguity-aware Multi-level Incongruity Fusion Network for Multi-Modal Sarcasm Detection

Kuntao Li, Yifan Chen, Qiaofeng Wu, Weixing Mai, Fenghuan Li, Yun Xue


Abstract
Multi-modal sarcasm detection aims to identify whether a given image-text pair is sarcastic. The key to the task lies in accurately capturing incongruities from different modalities. Although existing studies have achieved impressive success, they have primarily focused on fusing textual and visual information to establish cross-modal correlations, overlooking the significance of the original unimodal incongruity information at the text level and image level. Furthermore, the fusion strategies used for cross-modal information neglect the effect that inherent ambiguity within the text and image modalities has on multimodal fusion. To overcome these limitations, we propose a novel Ambiguity-aware Multi-level Incongruity Fusion Network (AMIF) for multi-modal sarcasm detection. Our method involves a multi-level incongruity learning module that captures incongruity information simultaneously at the text level, image level, and cross-modal level. Additionally, an ambiguity-based fusion module is developed to dynamically learn reasonable weights and interpretably aggregate incongruity features from different levels. Comprehensive experiments conducted on a publicly available dataset demonstrate the superiority of our proposed model over state-of-the-art methods.
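As a rough illustration of what ambiguity-aware weighting across the three levels could look like, the sketch below fuses per-level class predictions so that less ambiguous levels count for more. The abstract does not say how ambiguity is measured or how weights are learned, so everything here is an assumption for illustration only: ambiguity is approximated by normalized Shannon entropy, the weights are computed in closed form rather than learned, and the names `softmax`, `ambiguity`, and `fuse` are hypothetical. It is not the AMIF implementation.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ambiguity(probs):
    """Normalized Shannon entropy in [0, 1]; 1 means maximally uncertain.

    Assumption: entropy stands in for the paper's ambiguity measure.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def fuse(level_logits):
    """Weight each level by (1 - ambiguity) and average the probabilities.

    level_logits maps a level name (e.g. "text", "image", "cross")
    to a list of class logits. Returns (weights, fused_probs).
    """
    probs = {name: softmax(v) for name, v in level_logits.items()}
    confidence = {name: 1.0 - ambiguity(p) for name, p in probs.items()}
    total = sum(confidence.values())
    if total <= 1e-12:
        # Every level is maximally ambiguous: fall back to equal weights.
        weights = {name: 1.0 / len(probs) for name in probs}
    else:
        weights = {name: c / total for name, c in confidence.items()}
    n_classes = len(next(iter(probs.values())))
    fused = [
        sum(weights[name] * probs[name][i] for name in probs)
        for i in range(n_classes)
    ]
    return weights, fused


# Hypothetical logits over [sarcastic, not sarcastic] for each level.
weights, fused = fuse({
    "text": [0.0, 0.0],    # fully ambiguous, so it receives ~zero weight
    "image": [2.0, -2.0],
    "cross": [3.0, -3.0],  # most confident, so it receives the largest weight
})
```

Because the weights are explicit and sum to one, they can be read directly as each level's contribution to the final prediction, which is one simple way to make the aggregation interpretable.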
Anthology ID:
2025.coling-main.26
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
380–391
URL:
https://aclanthology.org/2025.coling-main.26/
Cite (ACL):
Kuntao Li, Yifan Chen, Qiaofeng Wu, Weixing Mai, Fenghuan Li, and Yun Xue. 2025. Ambiguity-aware Multi-level Incongruity Fusion Network for Multi-Modal Sarcasm Detection. In Proceedings of the 31st International Conference on Computational Linguistics, pages 380–391, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Ambiguity-aware Multi-level Incongruity Fusion Network for Multi-Modal Sarcasm Detection (Li et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.26.pdf