MMFusion@CASE 2025: Attention-Based Multimodal Learning for Text-Image Content Analysis

Prerana Rane


Abstract
Text-embedded images, such as memes, are increasingly common in social media discourse. These images combine visual and textual elements to convey complex attitudes and emotions, and deciphering their intent is challenging because of their multimodal, context-dependent nature. This paper presents our approach to the Shared Task on Multimodal Hate, Humor, and Stance Detection in Marginalized Movement at CASE 2025. The shared task covers four aspects of multimodal content analysis for text-embedded images: hate speech detection, target identification, stance classification, and humor recognition. We propose a multimodal learning framework that combines textual and visual representations through cross-modal attention mechanisms to classify content effectively across all four tasks.
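The cross-modal attention fusion the abstract describes can be pictured with a minimal PyTorch sketch. Everything below is illustrative, not the paper's reported architecture: the encoder choices (BERT-style token embeddings, ViT-style patch embeddings), the shared 768-dimensional feature size, and all module names are assumptions for exposition.

import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Illustrative cross-modal attention fusion head (hypothetical sketch).

    Text tokens attend over image patches and vice versa; the pooled,
    fused representation feeds a task-specific classifier.
    """
    def __init__(self, dim=768, num_heads=8, num_classes=2):
        super().__init__()
        # Text queries attend to image keys/values, and the reverse.
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, text_len, dim), e.g. BERT token embeddings
        # image_feats: (batch, num_patches, dim), e.g. ViT patch embeddings
        t2i, _ = self.text_to_image(text_feats, image_feats, image_feats)
        i2t, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # Mean-pool each attended sequence and concatenate the two views.
        fused = torch.cat([t2i.mean(dim=1), i2t.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Example with random features standing in for encoder outputs:
model = CrossModalAttentionFusion()
text = torch.randn(4, 32, 768)    # batch of 4, 32 text tokens each
image = torch.randn(4, 196, 768)  # 14x14 = 196 image patches each
logits = model(text, image)       # -> shape (4, 2)

In this kind of design, each modality's representation is contextualized by the other before pooling, which is what lets a single fused vector serve several classification heads (hate, target, stance, humor).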
Anthology ID: 2025.case-1.14
Volume: Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Ali Hürriyetoğlu, Hristo Tanev, Surendrabikram Thapa
Venues: CASE | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 115–122
URL: https://aclanthology.org/2025.case-1.14/
Cite (ACL): Prerana Rane. 2025. MMFusion@CASE 2025: Attention-Based Multimodal Learning for Text-Image Content Analysis. In Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts, pages 115–122, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): MMFusion@CASE 2025: Attention-Based Multimodal Learning for Text-Image Content Analysis (Rane, CASE 2025)
PDF: https://aclanthology.org/2025.case-1.14.pdf