ID4Fusion@CASE 2025: A Multimodal Approach to Hate Speech Detection in Text-Embedded Memes Using ensemble Transformer based approach

Tabassum Basher Rashfi, Md. Tanvir Ahammed Shawon, Md. Ayon Mia, Muhammad Ibrahim Khan


Abstract
Identifying hate speech in text-embedded images is a challenging task in online content moderation, especially when such content overlaps with humor and sensitive societal topics. This paper addresses Subtask A of the Shared Task on Multimodal Hate, Humor, and Stance Detection in Marginalized Movement@CASE2025, a binary classification task that labels each image as Hate or No Hate. To this end, we present a multimodal architecture that combines textual and visual features for classification. For the text modality, we fine-tune two transformer models, RoBERTa and HateBERT, to capture linguistic cues of hate speech. For the image modality, we use EfficientNetB7 and a Vision Transformer (ViT), which we found effective at extracting visual features. The predictions of each modality are then combined through an ensemble, with the final prediction computed as a weighted average of the text-based and image-based scores. The resulting system achieves an F1-score of 0.7868 and ranks 10th among the participating systems, indicating the effectiveness of multimodal fusion for detecting hate speech in text-embedded images.
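The weighted-average ensemble described in the abstract can be illustrated with a minimal late-fusion sketch. It assumes each of the four models (RoBERTa, HateBERT, EfficientNetB7, ViT) outputs a probability for the Hate class, that scores are first averaged within each modality, and that the fused score is thresholded at 0.5. The function name `fuse_hate_scores`, the default `text_weight=0.6`, and the threshold are hypothetical illustrations, not values reported by the authors.

```python
from typing import Sequence, Tuple


def _mean(scores: Sequence[float]) -> float:
    """Average a non-empty list of per-model probabilities."""
    if not scores:
        raise ValueError("need at least one model score per modality")
    return sum(scores) / len(scores)


def fuse_hate_scores(
    text_probs: Sequence[float],   # e.g. P(Hate) from RoBERTa and HateBERT (assumed)
    image_probs: Sequence[float],  # e.g. P(Hate) from EfficientNetB7 and ViT (assumed)
    text_weight: float = 0.6,      # hypothetical weight; not taken from the paper
    threshold: float = 0.5,        # hypothetical decision threshold
) -> Tuple[float, str]:
    """Hypothetical late fusion: weighted average of text and image scores."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    text_score = _mean(text_probs)    # per-modality average (an assumption)
    image_score = _mean(image_probs)
    # Convex combination, so the fused score stays a valid probability.
    fused = text_weight * text_score + (1.0 - text_weight) * image_score
    label = "Hate" if fused >= threshold else "No Hate"
    return fused, label


if __name__ == "__main__":
    score, label = fuse_hate_scores([0.8, 0.7], [0.4, 0.2])
    print(round(score, 2), label)
```

With these illustrative inputs, the text modality averages 0.75 and the image modality 0.3, so the fused score is 0.6 × 0.75 + 0.4 × 0.3 = 0.57, which crosses the threshold and yields Hate. Because the weights sum to one, tuning `text_weight` on a validation set trades off reliance on the text versus the image branch without rescaling the output.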
Anthology ID: 2025.case-1.17
Volume: Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Ali Hürriyetoğlu, Hristo Tanev, Surendrabikram Thapa
Venues: CASE | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 139–145
URL: https://aclanthology.org/2025.case-1.17/
Cite (ACL): Tabassum Basher Rashfi, Md. Tanvir Ahammed Shawon, Md. Ayon Mia, and Muhammad Ibrahim Khan. 2025. ID4Fusion@CASE 2025: A Multimodal Approach to Hate Speech Detection in Text-Embedded Memes Using ensemble Transformer based approach. In Proceedings of the 8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts, pages 139–145, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): ID4Fusion@CASE 2025: A Multimodal Approach to Hate Speech Detection in Text-Embedded Memes Using ensemble Transformer based approach (Rashfi et al., CASE 2025)
PDF: https://aclanthology.org/2025.case-1.17.pdf