M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought

Gitanjali Kumari; Kirtan Jain; Asif Ekbal

doi:10.18653/v1/2024.emnlp-main.1234

Abstract

In recent years, hate against women on social media platforms has risen significantly, particularly through misogynous memes. These memes often target women with subtle and obscure cues, making their detection a challenging task for automated systems. Recently, Large Language Models (LLMs) have shown promising results in reasoning with Chain-of-Thought (CoT) prompting, which generates intermediate reasoning chains as rationales to facilitate multimodal tasks, but they often neglect cultural diversity and key aspects such as emotion and contextual knowledge hidden in the visual modality. To address this gap, we introduce a **M**ultimodal **M**ulti-hop CoT (M3Hop-CoT) framework for **M**isogynous meme identification, combining a CLIP-based classifier and a multimodal CoT module with entity-object-relationship integration. M3Hop-CoT employs a three-step multimodal prompting principle to induce emotions, target awareness, and contextual knowledge for meme analysis. Our empirical evaluation, including both qualitative and quantitative analysis, validates the efficacy of the M3Hop-CoT framework on the SemEval-2022 Task 5 (**MAMI task**) dataset, highlighting its strong macro-F1 performance. Furthermore, we assess the model’s generalizability on various benchmark meme datasets, offering insight into the effectiveness of our approach across datasets. Code is available at https://github.com/Gitanjali1801/LLM_CoT
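The three-step prompting principle mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the names `HOPS`, `PROMPTS`, and `multi_hop_rationales`, and the `llm` callable are hypothetical and are not taken from the paper or its repository. The sketch only shows the chaining idea, in which each hop's answer is fed into the next hop's prompt; how the resulting rationales are combined with the CLIP-based classifier is not covered.

```python
from typing import Callable, Dict

# Hop order follows the abstract: emotions, then target awareness, then context.
HOPS = ("emotion", "target", "context")

# Hypothetical prompt templates (illustrative wording, not the paper's prompts).
PROMPTS = {
    "emotion": (
        "Meme text: {text}\nEntities and relations in the image: {scene}\n"
        "Which emotions does this meme convey?"
    ),
    "target": (
        "Meme text: {text}\nEntities and relations in the image: {scene}\n"
        "Emotions identified: {emotion}\nWho or what is the meme targeting?"
    ),
    "context": (
        "Meme text: {text}\nEntities and relations in the image: {scene}\n"
        "Emotions identified: {emotion}\nTarget identified: {target}\n"
        "What background knowledge is needed to interpret this meme?"
    ),
}


def multi_hop_rationales(
    text: str, scene: str, llm: Callable[[str], str]
) -> Dict[str, str]:
    """Run the three hops in order, passing earlier answers to later prompts."""
    rationales: Dict[str, str] = {}
    for hop in HOPS:
        # Earlier answers fill the {emotion} / {target} slots of later templates.
        prompt = PROMPTS[hop].format(text=text, scene=scene, **rationales)
        rationales[hop] = llm(prompt)
    return rationales
```

Any function that maps a prompt string to a response string can be passed as `llm`, so the sketch can be exercised with a stub before connecting a real model.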

Anthology ID: 2024.emnlp-main.1234
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 22105–22138
URL: https://aclanthology.org/2024.emnlp-main.1234/
DOI: 10.18653/v1/2024.emnlp-main.1234
Cite (ACL): Gitanjali Kumari, Kirtan Jain, and Asif Ekbal. 2024. M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 22105–22138, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought (Kumari et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.1234.pdf
Software: 2024.emnlp-main.1234.software.zip
