Gitanjali Kumari

2025

MemeDetoxNet: Balancing Toxicity Reduction and Context Preservation
Gitanjali Kumari | Jitendra Solanki | Asif Ekbal
Findings of the Association for Computational Linguistics: ACL 2025

Toxic memes often spread harmful and offensive content and pose a significant challenge in online environments. In this paper, we present MemeDetoxNet, a robust framework designed to mitigate toxicity in memes by leveraging fine-tuned pre-trained models. Our approach utilizes the interpretability of CLIP (Contrastive Language-Image Pre-Training) to identify toxic elements within the visual and textual components of memes. Our objective is to automatically assess the immorality of toxic memes and transform them into morally acceptable alternatives by employing large language models (LLMs) to replace offensive text and by blurring toxic regions in the image. MemeDetoxNet has three main primitives: (1) detecting toxic memes, (2) localizing and highlighting toxic visual and textual attributes, and (3) manipulating the toxic content to create a morally acceptable alternative. Empirical evaluation on several publicly available meme datasets shows a reduction in toxicity of approximately 10-20%. Both qualitative and quantitative analyses further demonstrate MemeDetoxNet’s superior performance in detoxifying memes compared to other methods. These results underscore MemeDetoxNet’s potential as an effective tool for content moderation on online platforms.

2024

Unintended Bias Detection and Mitigation in Misogynous Memes
Gitanjali Kumari | Anubhav Sinha | Asif Ekbal
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Online sexism has become a concerning issue in recent years, especially when conveyed through memes. Although this alarming phenomenon has triggered many studies from computational linguistic and natural language processing points of view, less effort has been spent analyzing whether misogyny detection models are affected by unintended bias. Such biases can lead models to incorrectly label non-misogynous memes as misogynous due to specific identity terms, perpetuating harmful stereotypes and reinforcing negative attitudes. This paper presents the first and most comprehensive approach to measuring and mitigating unintended bias in misogynous meme detection models, aiming to develop effective strategies to counter its harmful impact. Our proposed model, the Contextualized Scene Graph-based Multimodal Network (CTXSGMNet), is an integrated architecture that combines VisualBERT, a CLIP-LSTM-based memory network, and an unbiased scene graph module with supervised contrastive loss, and it achieves state-of-the-art performance in mitigating unintended bias in misogynous memes. Empirical evaluation, including both qualitative and quantitative analysis, demonstrates the effectiveness of our CTXSGMNet framework on the SemEval-2022 Task 5 (MAMI task) dataset, showcasing its promising performance in terms of Equity of Odds and F1 score. Additionally, we assess the generalizability of the proposed model by evaluating its performance on a few benchmark meme datasets, providing a comprehensive understanding of our approach’s efficacy across diverse datasets.

M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought
Gitanjali Kumari | Kirtan Jain | Asif Ekbal
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In recent years, hate against women on social media platforms has risen significantly, particularly through the use of misogynous memes. These memes often target women with subtle and obscure cues, making their detection a challenging task for automated systems. Recently, Large Language Models (LLMs) have shown promising results in reasoning using Chain-of-Thought (CoT) prompting to generate intermediate reasoning chains as the rationale to facilitate multimodal tasks, but they often neglect cultural diversity and key aspects like emotion and contextual knowledge hidden in the visual modalities. To address this gap, we introduce a Multimodal Multi-hop CoT (M3Hop-CoT) framework for Misogynous meme identification, combining a CLIP-based classifier and a multimodal CoT module with entity-object-relationship integration. M3Hop-CoT employs a three-step multimodal prompting principle to induce emotions, target awareness, and contextual knowledge for meme analysis. Our empirical evaluation, including both qualitative and quantitative analysis, validates the efficacy of the M3Hop-CoT framework on the SemEval-2022 Task 5 (MAMI task) dataset, highlighting its strong performance in the macro-F1 score. Furthermore, we assess the model’s generalizability by evaluating it on various benchmark meme datasets, offering thorough insight into the effectiveness of our approach across different datasets. Code is available at this link: https://github.com/Gitanjali1801/LLM_CoT

CM_CLIP: Unveiling Code-Mixed Multimodal Learning with Cross-Lingual CLIP Adaptations
Gitanjali Kumari | Arindam Chatterjee | Ashutosh Bajpai | Asif Ekbal | Vinutha B. NarayanaMurthy
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

In this paper, we present CMCLIP, a Code-Mixed Contrastive Linked Image Pre-trained model, an innovative extension of the widely recognized CLIP model. Our work adapts the CLIP framework to the code-mixed environment through a novel cross-lingual teacher training methodology. Building on the strengths of CLIP, we introduce the first code-mixed pre-trained text-and-vision model, CMCLIP, specifically designed for Hindi-English code-mixed multimodal language settings. The model is developed in two variants: CMCLIP-RB, based on ResNet, and CMCLIP-VX, based on ViT, both of which adapt the original CLIP model to suit code-mixed data. We also introduce a large, novel dataset called Parallel Hybrid Multimodal Code-mixed Hinglish (PHMCH), which forms the foundation for teacher training. The CMCLIP models are evaluated on various downstream tasks, including code-mixed Image-Text Retrieval (ITR) and classification tasks, such as humor and sarcasm detection, using a code-mixed meme dataset. Our experimental results demonstrate that CMCLIP outperforms existing models, such as M3P and multilingual-CLIP, establishing state-of-the-art performance for code-mixed multimodal tasks. Although our data and frameworks target Hindi-English code-mixing, they can be extended to other code-mixed language settings.

CM-Off-Meme: Code-Mixed Hindi-English Offensive Meme Detection with Multi-Task Learning by Leveraging Contextual Knowledge
Gitanjali Kumari | Dibyanayan Bandyopadhyay | Asif Ekbal | Vinutha B. NarayanaMurthy
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Detecting offensive content in internet memes is challenging because it requires additional contextual knowledge. While previous works have focused only on detecting offensive memes, classifying them further into implicit and explicit categories depending on their severity remains a challenging and underexplored area. In this work, we present an end-to-end multitask model that addresses this challenge by empirically investigating two correlated tasks simultaneously: (i) offensive meme detection and (ii) explicit-implicit offensive meme detection, leveraging two self-supervised pre-trained models. The first pre-trained model, referred to as the “knowledge encoder,” incorporates contextual knowledge of the meme. The second model, referred to as the “fine-grained information encoder,” is trained to understand the obscure psycho-linguistic information of the meme. Our proposed model utilizes contrastive learning to integrate these two pre-trained models, resulting in a more comprehensive understanding of the meme and its potential for offensiveness. To support our approach, we create a large-scale dataset, CM-Off-Meme, as no such dataset is publicly available for the code-mixed Hindi-English (Hinglish) domain. Empirical evaluation, including both qualitative and quantitative analysis, on the CM-Off-Meme dataset demonstrates the effectiveness of the proposed model in terms of cross-domain generalization.

2023

The Persuasive Memescape: Understanding Effectiveness and Societal Implications of Internet Memes
Gitanjali Kumari | Pranali Shinde | Asif Ekbal
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2021

Co-attention based Multimodal Factorized Bilinear Pooling for Internet Memes Analysis
Gitanjali Kumari | Amitava Das | Asif Ekbal
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Social media platforms like Facebook, Twitter, and Instagram have a significant impact on several aspects of society. Memes are a new type of social media communication found on these platforms. Even though memes are primarily used to distribute humorous content, certain memes propagate hate speech through dark humor. It is critical to properly analyze and filter out these toxic memes from social media. However, the implicit presence of sarcasm and humor makes analyzing memes more challenging. This paper proposes an end-to-end neural network architecture that learns the complex association between the text and image of a meme. For this purpose, we use the recent SemEval-2020 Task-8 multimodal dataset. We propose an end-to-end CNN-based deep neural network architecture with two sub-modules, viz. (i) a co-attention based sub-module and (ii) a Multimodal Factorized Bilinear Pooling (MFB) sub-module, to represent the textual and visual features of a meme in a more fine-grained way. We demonstrate the effectiveness of our proposed work through extensive experiments. The experimental results show that our proposed model achieves a 36.81% macro F1-score, outperforming all the baseline models.
