ExMute: A Context-Enriched Multimodal Dataset for Hateful Memes
Riddhiman Swanan Debnath | Nahian Beente Firuj | Abdul Wadud Shakib | Sadia Sultana | Md Saiful Islam
Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages, 2025
In this paper, we introduce ExMute, an extended dataset for classifying hateful memes that incorporates critical contextual information, addressing a significant gap in existing resources. Building on a previous dataset of 4,158 memes without contextual annotations, ExMute adds 2,041 new memes and provides comprehensive annotations for all 6,199 memes. Each meme is systematically labeled across six contexts: religion, politics, celebrity, male, female, and others, and carries language markers indicating code-mixing, code-switching, and Bengali captions, enhancing its value for linguistic and cultural research and enabling a more nuanced understanding of meme content and intent. To evaluate ExMute, we apply state-of-the-art textual, visual, and multimodal approaches, leveraging models including BanglaBERT, Visual Geometry Group (VGG) networks, Inception, ResNet, and the Vision Transformer (ViT). Our experiments show that a custom attention-based LSTM textual model achieves an accuracy of 0.60, while VGG-based visual models reach up to 0.63. Multimodal models, which combine visual and textual features, consistently achieve accuracy scores of around 0.64, demonstrating the dataset's robustness for advancing multimodal classification tasks. ExMute establishes a valuable benchmark for future NLP research, particularly in low-resource language settings, highlighting the importance of context-aware labeling in improving classification accuracy and reducing bias.
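To make the multimodal setup concrete, the sketch below shows one common way to combine visual and textual features: late fusion by concatenating a text embedding (e.g., a BanglaBERT [CLS] vector) with an image embedding (e.g., a ViT pooled output) before a small classification head over the six contexts. The abstract does not specify the paper's exact fusion mechanism or layer sizes, so `LateFusionClassifier`, its dimensions, and the dropout rate are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    """Minimal late-fusion sketch for ExMute-style context classification.

    Concatenates a precomputed text embedding and image embedding, then
    classifies into six contexts (religion, politics, celebrity, male,
    female, others). All dimensions are illustrative assumptions.
    """

    def __init__(self, text_dim=768, image_dim=768, hidden_dim=256, num_classes=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),  # fuse concatenated features
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_classes),  # logits over the six contexts
        )

    def forward(self, text_feat, image_feat):
        # text_feat: (batch, text_dim), e.g., a BanglaBERT sentence embedding
        # image_feat: (batch, image_dim), e.g., a ViT or VGG pooled feature
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.head(fused)


# Usage with random tensors standing in for real encoder outputs.
model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 6])
```

In practice the two feature vectors would come from frozen or fine-tuned encoders; concatenation-plus-MLP is only the simplest fusion baseline, and attention-based fusion is a common alternative.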