BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification

Mithun Das; Animesh Mukherjee

doi:10.18653/v1/2023.emnlp-main.959


Abstract

The dramatic increase in the use of social media platforms for information sharing has also fueled a steep growth in online abuse. A simple yet effective way of abusing individuals or communities is by creating memes, which often integrate an image with a short piece of text layered on top of it. Such harmful elements are in rampant use and are a threat to online safety. Hence it is necessary to develop efficient models to detect and flag abusive memes. The problem becomes more challenging in a low-resource setting (e.g., Bengali memes, i.e., images with Bengali text embedded in them) because of the absence of benchmark datasets on which AI models could be trained. In this paper we bridge this gap by building a Bengali meme dataset. To set up an effective benchmark, we implement several baseline models for classifying abusive memes using this dataset. We observe that multimodal models that use both textual and visual information outperform unimodal models. Our best-performing model achieves a macro F1 score of 70.51. Finally, we perform a qualitative error analysis of the misclassified memes of the best-performing text-based, image-based and multimodal models.
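For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1 score is computed for a two-class task. It is not the authors' evaluation code: the function name `macro_f1`, the label encoding (1 for abusive, 0 for non-abusive), and the toy predictions are assumptions made purely for illustration. The function returns a value in the range 0 to 1, whereas the abstract appears to report the score on a 0 to 100 scale.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        # Count outcomes while treating `label` as the positive class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
    # Every class contributes equally, regardless of how many examples it has.
    return sum(f1_scores) / len(f1_scores)


# Toy data (assumed encoding: 1 = abusive, 0 = non-abusive).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
```

Because each class is weighted equally, macro F1 does not let a majority class dominate the score, which is one common reason it is preferred for imbalanced classification tasks.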

Anthology ID: 2023.emnlp-main.959
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 15498–15512
URL: https://aclanthology.org/2023.emnlp-main.959/
DOI: 10.18653/v1/2023.emnlp-main.959
Cite (ACL): Mithun Das and Animesh Mukherjee. 2023. BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15498–15512, Singapore. Association for Computational Linguistics.
Cite (Informal): BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification (Das & Mukherjee, EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.959.pdf
Video: https://aclanthology.org/2023.emnlp-main.959.mp4
