Racist or Sexist Meme? Classifying Memes beyond Hateful

Memes are combinations of text and images that are often humorous in nature. That is not always the case, however: certain combinations of text and images depict hate, and are referred to as hateful memes. This work presents a multimodal pipeline that takes both visual and textual features from memes into account to (1) identify the protected category (e.g. race, sex, etc.) that has been attacked; and (2) detect the type of attack (e.g. contempt, slurs, etc.). Our pipeline uses state-of-the-art pre-trained visual and textual representations, followed by a simple logistic regression classifier. We apply our pipeline to the Hateful Memes Challenge dataset with additional newly created fine-grained labels for protected category and type of attack. Our best model achieves an AUROC of 0.96 for identifying the protected category, and 0.97 for detecting the type of attack. We release our code at https://github.com/harisbinzia/HatefulMemes


Introduction
We warn the reader that this paper contains content that is racist, sexist and offensive in several ways.
An internet meme (or simply "meme" for the remainder of this paper) is a virally transmitted image embellished with text. It usually shares pointed commentary on cultural symbols, social ideas, or current events (Gil, 2020). In the past few years there has been a surge in the popularity of memes on social media platforms. Instagram, a popular photo and video sharing social networking service, recently revealed that over 1 million posts mentioning "meme" are shared on Instagram each day (https://about.instagram.com/blog/announcements/instagram-year-in-reviewhow-memes-were-the-mood-of-2020).
Although memes are often funny and used mostly for humorous purposes, recent research suggests that they can also be used to disseminate hate (Zannettou et al., 2018) and can therefore emerge as a multimodal expression of online hate speech. Hateful memes target certain groups or individuals based on their race (Williams et al., 2016) and gender (Drakett et al., 2018), among many other protected categories, thus causing harm at both an individual and societal level. An example hateful meme is shown in Figure 1. At the scale of the internet, it is impossible to manually inspect every meme. Hence, we posit that it is important to develop (semi-)automated systems that can detect hateful memes. However, detecting hate in multimodal forms (such as memes) is extremely challenging and requires a holistic understanding of the visual and textual material.
In order to accelerate research in this area and develop systems capable of detecting hateful memes, Facebook recently launched The Hateful Memes Challenge (Kiela et al., 2020). The challenge introduced a new annotated dataset of around 10K memes tagged for hatefulness (i.e. hateful vs. not-hateful).
The baseline results show a substantial difference in the performance of unimodal and multimodal systems, where the latter still perform poorly compared to human performance, illustrating the difficulty of the problem.
More recently, a shared task on hateful memes was organized at the Workshop on Online Abuse and Harms (WOAH), where the hateful memes dataset (Kiela et al., 2020) was presented with additional newly created fine-grained labels for the protected category that has been attacked (e.g. race, sex, etc.), as well as the type of attack (e.g. contempt, slurs, etc.). This paper presents our multimodal pipeline based on pre-trained visual and textual representations for the shared task on hateful memes at WOAH. We make our code publicly available to facilitate further research.

Problem Statement
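There are two tasks, both multi-label, with details as follows:
• Task A: for each meme, identify the protected category that has been attacked (e.g. race, sex, etc.).
• Task B: for each meme, detect the type of attack (e.g. contempt, slurs, etc.).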

Dataset
The dataset consists of 9,540 fine-grained annotated memes and is imbalanced, with a large number of non-hateful memes and a relatively small number of hateful ones. The details of the different splits are given in Table 1, and further details are given in Table 2. The majority of memes in the dataset are single-labeled. Figure 2 and Figure 3 present the distribution of memes with multiple protected categories and types of attacks respectively. For the evaluation, we use the standard AUROC metric.
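To illustrate how AUROC applies when a meme may carry several labels, the following minimal sketch scores a toy multi-label prediction with scikit-learn. The labels and scores are made up for illustration, and the averaging modes shown (per-class, micro, macro) are assumptions: the text above does not state which averaging is used.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical toy data: 6 memes (rows), 3 illustrative classes (columns).
# A row may contain several 1s because the tasks are multi-label.
y_true = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
])

# Made-up classifier scores (e.g. predicted probabilities) for the same memes.
y_score = np.array([
    [0.9, 0.2, 0.10],
    [0.3, 0.8, 0.20],
    [0.7, 0.6, 0.10],
    [0.2, 0.1, 0.90],
    [0.1, 0.3, 0.15],
    [0.4, 0.7, 0.30],
])

# One AUROC per class: the probability that a random positive meme
# is scored above a random negative meme for that class.
per_class = roc_auc_score(y_true, y_score, average=None)

# Macro: unweighted mean of the per-class values.
macro = roc_auc_score(y_true, y_score, average="macro")

# Micro: pool all (meme, class) decisions into one binary problem.
micro = roc_auc_score(y_true, y_score, average="micro")

print(per_class, macro, micro)
```

Because AUROC depends only on the ranking of scores, it needs no decision threshold, which is convenient for an imbalanced dataset such as this one.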

Model & Results
This section describes our model, the visual & textual embeddings, as well as the results.

Embeddings
We use the following state-of-the-art pre-trained visual and textual representations:
• CLIP: OpenAI's CLIP (Contrastive Language Image Pre-Training) (Radford et al., 2021) is a neural network that jointly trains an image encoder and a text encoder to predict the correct pairings of a batch of (image, text) examples. We use the pre-trained CLIP image encoder (hereinafter CIMG) and CLIP text encoder (hereinafter CTXT) to embed meme images and text respectively.
• LASER: Facebook's LASER (Language Agnostic SEntence Representations) (Artetxe and Schwenk, 2019) is a BiLSTM-based seq2seq model that maps a sentence in any language to a point in a high-dimensional space, with the goal that the same statement in any language will end up in the same neighborhood. We use the LASER encoder to obtain embeddings for the meme text.
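The pairing objective behind CLIP can be sketched in a few lines of NumPy. This is a simplified illustration of a symmetric contrastive loss over a batch of (image, text) embeddings, not OpenAI's implementation. The random embeddings, the helper names and the fixed temperature of 0.07 are assumptions made for the example.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy: row i of image_emb should pair with row i of text_emb."""
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    logits = image_emb @ text_emb.T / temperature  # (n, n) similarity matrix
    diag = np.arange(logits.shape[0])
    # Image-to-text: for each image, the matching text is the correct "class".
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Text-to-image: for each text, the matching image is the correct "class".
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_i2t + loss_t2i) / 2

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))                         # stand-in image embeddings
text_matched = image_emb + 0.01 * rng.normal(size=(4, 8))   # texts aligned with images
text_shuffled = text_matched[[1, 2, 3, 0]]                  # same texts, wrong pairing

loss_matched = contrastive_loss(image_emb, text_matched)
loss_shuffled = contrastive_loss(image_emb, text_shuffled)
print(loss_matched, loss_shuffled)
```

The loss is low when each image is most similar to its own text and high when the pairing is broken, which is the property that makes the resulting image and text embeddings comparable.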

Pipeline
Exploiting the above models, we employ a simple four-step pipeline, as shown in Figure 4:
1. We extract text from the meme.
2. We embed the meme image and the text into visual and textual representations (Section 4.1).
3. We concatenate the visual and textual embeddings.
4. We train a multi-label Logistic Regression classifier using scikit-learn (Pedregosa et al., 2011) to predict the protected category attacked in the meme (Task A) and the type of attack (Task B).
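Steps 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: the embeddings are random stand-ins for the precomputed visual and textual representations, the labels are synthetic, and the use of `OneVsRestClassifier` as the multi-label wrapper around `LogisticRegression` is our assumption, since the text only specifies a multi-label logistic regression in scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n_memes, n_classes, n_train = 200, 3, 150

# Steps 1-2 (assumed done): random stand-ins for precomputed embeddings.
image_emb = rng.normal(size=(n_memes, 16))  # hypothetical visual embeddings
text_emb = rng.normal(size=(n_memes, 12))   # hypothetical textual embeddings

# Step 3: concatenate visual and textual embeddings per meme.
features = np.concatenate([image_emb, text_emb], axis=1)

# Synthetic multi-label targets that depend on both modalities (illustration only).
weights = rng.normal(size=(features.shape[1], n_classes))
labels = (features @ weights > 0).astype(int)

# Step 4: one binary logistic regression per class (assumed multi-label setup).
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(features[:n_train], labels[:n_train])

# Per-class probabilities for held-out memes; each column is one class.
probs = clf.predict_proba(features[n_train:])
print(probs.shape)
```

Training one independent binary classifier per class lets a meme receive several labels at once, and the per-class probabilities can be passed directly to an AUROC computation.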

Results
The results are shown in Table 3, where we contrast various configurations of our classifier. We observe that the vision-only classifier, which only uses visual embeddings (CIMG), performs slightly better than the text-only classifier, which only uses textual embeddings (CTXT, LASER or LaBSE). The multimodal models outperform their unimodal counterparts. Our best performing model is multimodal, trained on the concatenated textual (CTXT, LASER and LaBSE) and visual (CIMG) embeddings. Class-wise performance of the best model is given in Table 4.
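A comparison of configurations of this kind could be organized as in the sketch below, which trains one classifier per subset of embedding blocks and scores each with AUROC. All data here are random stand-ins, so the printed numbers carry no meaning and do not reproduce Table 3. The block dimensions, the configuration names, the micro averaging and the `OneVsRestClassifier` wrapper are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n_memes, n_classes, n_train = 300, 3, 200

# Random stand-ins for precomputed embeddings, keyed by the names used in the text.
blocks = {
    "CIMG": rng.normal(size=(n_memes, 8)),
    "CTXT": rng.normal(size=(n_memes, 8)),
    "LASER": rng.normal(size=(n_memes, 8)),
}

# Synthetic multi-label targets that depend on every block (illustration only).
full = np.concatenate(list(blocks.values()), axis=1)
labels = (full @ rng.normal(size=(full.shape[1], n_classes)) > 0).astype(int)

# Hypothetical configurations: which embedding blocks each classifier sees.
configs = {
    "vision-only": ["CIMG"],
    "text-only": ["CTXT", "LASER"],
    "multimodal": ["CIMG", "CTXT", "LASER"],
}

results = {}
for name, keys in configs.items():
    # Concatenate only the blocks that belong to this configuration.
    X = np.concatenate([blocks[k] for k in keys], axis=1)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X[:n_train], labels[:n_train])
    scores = clf.predict_proba(X[n_train:])
    results[name] = roc_auc_score(labels[n_train:], scores, average="micro")

for name, auroc in results.items():
    print(f"{name}: AUROC = {auroc:.3f}")
```

Keeping the embeddings in named blocks makes each configuration a list of keys, so adding a further representation such as LaBSE only requires one more dictionary entry.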

Conclusion & Future Work
This paper has presented our pipeline for the multi-label hateful memes classification shared task organized at WOAH. We show that our multimodal classifiers outperform unimodal classifiers. Our best multimodal classifier achieves an AUROC of 0.96 for identifying the protected category, and 0.97 for detecting the attack type. Although we trained our classifier on language-agnostic representations, it was only tested on a dataset of English memes. As a future direction, we plan to extend our work