Jibes & Delights: A Dataset of Targeted Insults and Compliments to Tackle Online Abuse

Online abuse and offensive language on social media have become widespread problems in today's digital age. In this paper, we first contribute a Reddit-based dataset consisting of 68,159 insults and 51,102 compliments targeted at individuals rather than at a particular community or race. Second, we benchmark multiple existing state-of-the-art models for both classification and unsupervised style transfer on the dataset. Finally, we analyse the experimental results and conclude that the transfer task is challenging, requiring the models to understand the high degree of creativity exhibited in the data.


Introduction
Online abuse and targeted negative comments on online social networks have become a prevalent phenomenon, especially among adolescents. Victims of prolonged and targeted online harassment can experience negative emotions leading to adverse consequences such as anxiety and isolation from the community, which can, in turn, lead to suicidal behaviour.¹ Various attempts have been made in the past to detect such harassment (Dadvar et al., 2013; Chatzakou et al., 2019) and hate speech (Davidson et al., 2017; Badjatiya et al., 2017), but very few have attempted to transfer the negative aspect of such speech.
Recently, many new tasks have been introduced in the domain of text style transfer. However, since parallel corpora are usually not available, most style transfer approaches are unsupervised (Zhang et al., 2018; John et al., 2019). We contribute a dataset of non-parallel sentences, each being either an insult or a compliment, collected from Reddit, specifically from three subreddits: r/RoastMe, r/ToastMe, and r/FreeCompliments. Some examples of such sentences can be seen in Figure 1.
Figure 1: Examples of indirect insults and compliments with attributes highlighted in bold.
¹ Source: https://www.stopbullying.gov/resources/facts
With a diverse range of online communication platforms being introduced across the world, and the user bases of existing platforms growing at a fast pace, moderation of such negative comments and harassment becomes even more necessary. We hope that our work can enable further research on this challenging task, for instance in building moderation systems that detect such negative speech and nudge users towards more positive and non-toxic discourse.
Reddit is a popular social media website with forums known as subreddits, where users can comment and vote on posts and other comments. It has been used as a data source in a wide variety of tasks. r/RoastMe can be described as a subreddit of "abrasive humor" consisting of "creative" insults, where users voluntarily submit their picture to be "roasted." r/ToastMe and r/FreeCompliments are similar in principle but have the opposite purpose. A more detailed description of the data source and preparation can be found in Section 3. Since "creativity" is encouraged in r/RoastMe, our dataset consists of indirect insults that do not necessarily use any profanity or curse words and may therefore slip past most existing toxic speech filters.
In this work, we release the JDC (Jibe and Delight Corpus), a dataset of ∼120,000 Reddit comments tagged as insults or compliments that target particular attributes of an individual, including their face, hair, and eyes². We also propose using classification models to detect, and style transfer models to convert, such targeted negative comments, which are often associated with online harassment, where menacing or insulting messages are sent by direct message or posted on social media. Finally, we perform benchmarking experiments with existing state-of-the-art models on both fronts and analyse the results.

Related Work
Existing work primarily focuses on the detection of offensive language or hate speech on social media using classification models (Davidson et al., 2017; Badjatiya et al., 2017; Dadu and Pant, 2020), and not on transferring the negative aspect of such speech into a positive counterpart. Detection usually involves either lexical or rule-based approaches (Pérez et al., 2012; Serra and Venter, 2011) or, more recently, supervised learning (Yin et al., 2009; Dinakar et al., 2011). The detection of specific types of toxic speech has also been studied (Basile et al., 2019; Zampieri et al., 2020). Previous work on text style transfer has largely focused on transferring sentiment attributes in reviews (Hu et al., 2017) or converting factual captions into humorous or romantic ones. Other tasks include transferring formality (Xu et al., 2019) and gender or political style (Reddy and Knight, 2016). Recently, politeness transfer has also been proposed by Madaan et al. (2020).
Most approaches use unsupervised methods since parallel data is usually not available. These approaches can be broadly divided into three groups: 1) explicit disentanglement (Sudhakar et al., 2019), which separates content from style attributes explicitly, then combines the separated content with the target attribute and passes it through a generator; 2) disentanglement in latent space (John et al., 2019), which tries to separate style from content within the embedding space using suitable objective functions; and 3) adversarial or reinforcement learning based approaches (Luo et al., 2019), in which disentanglement may not even be required.
Reddit has been widely used as a data source in multiple natural language processing tasks. While Khodak et al. (2018) use Reddit to create a large corpus for sarcasm, Nogueira dos Santos et al. (2018) source their data from r/Politics on Reddit along with Twitter. Many controversial subreddits, such as r/The_Donald, have been used for hate speech detection in the past (Qian et al., 2019).
Although Nogueira dos Santos et al. (2018) proposed the task of translating offensive sentences into non-offensive ones using style transfer, in our work we go one step further and propose to convert offensive sentences into positive compliments. Prior work on r/RoastMe has mostly taken a sociopragmatic perspective (Dynel and Poppi, 2019; Kasunic and Kaufman, 2018). To the best of our knowledge, however, no previous work uses r/RoastMe as a data source for a style transfer task.

The JDC Dataset
We contribute the Jibe and Delight Corpus (JDC), a new non-parallel style transfer dataset consisting of ∼120,000 comments tagged as insults or compliments, and perform experiments and analysis on it.

Data Collection
We use Pushshift (Baumgartner et al., 2020) to extract Reddit posts and comments. While r/RoastMe is often characterized as a humorous subreddit, where users voluntarily submit pictures of themselves to be "roasted" or insulted, internet users who are not familiar with the community may associate r/RoastMe with malicious activities, including cyberbullying (Jenaro et al., 2018). r/RoastMe has even been described as "a new cyberbullying trend" by news media³. The "roasters" come up with comments that insult or demean the poster of the picture while trying to be as "creative" as possible. r/ToastMe and r/FreeCompliments work similarly, but with the opposite intent. These communities are much smaller and less popular than r/RoastMe; hence, our dataset contains more insults than compliments.
It is essential to distinguish between insults and slurs, since the two are frequently conflated. A slur is a taboo remark that is usually used to deprecate, disparage, or derogate a targeted member of a select category, such as an ethnicity, race, or sexual orientation. Insults can contain slurs, but they constitute a much broader and more diversified phenomenon (Dynel and Poppi, 2019). While slurs are common in hate speech datasets, it is interesting to observe that they are comparatively rare in posts from r/RoastMe. This characteristic distinguishes our dataset from other offensive speech datasets. Figure 2 illustrates our pipeline for the creation of the JDC.

Data Preparation
Since Reddit is a conversational social media platform, we limit the JDC to comments having the following characteristics:
• They are top-level comments. While nested comments may sometimes contain relevant data, they often diverge from the topic.
This filtering yields a corpus of roughly 300,000 insult comments and 100,000 compliment comments. However, because of the wide variety and creativity of users' insults, we apply several further filters to limit our dataset to a particular type of insult or compliment. We manually create an attribute list⁴ of words that correspond to a physical attribute (for example, hair, skin, complexion, teeth, and eyes) or a trait (for example, personality, kindness, and appearance), and keep only insults containing keywords for such attributes. We use the NLTK (Bird et al., 2009) and spaCy (Honnibal and Montani, 2017) libraries for preprocessing and filtering. We split each comment into sentences and check whether the lemma of any word in a sentence matches a word in our attribute list. This lets us keep relevant sentences, especially from longer comments, that would otherwise be discarded. We also filter out very short sentences (containing only one or two words). We obtain …

Table 1: Sample outputs produced by the style transfer models for two insults.
Model | Output
Input | Your teeth are more stained than my toilet
StyleEmb | Your hair is beautiful than your face.
RetrieveOnly | Beautiful smile -your teeth are remarkably straight
DeleteRetrieve | Your hair is beautiful and your teeth are more stained than my
LingST | Your teeth are so beautiful
Tag&Gen | Your teeth are more stained than my heart

Input | The only thing lazier than your eye is God when he designed your busted face
StyleEmb | Keep your hair and I love your hair and you look like the kind of person who look like a
RetrieveOnly | Your hair is fantastic and your face is absolutely adorable.
DeleteRetrieve | And your eye expressive is God wonderful lips designed especially your face
LingST | The only thing more crooked than your face is the absolute cutest thing i
Tag&Gen | Love the only thing lazier than your eye is god when he designed your busted face
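The attribute-based sentence filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `ATTRIBUTES` is a small hypothetical subset of the attribute list, and a regex sentence splitter and a crude suffix-stripping `naive_lemma` stand in for the NLTK and spaCy components actually used.

```python
import re

# Hypothetical subset of the attribute list (the real JDC list is the authors' own).
ATTRIBUTES = {"hair", "skin", "complexion", "tooth", "eye",
              "personality", "kindness", "appearance"}

def naive_lemma(word: str) -> str:
    """Stand-in lemmatizer (assumption): lowercase, map 'teeth', strip a plural -s."""
    irregular = {"teeth": "tooth"}
    w = word.lower()
    if w in irregular:
        return irregular[w]
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w

def split_sentences(text: str) -> list[str]:
    # Naive splitter on '.', '!' or '?' followed by whitespace (assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence: str) -> list[str]:
    return re.findall(r"[A-Za-z']+", sentence)

def filter_comment(comment: str, attributes=ATTRIBUTES,
                   lemmatize=naive_lemma, min_words: int = 3) -> list[str]:
    """Keep sentences with at least min_words words whose lemmas hit the attribute list."""
    kept = []
    for sent in split_sentences(comment):
        words = tokenize(sent)
        if len(words) < min_words:  # drop one- or two-word sentences
            continue
        if any(lemmatize(w) in attributes for w in words):
            kept.append(sent)
    return kept
```

Filtering at the sentence level rather than the comment level is what allows a relevant sentence to survive even when the rest of a long comment is off-topic.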

Experiments & Results
We perform experiments with both classification and style transfer models, evaluating five existing models for each task on the JDC. We also discuss the challenges faced and the metrics used for evaluation.

Models
For the classification experiments, we use the following models: 1. Logistic Regression: one of the most commonly used classification algorithms, Logistic Regression (LR) applies the logistic (sigmoid) function to produce a probability, which can then be mapped to a class label. We fine-tune BERT, RoBERTa, and XLNet for classification, using Hugging Face's transformers library (Wolf et al., 2020) for our experiments.
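For reference, the logistic regression decision rule can be sketched as below. It assumes a binary setup in which the positive class is "insult" and the features are, for example, bag-of-words values; the weights in the usage example are hypothetical, not learned from the JDC.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function: 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features) -> float:
    # P(insult | x) = sigmoid(w . x + b)
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_label(weights, bias, features, threshold: float = 0.5) -> str:
    # Map the probability to a class label with a fixed threshold (assumed 0.5).
    return "insult" if predict_proba(weights, bias, features) >= threshold else "compliment"

# Usage with hypothetical weights for two features.
print(predict_label([2.0, -1.0], 0.0, [1.0, 0.0]))
```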
For style transfer experiments, we use the following models: 1. StyleEmb (Shen et al., 2017): This model uses a cross-aligned auto-encoder, aligning representations in latent space to perform style transfer.
2. RetrieveOnly and DeleteRetrieve: RetrieveOnly returns a retrieved target-style sentence without any changes, whereas DeleteRetrieve removes the attribute markers from the source sentence and uses a decoder to combine the remaining content with attribute markers taken from a retrieved target-style sentence. Both models rely on explicit disentanglement of content from style and are often used as baselines in style transfer work.
… This has recently been utilized for the politeness transfer task.
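To make the "delete" step of DeleteRetrieve more concrete, the sketch below scores each word with a smoothed salience ratio: how much more often it occurs in the source style (insults) than in the target style (compliments). Words whose salience reaches a threshold are treated as attribute markers and removed. This is a unigram-only simplification in the spirit of the original method; the smoothing constant, the threshold, and the toy corpora are illustrative assumptions, not the settings used in our experiments.

```python
from collections import Counter

def salience(corpus_src, corpus_tgt, smoothing: float = 1.0) -> dict[str, float]:
    """(count in source + smoothing) / (count in target + smoothing) for each source word."""
    src = Counter(w for s in corpus_src for w in s.lower().split())
    tgt = Counter(w for s in corpus_tgt for w in s.lower().split())
    return {w: (src[w] + smoothing) / (tgt[w] + smoothing) for w in src}

def delete_markers(sentence: str, sal: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Drop words whose source-style salience reaches the threshold; the rest is content."""
    return [w for w in sentence.lower().split() if sal.get(w, 0.0) < threshold]

# Toy corpora (hypothetical).
insults = ["your hair is ugly", "your face is ugly", "ugly teeth"]
compliments = ["your hair is lovely", "lovely smile", "your eyes are lovely"]
sal = salience(insults, compliments)
print(delete_markers("Your teeth are ugly", sal))
```

The retained content words would then be paired with target-style markers (here, something like "lovely") before generation.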

Evaluation
For the classification experiments, we evaluate using the well-known metrics of accuracy, precision, recall, and F1-score. For the style transfer experiments, we evaluate performance on three different aspects, following previous work:
1. Style Transfer Intensity: We train a separate fastText model (Joulin et al., 2017) on the training data and use it to classify the outputs of each model, which determines the accuracy of style transfer.

2. Content Preservation: We use BLEU as an evaluation metric, computed with the SacreBLEU implementation (Post, 2018).
3. Fluency: We measure fluency as the perplexity assigned by a language model from the KenLM library (Heafield, 2011), trained on the target domain (compliments). A lower perplexity indicates a more fluent sentence, and vice versa.
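The automatic metrics above reduce to simple arithmetic once model predictions are available. The sketch below assumes that the style classifier's predicted labels and each sentence's total log10 language-model score are already computed. The label names are hypothetical, precision/recall/F1 are computed for a single positive class (the averaging scheme is not specified here), and the `num_words + 1` term assumes the end-of-sentence token is scored, as KenLM does by default.

```python
def precision_recall_f1(gold, pred, positive: str = "insult"):
    # Binary precision, recall, and F1 for one positive class.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def transfer_accuracy(predicted_styles, target: str = "compliment") -> float:
    # Share of model outputs that the style classifier labels as the target style.
    return sum(p == target for p in predicted_styles) / len(predicted_styles)

def perplexity_from_log10(total_log10_prob: float, num_words: int) -> float:
    # Perplexity from a log10 sentence score; +1 counts the </s> token.
    return 10.0 ** (-total_log10_prob / (num_words + 1))
```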
Apart from automatic evaluation, we also conduct a human evaluation on 280 sentences randomly selected from the test set. The evaluators were asked to rate sentences on the basis of their fluency and their degree of being a compliment (DOC) on a scale of 1 to 5. Two annotators were shown a list of sentences with no indication of their source. We used Cohen's kappa (Cohen, 1960) to measure the agreement between the two annotators; the kappa values for DOC and fluency are 0.69 and 0.65, respectively.

Results
Table 2 shows that most of the models perform very well in distinguishing insults from compliments. Even classical machine learning models such as LR and SVM perform adequately on the task, but the state-of-the-art transformer-based models (BERT, RoBERTa, and XLNet) perform excellently, with F1-scores above 0.9. XLNet achieves the highest F1-score, with BERT and RoBERTa only marginally lower.
From Table 3, we observe that RetrieveOnly, StyleEmb, and LingST show high transfer accuracy but do not perform well in content preservation. Tag&Gen performs very well on content preservation but fails to transfer the style adequately. DeleteRetrieve obtains a better balance between accuracy and BLEU, but it loses out on fluency, producing the least fluent sentences among all models. This suggests that although the relevant style and content words are transferred, the output may not be grammatical or natural.
Human evaluation results in Table 4 also show that the input sentences are judged more fluent than the model outputs. RetrieveOnly and LingST outputs are more likely to be judged as compliments, whereas StyleEmb is judged poor in both DOC and fluency.
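Cohen's kappa corrects the raw agreement between the two annotators for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement implied by each annotator's label frequencies. A minimal unweighted implementation is sketched below. Whether a weighted variant was used for the 1 to 5 ratings is not stated, so this is an illustration rather than the exact computation behind the reported values.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b) -> float:
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of the two raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    # Undefined (division by zero) if both raters use one identical label throughout.
    return (observed - expected) / (1 - expected)
```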

Discussion
Even though the insults are "creative", we find that the classification models perform excellently in separating insults from compliments. This shows that both insults … However, the performance of the style transfer models is lacking, and each model has different shortcomings. Table 1 shows sample outputs produced by the models. RetrieveOnly fetches a sentence from the target style using specific attributes from the input sentence, often leading to invalid outputs if a corresponding positive sentence does not exist in the data. In the second example in Table 1, DeleteRetrieve gives a nonsensical output. Other models similarly try to transfer the intent by introducing words like "kind", "wonderful", and "cute", but there is still a significant gap between the generated outputs and genuine compliments. We observe that LingST has the lowest perplexity while still having high accuracy, which suggests that its outputs are both positive and grammatically fluent. Compared to sentiment transfer, converting an insult into a compliment usually requires multi-word modifications, which may explain the poor content preservation across most of the models.
We observe that style transfer models have an easier time handling direct insults ("You look very ugly") than complex and creative ones ("How many concrete walls did you have to run into to achieve that nose?"). Besides direct and creative insults, some samples need more context to be understood and may seem out of place compared to the rest; however, most of these are filtered out by the attribute list described in Section 3. The distribution of the data over the top attributes can be seen in Figure 3 and Figure 4 for insults and compliments, respectively. While the insults exhibit a lot of creativity, we find that compliments are usually direct, and thus simpler and easier to understand. This is desirable, since convoluted compliments may come across as patronizing, which would be counterproductive to our goal.

Conclusion
In this work, we introduced a Reddit-based dataset of indirect insults that favor creativity and rarely use slurs. We benchmarked classification models for detection and applied unsupervised text style transfer to convert insults into compliments. Evaluating several state-of-the-art models on the dataset, we observed that while detection is comparatively easy, transferring the negative attribute remains challenging. Future work may include developing unsupervised text style transfer methods that capture the intricacies of the proposed dataset and building moderation systems for online platforms.