A Computational Exploration of Pejorative Language in Social Media

In this paper we study pejorative language, an under-explored topic in computational linguistics. Unlike offensive language and hate speech as treated by existing models, pejorative language manifests itself primarily at the lexical level: it describes a word used with a negative connotation, which distinguishes it from offensive language and other more widely studied categories. Pejorativity is also context-dependent: the same word can be used with or without pejorative connotations, so pejorativity detection is essentially a problem similar to word sense disambiguation. We leverage online dictionaries to build a multilingual lexicon of pejorative terms for English, Spanish, Italian, and Romanian. We additionally release a dataset of tweets annotated for pejorative use. Based on these resources, we analyze the usage and occurrence of pejorative words in social media and present an attempt to automatically disambiguate pejorative usage in our dataset.


Introduction
With the increase of social media usage, toxic language has become an important problem in our society. Automatic methods are needed to help mitigate this problem, and for this reason the study of toxic speech in NLP has become very popular in recent years. Different categories and definitions have been proposed, including hate speech (Schmidt and Wiegand, 2017; Vashistha and Zubiaga, 2021), offensive language (Zampieri et al., 2019; Bucur et al., 2021), aggression (Kumar et al., 2018, 2020), as well as further sub-categories depending on the targets, such as women, migrants, etc. (Basile et al., 2019). From a computational perspective, the problem is usually approached as a classification task at the post level, where a classifier is trained to predict whether a social media post contains offensive/toxic language.
In this paper we address the question of pejorative words. Pejorative words are words or phrases that have negative connotations or that are intended to disparage or belittle. Pejorativity is closely related to the notion of slurs or insults: "as noun phrases, 'insult' and 'slur' refer to symbolic vehicles designed by convention to derogate targeted individuals or groups" (Anderson and Lepore, 2013). While pejorative language is often used in offensive speech (Castroviejo et al., 2020), the two are not identical categories. There are offensive posts that do not use pejorative words (e.g. "Women belong in the kitchen"), and pejorative uses of words that are not harmful ("What a shitty chair") because the offensive content is not targeted at a person or a group, as described in the popular annotation taxonomy of the Offensive Language Identification Dataset (OLID) (Zampieri et al., 2019).
Words can have a negative meaning in one context and not in others (such as the figurative meanings of "trash" or "pussy"), or be pejorative in one language or culture and not in others (such as the Romanian "cioara", literally "crow", a slur for people of color). Slurs can also lose their pejorative meaning through semantic change (e.g. the word "queer" went through semantic amelioration over the years: it used to be a slur and is losing its negative connotation (Brontsema, 2004)). Recognizing the complexity of the phenomenon, with its linguistic subtleties as well as the variability related to culture and context, is important for successfully recognizing pejorative words and, by extension, offensive posts and hate speech.
Pejorative language is still largely underexplored in computational linguistics. Very few studies address or take pejorative language into account (Wiegand et al., 2018; Mendelsohn et al., 2020; Palmer et al., 2017; Eder et al., 2019; Castroviejo et al., 2020). Works related to ours include Palmer et al. (2017), who focused on pejorative connotations for nominalized adjectives, and Eder et al. (2019), who built a lexicon of vulgar terms (and vulgarity scores) for German based on derogatory terms found in Wiktionary.
In this study, we address this important gap by leveraging dictionaries to build a multilingual lexicon of pejorative language for four languages. We compare the occurrence of pejorativity in social media with other established categories of toxic language, relying on existing hate speech corpora. Unlike most existing studies in hate speech and offensive language identification, our paper focuses on the lexical level and approaches the issue of ambiguity in toxic language, formulating the problem of pejorativity detection as a word sense disambiguation (WSD) task. The main contributions of this work are the following:
1. We create a multilingual lexicon of pejorative words in four languages: English, Spanish, Italian, and Romanian.
2. We present several experiments to automatically distinguish pejorative from non-pejorative uses of words relying on state-of-the-art word sense representations based on contextual embeddings.
3. We release annotated datasets containing pejorative words in English and Spanish tweets.

Data Collection
We started by gathering a pejorative lexicon for four languages: English, Spanish, Italian and Romanian. For each language, we assembled a list of words that can be used with a pejorative sense according to existing language resources. We focused on providing a lexicon consisting of words that can be used pejoratively on their own, rather than words that are part of pejorative expressions or idioms. To collect these terms for English, Spanish, and Italian, we used Wiktionary and collected the terms that were part of the "derogatory terms" category. For Romanian, we used another dictionary available online, dexonline, and selected all of the words that had a pejorative definition, where the definition was intended for the word itself and not for an expression built around the word.

Lexicon Description
For each language's lexicon, we computed the frequency of each word, based on occurrence across different large corpora including Wikipedia and social media datasets, using the wordfreq Python library (Speer et al., 2018). We used WordNet (Miller, 1995) to count the number of senses a word can have (by counting the number of synsets that contain it) as well as its parts of speech. Statistics are shown in Table 1. The distribution across parts of speech is illustrated in Figure 1. For a given word, we counted all its possible parts of speech according to WordNet.

Pejorative Tweet Dataset
To build a data set of English texts containing words that are used pejoratively, we started from three datasets of hate speech on Twitter (Davidson et al., 2017; Basile et al., 2019; Waseem and Hovy, 2016) and selected the tweets that contain words from our pejorative lexicon (after normalizing words to their stems). For each data set, we extracted pairs of words and tweets where they occur. The dataset published by Davidson et al. (2017) contains tweets annotated with one of three classes (hateful, offensive and neither). The number of tweets found to contain pejorative words for each label is as follows: 1,114 out of 1,430 hateful tweets, 8,358 out of 19,190 offensive tweets, and 2,221 of the remaining 4,163 tweets. The hate speech dataset published as part of the HatEval shared task (Basile et al., 2019) contains tweets annotated with labels for hateful and aggressive speech. Out of the 4,210 hateful tweets, 1,985 contain words from our lexicon, as do 822 of the 1,763 aggressive tweets. Finally, the dataset by Waseem and Hovy (2016) contains tweets annotated for racist and sexist speech: 8 of the 1,970 racist tweets and 897 of the 3,378 sexist tweets contain pejorative words.
For Spanish, we employed the same technique of filtering tweets. We looked at the Spanish tweets data set provided by Basile et al. (2019) and considered only the binary label for hate speech classification. Out of the total of 5,000 tweets, we extracted 1,621 hateful examples and 1,667 non-hateful examples that contain words from our Spanish pejorative lexicon.
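The filtering step described in this section can be illustrated with a minimal sketch. The stemmer below is a crude suffix stripper used only as a stand-in, since the text does not say which stemmer was used; the function names (`toy_stem`, `extract_pairs`) and the example tweets are hypothetical.

```python
import re

def toy_stem(token: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ies", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_pairs(tweets, lexicon):
    """Return (stem, tweet) pairs for every lexicon stem that occurs in a tweet."""
    lexicon_stems = {toy_stem(word.lower()) for word in lexicon}
    pairs = []
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        # Stems shared by the tweet and the lexicon, in a deterministic order.
        found = sorted({toy_stem(token) for token in tokens} & lexicon_stems)
        pairs.extend((stem, tweet) for stem in found)
    return pairs

# Hypothetical toy data, not taken from the corpora discussed in the paper.
tweets = ["Those crackers were stale", "Nice weather today", "What a jerk"]
lexicon = ["cracker", "jerk"]
pairs = extract_pairs(tweets, lexicon)
```

Note that matching on stems alone retrieves both pejorative and literal uses (as in the first toy tweet), which is why the extracted pairs still need to be annotated for pejorative usage.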

Annotation
We then built a data set of English tweets annotated for pejorative usage of words, by selecting tweets from the data set of Davidson et al. (2017), which we chose given the large number of unique pejorative words it contains (1, 77 for hate, 3, 95 for offensiveness and 2, 77 for none). We extracted two separate data sets in two different ways.
The first data set (PEJOR1) was built by selecting a fixed percentage of tweets from each class, in order to obtain a balanced dataset with respect to the three labels (keeping only words that are represented at least once in each class). In this way, we attempt to preserve the relative distribution of the pejorative stems across the three classes.
The second data set (PEJOR2) was built to be balanced with regard to both the words' distribution and the original labels. For each pejorative stem we extracted a fixed number of pairs from each of the three classes.
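As a rough illustration of the PEJOR2 sampling scheme, the sketch below keeps at most a fixed number of pairs for every combination of stem and original class. The function name `sample_balanced`, the `per_cell` parameter, the random seed and the toy pairs are all assumptions made for the example; the text does not state the actual number of pairs drawn per class.

```python
import random
from collections import Counter, defaultdict

def sample_balanced(pairs, per_cell, seed=0):
    """pairs: iterable of (stem, class_label, tweet) triples.
    Keep at most `per_cell` pairs for every (stem, class_label) combination."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for stem, label, tweet in pairs:
        cells[(stem, label)].append((stem, label, tweet))
    sample = []
    for key in sorted(cells):
        items = cells[key]
        rng.shuffle(items)               # random choice within each cell
        sample.extend(items[:per_cell])  # cap the cell at per_cell pairs
    return sample

# Hypothetical toy input: one stem with unevenly sized classes.
demo_pairs = (
    [("cracker", "hate", f"tweet h{i}") for i in range(5)]
    + [("cracker", "offensive", f"tweet o{i}") for i in range(3)]
    + [("cracker", "neither", "tweet n0")]
)
demo_sample = sample_balanced(demo_pairs, per_cell=2)
cell_sizes = Counter((stem, label) for stem, label, _ in demo_sample)
```

Cells with fewer than `per_cell` pairs are kept in full here; dropping such stems entirely would be an equally plausible reading of the description.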
The tweet-word pairs selected for both data sets were then annotated with binary labels, denoting whether the word in the pair is used pejoratively (label 1) or not (label 0) in the tweet. Following the Wiktionary definitions, a word was labelled as pejorative only when used in a sense marked as "derogatory" in Wiktionary.
Table 2 shows statistics for the two datasets, while Figure 2 illustrates the distribution of labels for words in PEJOR2. Data was annotated by specialists in linguistics. We used two annotators for each datapoint, and a third one where there was disagreement. The obtained Cohen's κ agreement score was 0.933.
For Spanish, we built a pejorative data set by selecting tweets from the Basile et al. (2019) data set, following the same approach used for extracting the PEJOR2 English examples. We annotated a small subset of the tweets, consisting of 12 pejorative words with 10 tweets each (balanced between hateful and non-hateful tweets).
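For readers unfamiliar with the agreement measure reported in this section, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance from their individual label frequencies. The sketch below is a plain implementation of the standard formula; the annotation vectors are invented for illustration and are not the paper's data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both annotators used one identical label
    return (observed - expected) / (1.0 - expected)

# Hypothetical annotations (1 = pejorative use, 0 = non-pejorative use).
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0]
kappa = cohen_kappa(annotator_a, annotator_b)  # observed 0.875, expected 0.5
```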

Classification Experiments
The classification task we approached was inferring the 0/1 label for tweet-word pairs. Namely, given a word and a tweet in which the word appears, we want to determine whether the word was used pejoratively or not in that tweet. To prepare our data, for each tweet-word pair, the tweet was tokenized and the position of the occurrence of the word was found among the tokens. Then, we generated a contextual embedding (Devlin et al., 2019) for that occurrence, by employing various BERT models, pre-trained on English texts, provided by the huggingface Python library (Wolf et al., 2019). The embedding for the specified position is computed by summing the 768-dimensional hidden states generated for that position by each of the 12 layers of the BERT architecture. We note that, for out-of-vocabulary words, the BERT tokenizer provided by the huggingface library splits them into sub-words. In this case we chose to generate the embeddings for each of the sub-words of our word occurrence and then average them to obtain the final 768-dimensional embedding. Figure 3 illustrates an example of uses of a pejorative word ("cracker") in the PEJOR2 dataset, by representing its embeddings reduced to two dimensions using PCA. We can see that most of the similarly labelled examples are clustered together.
Figure 3: 2D plot of the contextual embeddings generated for the word 'cracker' in the PEJOR2 data set, for each of its occurrences in the tweets, using a pretrained BERT model. Embeddings were reduced to two dimensions using PCA.
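The aggregation just described (sum over layers, then average over sub-word pieces) can be sketched without any deep learning library. The nested-list layout `hidden_states[layer][position]` and the function name `occurrence_embedding` are assumptions made for this illustration; the toy example uses 2 layers and 2 dimensions instead of 12 layers and 768 dimensions.

```python
def occurrence_embedding(hidden_states, subword_positions):
    """hidden_states[layer][position] is a vector (list of floats).
    Sum the vectors across layers at each sub-word position, then
    average the summed vectors over all sub-word positions of the word."""
    dim = len(hidden_states[0][0])
    summed = []
    for pos in subword_positions:
        vec = [0.0] * dim
        for layer in hidden_states:
            for i, value in enumerate(layer[pos]):
                vec[i] += value
        summed.append(vec)
    count = len(summed)
    return [sum(vec[i] for vec in summed) / count for i in range(dim)]

# Toy hidden states: 2 layers, 3 token positions, 2 dimensions.
toy_states = [
    [[1.0, 0.0], [2.0, 2.0], [4.0, 0.0]],  # layer 1
    [[0.0, 1.0], [0.0, 2.0], [2.0, 2.0]],  # layer 2
]
# Suppose the target word was split into sub-words at positions 1 and 2.
embedding = occurrence_embedding(toy_states, [1, 2])
```

In the transformers library, per-layer hidden states can be requested with `output_hidden_states=True`; the returned tuple also includes the output of the embedding layer, and the text does not say whether that extra entry was part of the sum.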
For classification on our English data set, we grouped the pairs by the pejorative word contained in the tweet and, independently for each group, fitted a classifier on the contextual embeddings (Liu et al., 2020). For extracting the embeddings we used various transformer models (BERT base (Devlin et al., 2019), BERTweet (Nguyen et al., 2020), RoBERTa (Liu et al., 2019), Multilingual BERT (Devlin et al., 2019)), and as classification algorithms we used K-Nearest Neighbors, Support Vector Machines (SVM), and Multilayer Perceptron (MLP). For K-Nearest Neighbors, we used cosine similarity to compare embeddings and found through hyper-parameter tuning that a neighborhood size of 4 performed best.
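A minimal sketch of the per-word nearest-neighbour setting follows, assuming one training set per pejorative word, cosine similarity between embeddings, and k = 4. The tie-breaking rule (prefer the label of the closest neighbour among tied labels) is an assumption, as are the function names and the two-dimensional toy embeddings.

```python
import math
from collections import Counter

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_predict(train, query, k=4):
    """train: list of (embedding, label) pairs for ONE pejorative word."""
    ranked = sorted(
        train, key=lambda item: cosine_similarity(item[0], query), reverse=True
    )
    neighbours = ranked[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    # Assumed tie-break: among tied labels, take the closest neighbour's label.
    for _, label in neighbours:
        if votes[label] == top:
            return label

# Hypothetical embeddings for one word (1 = pejorative, 0 = non-pejorative).
word_train = [
    ([1.0, 0.1], 1), ([0.9, 0.2], 1), ([0.8, 0.0], 1),
    ([0.1, 1.0], 0), ([0.0, 0.9], 0), ([0.2, 0.8], 0),
]
pred_pejorative = knn_predict(word_train, [1.0, 0.0])
pred_literal = knn_predict(word_train, [0.0, 1.0])
```

With an even k such as 4, two-against-two votes are possible in a binary task, so some tie-breaking rule is needed in practice.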
For evaluation, we employed 5-fold cross-validation. Performance metrics were computed for each word independently, measuring the capacity to distinguish between the pejorative and non-pejorative usage of the word in different contexts. We report, for each metric, the value obtained by averaging the scores over all word groups. We exclude from this average the words that appear with only one label in the whole data set (only pejorative or only non-pejorative), since they are always classified correctly regardless of the contextual embeddings. We also employed a baseline that learns from the training data to always predict the most frequent label. Table 3 shows the obtained results. The appendix contains a table with nearest neighbors found for example tweets.
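The per-word averaging and the majority baseline can be sketched as follows. Computing F1 for the pejorative class is an assumption (the text does not specify the F1 variant), the cross-validation loop is omitted for brevity, and the words, labels and predictions below are invented.

```python
from collections import Counter

def f1_binary(gold, pred, positive=1):
    """F1 for the positive class; 0.0 when there are no true positives."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_test

def average_over_words(gold_by_word, pred_by_word):
    """Average per-word F1, skipping words that carry a single label."""
    scores = []
    for word, gold in gold_by_word.items():
        if len(set(gold)) < 2:
            continue  # only pejorative or only non-pejorative: left out
        scores.append(f1_binary(gold, pred_by_word[word]))
    return sum(scores) / len(scores)

# Hypothetical gold labels and predictions, grouped by word.
gold_by_word = {"cracker": [1, 1, 0, 0], "trash": [1, 0, 0, 0], "jerk": [1, 1, 1]}
pred_by_word = {"cracker": [1, 0, 0, 0], "trash": [1, 0, 0, 1], "jerk": [1, 1, 1]}
mean_f1 = average_over_words(gold_by_word, pred_by_word)
baseline_preds = majority_baseline([1, 0, 0, 0], n_test=2)
```

In this toy run the single-label word "jerk" is excluded, so the average is taken over two words only.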
The classifiers show promising performance in distinguishing pejorative usage, with F1-scores of up to 0.86. With the best-performing model for each data set, 107 samples were misclassified in PEJOR1 and 37 in PEJOR2. Words in PEJOR2 seem slightly easier to classify, which might be expected given that the dataset is more balanced in positive and negative examples.

Conclusions
We have addressed an important but under-explored lexical category at the intersection of lexical semantics and toxic speech: pejorativity. We released a public lexicon of pejorative words in four languages (including a low-resource language), as well as a dataset of tweets annotated for pejorative uses of words. We have modelled pejorativity detection as a problem of disambiguation, and performed experiments using state-of-the-art contextual embeddings in order to automatically distinguish pejorative from non-pejorative uses of words, obtaining promising results. In the future, we would like to explore modelling the problem of pejorativity detection as a sequence labelling task. At the application level, integrating pejorativity detection into hate speech detection systems, for example, would be a promising area for future research. From a linguistic perspective, it would be interesting to analyze occurrence and pejorative value cross-lingually, taking advantage of large pretrained cross-lingual models as in Zampieri (2020, 2021) for offensive language identification. We expect pejorative connotations to be difficult to translate and not to transfer well across languages, which could also have practical implications. We would also like to extend our dataset of social media posts to cover more pejorative terms, as well as other languages.