Exploiting Emojis for Abusive Language Detection

We propose to use abusive emojis, such as the “middle finger” or “face vomiting”, as a proxy for learning a lexicon of abusive words. Since an emoji represents extralinguistic information, a single emoji can co-occur with many different forms of explicitly abusive utterances. We show that our approach generates a lexicon that offers the same performance in cross-domain classification of abusive microposts as the most advanced lexicon induction method, which, in contrast, depends on manually annotated seed words and expensive lexical resources for bootstrapping (e.g. WordNet). We demonstrate that the same emojis can also be effectively used in languages other than English. Finally, we show that emojis can be exploited to classify mentions of ambiguous words, such as “fuck” and “bitch”, into generally abusive and merely profane usages.


Introduction
Abusive or offensive language is defined as hurtful, derogatory or obscene utterances made by one person to another (http://thelawdictionary.org). In the literature, closely related terms include hate speech (Waseem and Hovy, 2016) or cyberbullying (Zhong et al., 2016). While there may be nuanced differences in meaning, they are all compatible with the general definition above.
Due to the rise of user-generated web content, the amount of abusive language is also steadily growing. NLP methods are required to focus human review efforts towards the most relevant microposts. Building classifiers for abusive language detection requires expensive manually labeled data.
In this paper we explore distant supervision (Mintz et al., 2009) for abusive language detection, in which abusive emojis serve as a heuristic to identify abusive language (1)-(8). These texts are subsequently used as training data. The advantage of emojis is that some of them are unambiguously abusive. They are also often redundant (Donato and Paggio, 2017), i.e. they convey something already expressed verbally in the micropost. Since the concept conveyed by an emoji can be expressed verbally in many different ways, abusive emojis may co-occur with many different abusive words (e.g. idiot, cunt). Moreover, the meaning of emojis is (mostly) shared across languages.
(1) You are such a hypocrite ... Have your dinner dick
(2) @USER @USER you need a good old fashion man sized ass kicking you little Twitt
(3) @USER I challenge you to go on a diet you fat cunt
(4) @USER You are so so stupid you monkey face
(5) Send your location, I'll send some killers
(6) @USER @USER A vote for toddstone or any liberal. Id rather flush a toilet.
(7) Fuck the 12 fuck the cops we aint forgot about you, kill em all kill em all
(8) @USER She is such a disgusting despicable human being! Ugh!

Recently, there has been significant criticism of in-domain supervised classification in abusive language detection: its evaluation has been shown to produce overly optimistic classification scores, which are the result of biases in the underlying datasets. It has been shown that on the most popular dataset for this task (Waseem and Hovy, 2016), classifiers learn coincidental correlations between specific words (e.g. football or sport) and the abusive class label. Such spurious correlations help classifiers to correctly classify difficult microposts on that particular dataset. Arango et al. (2019) show that since the majority of abusive tweets in the dataset from Waseem and Hovy (2016) originate from just two authors, classifiers learn the authors' writing style rather than abusive language.
In order to avoid an evaluation affected by such topic or author biases, we focus on learning a lexicon of abusive language. A lexicon-based approach primarily targets the detection of explicitly abusive language, i.e. abusive language that is conveyed by abusive words. Such a lexicon is currently the most effective clue known for cross-domain classification (Wiegand et al., 2018a). In general, other types of abusive language that are more implicit, such as sarcasm, jokes or stereotypes, require more contextual interpretation of words. Supervised classification is theoretically able to conduct such contextual interpretation. However, it has been reported to perform very poorly on this task (Karan and Šnajder, 2018; Arango et al., 2019), because the biases these classifiers exploit are unlikely to be present across different datasets. Therefore, we focus on explicitly abusive language in this work, since there are no ways of reliably detecting implicitly abusive language.
Despite the existence of lexicons of abusive words, induction methods are required, since new abusive words constantly enter the language. Further, only a few lexicons are available in languages other than English. The aim of our work is not to detect completely new types of abusive language but to find an inexpensive and language-independent method for lexicon induction.
Our contributions in this paper are:
• We use emojis to induce a lexicon of abusive words. Unlike previous work, this approach does not depend on manually labeled training data or expensive resources, such as WordNet or intensity lexicons. We also demonstrate its effectiveness on cross-domain classification of microposts.
• In order to show the general applicability of our approach, we apply it not only to English but also to Portuguese and German data. This study produces three state-of-the-art lexicons that we make publicly available along with all other resources created in this paper.
• We use emojis to disambiguate the context of potentially abusive words. We exemplify this on the two ambiguous and frequent words fuck and bitch. A by-product is a dataset of mentions of these words annotated in context.
The supplementary material to this paper includes all resources newly created for our research and notes on implementation details.

Related Work

Lexicon induction for abusive language detection has received little attention in previous work, the exceptions being Razavi et al. (2010), who present a lexicon generated using adaptive learning; Gitari et al. (2015), who bootstrap hate verbs; and Wiegand et al. (2018a), who induce a lexicon of abusive words. The latter lexicon is currently the best performing lexicon for the task. It has been induced with the help of a manually annotated (seed) base lexicon. The bootstrapping step largely relies on resources that exist only for well-resourced languages, such as WordNet, sentiment intensity datasets or sentiment-view lexicons.
Recently, there has been a general interest in exploiting extralinguistic information for natural language processing. Emoticons, such as :-), have been found useful for sentiment analysis, particularly emotion classification (Purver and Battersby, 2012). Emojis represent an even more fine-grained set of icons. Felbo et al. (2017) exploit them for pretraining neural models to produce a text representation of emotional content. Since this approach relies on a representative sample of tweets containing emojis, only the 64 most frequently occurring emojis are considered. This set, however, does not contain the emojis that are highly predictive for abusive language detection (e.g. the middle finger). Corazza et al. (2020) follow an approach similar to Felbo et al. (2017) in that they pretrain a language model with the help of emoji information. However, unlike Felbo et al. (2017), their emoji-based masked language model is evaluated for zero-shot abusive language detection. The task is also considered in a multilingual setting: the target languages are English, German, Italian and Spanish. The improvements that Corazza et al. (2020) report over baseline language models that do not explicitly incorporate emoji information are limited.
Our work extends Felbo et al. (2017) and Corazza et al. (2020) in that we focus on predictive emojis for abusive language detection. Unlike Felbo et al. (2017) and Corazza et al. (2020), we do not pretrain a text classifier with these additional emojis. Supervised text classifiers are known to severely suffer from domain mismatches in abusive language detection, whereas lexicon-based classifiers generalize better across domains.


Data and Experimental Setup

Emoji Selection. Besides the middle finger (Robbins, 2008), our choice of emojis (see https://unicode.org/emoji/charts/full-emoji-list.html) includes emojis that connote violence (Wiener, 1999) (oncoming fist, pistol), the taboo topics death and defecation (Allen and Burridge, 2006) (skull and crossbones, pile of poo), the emotions anger and disgust (Alorainy et al., 2018) (angry face, face vomiting) and dehumanization (Mendelsohn et al., 2020) (monkey face). (1)-(8) illustrate each emoji with an abusive tweet. For further emojis, we only obtained an insufficient amount of the English tweets necessary for our experiments (i.e. several thousand tweets after running a query containing these emojis via the Twitter streaming API for a few days). Examples of such sparse emojis are bomb (connoting violence) or high voltage (connoting anger). Although our procedure involved a manual selection of emojis, our evaluation will demonstrate that this choice does not overfit but generalizes across different datasets and languages.
Table 1 also shows that we obtained fewer tweets for Portuguese and German. This sparsity is representative of languages other than English.
Vocabulary. Our induction experiments are carried out on a vocabulary of negative polar expressions; abusive words form a proper subset of these expressions. We use the set of negative polar expressions from Wiegand et al. (2018a), comprising about 7,000 English words. For our experiments on Portuguese and German data, we created similar word lists following Wiegand et al. (2018a).
Tasks. In this work, there are two types of tasks: lexicon induction, in which we rank negative polar expressions such that the highest ranks should be occupied by abusive words, and classification of abusive microposts. The former is evaluated with precision at rank n (P@n), the latter with accuracy and macro-average F-score.
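As a concrete illustration, P@n can be computed as follows; the ranking and gold set below are toy examples, not data from our experiments.

```python
# Precision at rank n (P@n): the fraction of the n highest-ranked
# candidate words that are in the gold set of abusive words.

def precision_at_n(ranked_words, gold_abusive, n):
    top_n = ranked_words[:n]
    return sum(1 for w in top_n if w in gold_abusive) / n

# Toy example: an induced ranking and a gold set (illustrative only).
ranking = ["idiot", "moron", "grumpy", "scum", "slow"]
gold = {"idiot", "moron", "scum"}

print(precision_at_n(ranking, gold, 1))  # 1.0: the top word is abusive
print(precision_at_n(ranking, gold, 3))  # 2 of the top 3 words are abusive
```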
Supervised Micropost Classification with BERT. In many experiments, we employ BERT-LARGE (Devlin et al., 2019) as a state-of-the-art text classification baseline for detecting abusive microposts. We always fine-tune the pretrained model by adding another layer on top of it. (The supplementary notes contain more details regarding all classifiers employed in this paper.)

Methods for Lexicon Induction
Pointwise Mutual Information (PMI). A standard method for inducing a lexicon from labeled documents is to rank the words according to their PMI with the target class (Turney, 2002). We use tweets in which one of the above emojis occurs as abusive documents. In order to obtain negative instances, i.e. tweets which convey no abusive language, we simply sample random tweets from Twitter. The rationale is that abusive language is known to be rare, even on Twitter: Founta et al. (2018) estimate that the proportion of abusive tweets is less than 5%. In order to avoid spurious word correlations, we compute PMI only for words in our vocabulary of negative polar expressions (§3) which occur at least 3 times in our tweets. This threshold value was proposed by Manning and Schütze (1999).

Projection-based Induction. In our second method, we learn a projection of embeddings. The tweets are labeled in the same way as for PMI. We use the pretrained embeddings from GloVe (Pennington et al., 2014) induced from Twitter (the version with 200 dimensions, a very frequently used configuration for word embeddings). Projection-based induction has the advantage over PMI that it ranks not only the words observed in the labeled tweets but all words represented by embeddings. Since the GloVe embeddings are induced on a very large set of tweets, about 10,000 times larger than the set of tweets we will later use for projection-based induction, i.e. 100k tweets per class (Table 4), the projection is likely to cover a larger vocabulary than PMI, including additional abusive words. Let M = [w_1, ..., w_n] denote a labeled tweet of n words, where each column w ∈ {0,1}^v of M represents a word in one-hot form over the vocabulary size v. Our aim is to learn a one-dimensional projection S · E, where E ∈ R^(e×v) represents our unsupervised embeddings of dimensionality e and S ∈ R^(1×e) is the learnt projection matrix. We compute a projected tweet h = S · E · M, which is an n-dimensional vector in which each component represents a word from the tweet.
The value of each component represents the predictability of the corresponding word towards being abusive. We then apply a bag-of-words assumption and use the projected tweet to predict the binary class label y: p(y|M) ∝ exp(h · 1), where 1 ∈ {1}^n. This model is a feed-forward network trained using Stochastic Gradient Descent (Rumelhart et al., 1986). On the basis of the projected embeddings, we rank the negative polar expressions from our vocabulary (§3).

Recall-based Expansion by Label Propagation (LP). While the very high ranks of an induction method typically coincide with the target class (in our case: abusive words), the lower a rank, the more likely we are to encounter other words. Taking the high ranks as abusive seeds and then applying some form of label propagation on a word-similarity graph may increase the overall coverage of abusive words found. More specifically, we apply the Adsorption label propagation algorithm from junto (Talukdar et al., 2008) on a word-similarity graph in which the words of our vocabulary are nodes and edges encode the cosine similarities of their embeddings. As negative (i.e. non-abusive) seeds, we take the most frequently occurring words from our vocabulary, since they are unlikely to be abusive. In order to produce a meaningful comparison to PMI and projection-based induction, we need to convert the categorical output of label propagation into a ranking of our entire vocabulary. We achieve this by ranking the words predicted to be abusive by their confidence score; at the bottom, we append the words predicted to be non-abusive, ranked by their inverted confidence score.
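A minimal sketch of the PMI-based ranking described above, on toy data; the tweets, vocabulary and minimum count here are illustrative stand-ins (in our experiments, the vocabulary contains the negative polar expressions and the count threshold is 3).

```python
import math
from collections import Counter

# Tweets containing the abusive emoji serve as "abusive" documents;
# random tweets serve as "non-abusive" documents (toy data).
abusive_docs = [
    "you idiot shut up",
    "what an idiot loser",
    "total loser go away",
]
random_docs = [
    "nice weather today",
    "my cat is grumpy today",
    "grumpy but happy monday",
]

vocabulary = {"idiot", "loser", "grumpy"}  # candidate negative polar words

def pmi_ranking(abusive_docs, random_docs, vocabulary, min_count=1):
    """Rank vocabulary words by PMI(word, abusive-class)."""
    docs = [(d, True) for d in abusive_docs] + [(d, False) for d in random_docs]
    n_docs = len(docs)
    p_abusive = len(abusive_docs) / n_docs
    word_count = Counter()   # documents containing the word
    joint_count = Counter()  # abusive documents containing the word
    for text, is_abusive in docs:
        for w in set(text.split()):
            if w in vocabulary:
                word_count[w] += 1
                if is_abusive:
                    joint_count[w] += 1
    scores = {}
    for w, c in word_count.items():
        if c < min_count or joint_count[w] == 0:
            continue
        # PMI = log( P(w, abusive) / (P(w) * P(abusive)) )
        scores[w] = math.log((joint_count[w] / n_docs) / ((c / n_docs) * p_abusive))
    return sorted(scores, key=scores.get, reverse=True)

print(pmi_ranking(abusive_docs, random_docs, vocabulary))
```

On this toy corpus, idiot and loser only occur in abusive documents and therefore receive high PMI, while grumpy only occurs in random tweets and is not ranked as abusive.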

Experiments on English
Evaluation of Induction. The first question we want to answer is which emoji is most predictive. For each of our pre-selected emojis (Table 1), we sampled 10k tweets in which it occurs and ranked the words of our vocabulary according to PMI. As non-abusive tweets, we used 10k randomly sampled tweets. As a baseline, we rank words randomly (random). As a gold standard against which we evaluate our rankings, we use all words of the lexicon from Wiegand et al. (2018a) that are predicted as abusive. Table 2 shows the results of the evaluation against this gold standard: the middle finger is the strongest emoji. This does not come as a surprise, as the middle finger is universally regarded as a deeply offensive gesture. We use this emoji as a proxy for abusive language in all subsequent experiments where possible.
In Table 3, we examine for PMI and our projection-based approach whether the ranking quality can be improved when more tweets are used. We increased the number of tweets containing the emoji and the number of negative tweets to 100k each. (Using the free Twitter streaming API, larger amounts cannot be crawled in a reasonable time span, e.g. one month.) While projection reaches maximum performance at 10k tweets, PMI depends on more data, since it can only rank words it has actually observed in the data. Projection clearly outperforms PMI. Since we do not want to overfit to the exact value of 10k, and want to show that our approach also works with any amount of tweets beyond 10k, we use 100k tweets (i.e. the largest amount of tweets available to us) in subsequent experiments.

Table 4 compares further methods. Our gold standard has a wide notion of abusive language, including words such as crap or shit, which may be merely profane rather than truly abusive. Such words also occur in the random tweets that serve as negative data. (Recall that profanity is much more common on Twitter.) These words are thus not learned as abusive. We therefore replaced our negative data with a random sample of sentences from the English Web as Corpus (ukwac). While we thus preserve the language register with this corpus, i.e. informal language, profane language should become exclusive to our proxy of abusive tweets. Table 4 confirms that using ukwac as negative data (projection-ukwac) improves performance.
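The projection-based method can be sketched as follows. The embeddings and tweets are tiny toy stand-ins for the GloVe vectors and emoji-labeled tweets used in our experiments, and the training loop is a plain logistic-loss SGD sketch of the feed-forward model from §4.

```python
import numpy as np

# Learn a one-dimensional projection s of word embeddings such that the
# summed projected values of a tweet's words predict its (emoji-derived)
# binary label.

rng = np.random.default_rng(0)
vocab = ["idiot", "loser", "weather", "cat"]
E = rng.normal(size=(len(vocab), 4))           # toy embeddings (e = 4)
word_id = {w: i for i, w in enumerate(vocab)}

# Tweets labeled abusive (1) or non-abusive (0) by the emoji heuristic.
tweets = [(["idiot", "loser"], 1), (["weather", "cat"], 0),
          (["idiot", "cat"], 1), (["weather"], 0)]

s = np.zeros(4)                                # the learnt projection
lr = 0.5
for _ in range(500):                           # SGD on the logistic loss
    for words, y in tweets:
        h = sum(s @ E[word_id[w]] for w in words)        # summed projection
        p = 1.0 / (1.0 + np.exp(-np.clip(h, -30, 30)))   # sigmoid link
        grad = (p - y) * sum(E[word_id[w]] for w in words)
        s -= lr * grad

# Rank the whole vocabulary by projected value; high ranks = abusive.
ranking = sorted(vocab, key=lambda w: s @ E[word_id[w]], reverse=True)
print(ranking)
```

Note that, unlike PMI, the learnt projection assigns a score to every word that has an embedding, not only to words observed in the labeled tweets.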
To increase the recall of abusive words, we apply LP (§4.1) to the output of projection-ukwac. Since label propagation is sensitive to the underlying class distribution, and abusive words typically represent the minority class, we use twice as many non-abusive seeds as abusive seeds. (We refrain from tuning this ratio, since we want to avoid overfitting.) We vary the amount of abusive seeds between 100, 200 and 500.
To ensure comparability to the remaining configurations, the seeds are prepended to the output of LP (which explains why LP only has an impact on lower ranks). Table 4 clearly shows that LP outperforms projection-ukwac on lower ranks.
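The conversion of categorical LP output into a full ranking (§4) can be sketched as follows; the labels and confidence scores are illustrative.

```python
# Words predicted abusive are ranked by confidence; words predicted
# non-abusive are appended at the bottom, ranked by inverted confidence.

def lp_to_ranking(predictions):
    """predictions: {word: (label, confidence)}, label in {'abusive', 'other'}."""
    abusive = [(w, c) for w, (l, c) in predictions.items() if l == "abusive"]
    other = [(w, c) for w, (l, c) in predictions.items() if l == "other"]
    ranking = [w for w, c in sorted(abusive, key=lambda x: -x[1])]
    ranking += [w for w, c in sorted(other, key=lambda x: x[1])]  # inverted
    return ranking

preds = {"idiot": ("abusive", 0.9), "scum": ("abusive", 0.7),
         "grumpy": ("other", 0.8), "slow": ("other", 0.4)}
print(lp_to_ranking(preds))  # ['idiot', 'scum', 'slow', 'grumpy']
```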
Cross-Domain Evaluation. Next, we test the best lexicon from our previous experiments (i.e. projection-ukwac + LP with 200 abusive seed words) in cross-domain micropost classification, where posts are categorized into abusive and non-abusive posts. Through cross-domain classification, in which we train on one dataset and test on another, we show that the chosen configuration is not overfit to a particular dataset. Table 5 provides some information on the datasets we consider. In addition to the datasets used in Wiegand et al. (2018a), we include the recent SemEval dataset from Zampieri et al. (2019). Table 6 shows the results of cross-domain micropost classification. As baselines we use a majority-class classifier, the feature-based approach from Nobata et al. (2016), BERT and the lexicon from Wiegand et al. (2018a). In order to demonstrate the intrinsic predictiveness of the words learned by our emoji-based approach, we do not train a classifier on the source domain (unlike Wiegand et al. (2018a), who use the rank of the lexicon entries as a feature) but simply classify a micropost as abusive if an abusive word from our emoji-based lexicon is found. As abusive words, we consider all 1,250 words of our best approach (Table 4) predicted as abusive. Since the training data are not used, our emoji-based approach always produces the same result on each test set. Table 6 shows that our lexicon performs on a par with the induction method from Wiegand et al. (2018a); on some domains (e.g. Warner), it is even better. We ascribe these slight performance increases to the fact that our lexicon is only half the size of the lexicon from Wiegand et al. (2018a), which still contains many ambiguous words (e.g. blind or irritant) that are not included in our emoji-based lexicon. Note that our aim was not to outperform that method.
The underlying lexicon was bootstrapped using manual annotation, and the induction depends on external resources, such as WordNet or sentiment intensity resources. Our emoji-based approach is a much cheaper solution that can also be applied to languages in which these resources are lacking.
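The resulting lexicon-based classifier is deliberately simple; a sketch, with an illustrative toy lexicon in place of the induced one:

```python
import re

# A micropost is labeled abusive iff it contains at least one word
# from the induced lexicon of abusive words.

abusive_lexicon = {"idiot", "moron", "scum"}  # toy stand-in

def classify(post, lexicon):
    """Return True (abusive) iff any lexicon word occurs in the post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return any(t in lexicon for t in tokens)

print(classify("@USER you absolute idiot", abusive_lexicon))   # True
print(classify("what a lovely day", abusive_lexicon))          # False
```

Because no source-domain training is involved, this classifier behaves identically on every test set, which is precisely what makes it suitable for measuring the intrinsic predictiveness of the lexicon entries.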

Crosslingual Experiments
In order to show that our approach is also useful for languages other than English, we now apply it to Portuguese and German data.
Necessary Modifications. Given that there are far fewer Portuguese and German tweets than English ones (Table 1), it is more difficult to obtain a similar amount of tweets containing the middle-finger emoji for these languages. Although our previous experiments (Table 3) suggest that a smaller amount of data is sufficient for projection (i.e. 10k tweets), it would still take more than two months to obtain that many German tweets containing the middle finger (Table 1). In order to obtain 10k Portuguese and German tweets more quickly, we included tweets with other predictive emojis. We extracted tweets containing one of the 4 most predictive emojis: face vomiting, pile of poo, angry face or middle finger. These 4 emojis are drawn from our English data (Table 1) in order to further demonstrate crosslingual validity. The distribution of emojis reflects their natural distribution on Twitter. As non-abusive data, we sampled sentences from the Portuguese and German versions of the Web as Corpus (Baroni et al., 2009; Filho et al., 2018), from which we also induced word embeddings with word2vec (Mikolov et al., 2013). We decided against pretrained Twitter embeddings, since such resources are not available for many languages; we opted for a setting applicable to most languages.
Evaluation. We evaluate our emoji-based lexicons on the Portuguese dataset from Fortuna et al. (2019) and the two German datasets from GermEval (Wiegand et al., 2018b; Struß et al., 2019). These are datasets for the classification of abusive microposts. As in our evaluation on English data (Table 6), we refrain from an in-domain evaluation, since again we want to avoid topic/author biases (§1). Instead, lexicon-based classifiers and a crosslingual approach are used as baselines. The lexicon-based classifiers predict a micropost as abusive if at least one abusive word according to the lexicon is found. In addition to the two variants of hurtlex (Bassignana et al., 2018), hl-conservative and hl-inclusive, we use a lexicon built following the method proposed by Wiegand et al. (2018a) on German (Wiegand2018-replic). This method cannot be replicated for Portuguese, since essential resources for that approach are missing (e.g. sentiment intensity resources, sentiment-view lexicons, a manually annotated base lexicon). Moreover, we consider Wiegand2018-translated, which is the English lexicon from Wiegand et al. (2018a) translated into the target language via Google Translate. Unlike Wiegand2018-replic, this lexicon is cheap to construct, as it only requires the original English lexicon.
Our crosslingual baseline exploits the abundance of labeled English training data for abusive language detection and uses neural methods to close the language gap between English and the target language. We use multilingual BERT, in which English, Portuguese and German share the same representation space. As proposed by Pires et al. (2019), we train a text classifier on an English dataset for abusive language detection and test the resulting multilingual model on the Portuguese or German microposts; since the three languages share the same representation space, the model learnt on English should also be usable on the other languages. Our crosslingual approach is trained on the dataset from Zampieri et al. (2019), which, like our non-English datasets, originates from Twitter. Table 7 shows the results. We also added an upper bound for our emoji-based approach (emoji+manual), in which we additionally include abusive words manually extracted from the abusive microposts missed by the emoji-based approach. Table 7 suggests that our emoji-based approach is only slightly outperformed by its upper bound and by the replicated lexicon from Wiegand et al. (2018a), which depends on expensive resources that do not exist in many languages. It is also interesting that the translated lexicon from Wiegand et al. (2018a) is notably worse than the replicated lexicon. We found that a substantial amount of abusive words cannot be translated into the target language for lack of a counterpart. For example, spic refers to a member of the Spanish-speaking minority in the USA; this minority does not exist in most other cultures. For such entries, Google Translate produces the original English word as the translation. In our translated German lexicon, 33% of the entries were such cases. Similarly, we expect some abusive words in German to lack an English counterpart.
Therefore, induction methods employing data from the target language, such as the replicated lexicon or our emoji-based approach, are preferable to translation.

Disambiguation of Abusive Words
Many potentially abusive words are not meant to be abusive, i.e. to deliberately hurt someone, in all situations in which they are used. For instance, the word fuck is abusive in (9) but not in (10).
(9) @USER Remorse will get you nowhere, sick fuck.
(10) It's so hot and humid what the fuck I'm dying

While operators of social media sites are increasingly facing pressure to react to abusive content on their platforms, they are not necessarily targeting profane language as in (10). In fact, users may see advances of operators against their profane posts as unnecessary and as an infringement of their freedom of speech. Therefore, automated methods to filter the textual content of social media sites should ideally distinguish between abusive and profane usage of potentially abusive words.

Disambiguation with the Help of Emojis
While much previous work (e.g. Davidson et al. (2017)) may frame this task as simply another text classification task in abusive language detection, we consider this as a word-sense disambiguation task. As a consequence, we argue that for robust classification, it is insufficient to have as labeled training data just arbitrary utterances classified as abuse and mere profanity. Instead, as we will also demonstrate, training data have to comprise mentions of those potentially abusive expressions that also occur in the test data. Such an undertaking is very expensive if the training data are to be manually annotated. We propose a more inexpensive alternative in which emojis are employed. We consider tweets containing potentially abusive words as abusive training data if they co-occur with the middle-finger emoji (11)-(14).
(11) @USER Mind ur own business bitch
(12) @USER I have self pride unlike u bastard bitch
(13) @USER Coming from the fake as fuck president lol
(14) @USER @USER How about you fuck off Hector!

Given the scarcity of abusive language even on Twitter (Founta et al., 2018), we consider plain tweets that contain the target word as negative (non-abusive) training data (15). The supervised classifier we design is a feature-based classifier (SVM). Holgate et al. (2018) report that on the fine-grained classification of (potentially) abusive words, such an approach outperforms deep learning methods. We employ an even more lightweight feature set to show that simple features may already help in this task. Table 8 displays our feature set.
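A sketch of how such lightweight word-specific features might be extracted; the small word lists below are tiny illustrative stand-ins for the lexicons referenced in Table 8.

```python
# Extract disambiguation features for one mention of a target word.

ABUSIVE = {"idiot", "bastard"}            # stand-in for an abusive lexicon
NEGATIVE = {"fake", "stupid", "hate"}     # stand-in for negative polar words
POSITIVE = {"great", "love", "nice"}      # stand-in for positive polar words

def features(tokens, target_index, raw_tweet=""):
    prev_w = tokens[target_index - 1] if target_index > 0 else "<s>"
    next_w = tokens[target_index + 1] if target_index + 1 < len(tokens) else "</s>"
    context = [t for i, t in enumerate(tokens) if i != target_index]
    return {
        "prev=" + prev_w: 1,                   # immediate left neighbour
        "next=" + next_w: 1,                   # immediate right neighbour
        "has_abusive": int(any(t in ABUSIVE for t in context)),
        "has_negative": int(any(t in NEGATIVE for t in context)),
        "has_positive": int(any(t in POSITIVE for t in context)),
        "has_2nd_person": int(any(t in {"you", "u", "ur"} for t in tokens)),
        "has_1st_person": int(any(t in {"i", "me", "my"} for t in tokens)),
        "has_quote": int('"' in raw_tweet),    # reported speech?
        "has_exclaim": int("!" in raw_tweet),  # emotional intensity
    }

toks = "mind ur own business bitch".split()
f = features(toks, toks.index("bitch"), raw_tweet="Mind ur own business bitch")
print(sorted(k for k, v in f.items() if v))
```

Only the immediate neighbours of the target word are used as lexical context, which keeps the classifier from overfitting to particular domains.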

Evaluation of Disambiguation
For evaluation, we created a gold standard containing mentions of the two frequent but ambiguous abusive words fuck and bitch (Table 9). We chose these two words because they are the only abusive words that are both sufficiently ambiguous and frequent in the dataset from Holgate et al. (2018). That dataset was the only existing dataset with word-specific annotation available to us at the time we carried out our experiments, so we could use it as one baseline. (Meanwhile, two further datasets by Pamungkas et al. (2020) and Kurrek et al. (2020) have been made publicly available, which might also be suitable for the kind of evaluation we present in our work.) For each of the two words, we extracted 1,000 tweets in which it occurs and had them annotated via crowdsourcing (Prolific Academic, www.prolific.co). Each tweet was annotated as abusive or profane based on the majority vote of 5 annotators (native speakers of English). (The supplementary notes contain the annotation guidelines.)

Baselines for Disambiguation
Text Classification. We train a supervised text classifier (BERT) on each of the following two large datasets (containing several thousand microposts) manually annotated at the micropost level. The dataset from Davidson et al. (2017) distinguishes between 3 classes: hate speech, offensive language and other. The first category matches our definition of abusive language, whereas the second resembles our category of profane language. We train our classifier on these two categories. The Kaggle dataset has a more fine-grained class inventory; its class insult can be best mapped to our definition of abusive language. Since profane language can be found in all of the remaining classes, we use the microposts of all other classes as training data for our second class.

Word-specific Classification. We consider the fine-grained class inventory of the manually annotated dataset introduced by Holgate et al. (2018). Unlike the previous baseline, which consists of micropost-level annotation, this dataset contains word-specific annotation, i.e. potentially abusive words annotated in context. This allows us to reduce the training data to exclusively contain contextual mentions of either of our target words (i.e. bitch and fuck). We use the class express aggression as a proxy for our class of abuse, while all other occurrences are treated as merely profane usages. Given that we have word-specific training data, we train an SVM classifier on the disambiguation features from Table 8, as we do with our proposed classifier (§5.1).
Heuristic Baseline. In this baseline, training data for abusive usage is approximated by tweets containing the target word and a username. The rationale is that abuse is always directed against a person, and such persons are typically represented by a username on Twitter. As profane training data, we consider tweets containing the target word but lacking any username. Given that we have word-specific training data, we again train an SVM classifier on the disambiguation features (Table 8), which comprise:
• words immediately preceding and following the target word: may help to learn phrases such as fuck off; larger context is avoided, since we are likely to overfit to particular domains.
• presence of abusive words in context: the target word is likely to be abusive if it co-occurs with other (unambiguously) abusive words; abusive words are identified with the help of the lexicon from Wiegand et al. (2018a).
• presence of positive/negative polar expressions in context: positive polar expressions rarely co-occur with abusive language, while negative polar expressions do; the polar expressions are obtained from the Subjectivity Lexicon (Wilson et al., 2005).
• which pronouns are in context: 2nd person pronouns are typical of abusive usage (you are a bitch); 1st person pronouns are likely to indicate non-abusive usage (I am a bitch).
• quotation signs in tweet: quotation signs indicate reported speech; a tweet may report an abusive remark, yet the reported remark itself may not be perceived as abusive (Chiril et al., 2020).
• presence of exclamation sign: a typical means of expressing high emotional intensity.


Results of Disambiguation

Table 10 shows the result of our evaluation. For our emoji-based method (and the heuristic baseline), we trained on 2,000 samples containing mentions of the respective target word; further data did not improve performance. Our proposed approach outperforms all other classifiers, with the exception of the more expensive word-specific classifier on the disambiguation of fuck. These results show that emojis can be effectively used for disambiguation. Since we considered the classifier trained with word-specific annotation an upper bound, we were surprised that our emoji-based classifier outperformed that approach on the disambiguation of bitch. In its training data, we found abusive instances that, according to our guidelines (see supplementary notes), would not have been labeled as abusive (19)-(20). These deviations in the annotation may be the cause of the lower performance.
(19) Wow now im a bitch and its apparently ALWAYS like this. Im ready to be over tonight. (20) I am many things -but a boring bitch is not one.
The baseline text classification is less effective than word-specific classification. Our inspection of the underlying datasets revealed that their annotation is less accurate: apparently, annotators were not made aware that certain words are ambiguous. As a consequence, they seem to have used specific words as a signal for or against abuse. For instance, in the Davidson dataset, almost all occurrences of bitch (> 97%) are labeled as abuse and almost all occurrences of fuck (> 92%) as no abuse.

Conclusion
We presented a distant-supervision approach for abusive language detection. Our main idea was to exploit emojis that strongly correlate with abusive content. The most predictive emoji is the middle-finger emoji. We employed mentions of such emojis as a proxy for abusive utterances and thus generated a lexicon of abusive words that offers the same performance on cross-domain classification of abusive microposts as the best previously reported lexicon. Unlike that lexicon, our new approach requires neither labeled training data nor any expensive resources. We also demonstrated that emojis can similarly be used in other languages, where they outperform a crosslingual classifier and a translated lexicon. Finally, we showed that emojis can also be used to disambiguate mentions of potentially abusive words.