Cross-Platform and Cross-Domain Abusive Language Detection with Supervised Contrastive Learning

The prevalence of abusive language on different online platforms has been a major concern that raises the need for automated cross-platform abusive language detection. However, prior work focuses on concatenating data from multiple platforms, inherently adopting the Empirical Risk Minimization (ERM) method. In this work, we address this challenge from the perspective of a domain generalization objective. We design SCL-Fish, a supervised contrastive learning integrated meta-learning algorithm to detect abusive language on unseen platforms. Our experimental analysis shows that SCL-Fish achieves better performance than ERM and the existing state-of-the-art models. We also show that SCL-Fish is data-efficient and achieves comparable performance with large-scale pre-trained models upon finetuning for the abusive language detection task.


Introduction
Abusive language is defined as any form of microaggression, condescension, harassment, hate speech, trolling, and the like (Jurgens et al., 2019). The use of abusive language online has been a significant problem over the years. Although a plethora of work has explored automated detection of abusive language, it remains a challenging task due to its evolving nature (Davidson et al., 2017; Müller and Schwarz, 2017; Williams et al., 2019). In addition, a standing challenge in tackling abusive language is linguistic variation in how the problem manifests itself across different platforms (Karan and Šnajder, 2018; Swamy et al., 2019; Salminen et al., 2020).
We provide examples illustrating the variation of abusive language on different platforms in Figure 1. For example, user comments on broadcasting media such as Fox News do not directly contain any strong words but can implicitly carry abusive messages. Meanwhile, people on social media such as Twitter employ an abundance of strong words that can amount to outright personal bullying and the spread of hate speech. On an extremist public forum such as Gab, users mostly spread abusive language in the form of identity attacks. For these reasons, it is unrealistic to train an abusive language detector on data from one platform and expect the model to exhibit equally satisfactory performance on another.
Prior work on cross-platform abusive language detection (Karan and Šnajder, 2018; Mishra et al., 2018; Corazza et al., 2019; Salminen et al., 2020) usually concatenates examples from multiple sources, thus inherently applying Empirical Risk Minimization (ERM) (Vapnik, 1991). These models capture platform-specific spurious features and lack generalization (Shi et al., 2021). Fortuna et al. (2018), on the other hand, incorporate out-of-platform data into the training set and employ domain-adaptive techniques. Other works, such as Swamy et al. (2019) and Gallacher (2021), develop one model per platform and ensemble them to improve overall performance.
None of the prior works, however, attempts to generalize task-oriented features across platforms to improve performance on an unseen platform. In this work, we introduce a novel method for learning domain-invariant features to fill this gap. Our approach initially adopts Fish (Shi et al., 2021), a first-order meta-learning algorithm (Andrychowicz et al., 2016; Finn et al., 2017) that attempts to capture domain invariance. We then add a supervised contrastive learning (SCL) objective (Khosla et al., 2020) to impose an additional constraint on capturing task-oriented features, helping the model learn semantically effective embeddings by pulling samples from the same class close together while pushing samples from opposite classes further apart. We refer to our new method as SCL-Fish and conduct extensive experiments on a wide array of platforms representing social networks, public forums, broadcasting media, conversational chatbots, and synthetically generated data to show the efficacy of our method over other abusive language detection models (and especially over ERM, which prior work on cross-platform abusive language detection applied).
To summarize, we offer the following contributions in this work: 1. We propose SCL-Fish, a novel supervised contrastive learning augmented domain generalization method for cross-platform abusive language detection.
2. Our method outperforms prior work on cross-platform abusive language detection, thus demonstrating superiority to ERM (the core idea behind these previous models). Additionally, we show that SCL-Fish outperforms platform-specific state-of-the-art abusive/hate speech detection models.
3. Our analysis reveals that SCL-Fish is data-efficient and exhibits comparable performance with state-of-the-art models upon finetuning on the abusive language detection task.
Related Works

What is Abusive Language?
The boundary between hate speech, offensive language, and abusive language can be unclear. Davidson et al. (2017) define hate speech as "language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group", whereas Zampieri et al. (2019a) define offensive language as "any form of non-acceptable language (profanity) or a targeted offense, which can be veiled or direct".
In this paper, we adopt the definition of abusive language provided by Jurgens et al. (2019) and consider both offensive language and hate speech as abusive language in general, since distinguishing between offensive language and hate speech is often deemed subjective (Sap et al., 2019; Koh et al., 2021).

Domain Generalization
In the domain generalization task, training and test sets are sampled from different distributions (Quiñonero-Candela et al., 2008). In recent years, domain-shifted datasets have been introduced by synthetically corrupting samples (Hendrycks and Dietterich, 2019; Xiao et al., 2020; Santurkar et al., 2020). To improve a learner's capability for distributional generalization, Vapnik (1991) proposes the Empirical Risk Minimization (ERM) approach, which is widely used as the standard for domain generalization tasks (Koh et al., 2021). ERM concatenates data from all domains and focuses on minimizing the average loss on the training set. However, Pezeshki et al. (2021) state that a learner can overestimate its performance by capturing only one or a few dominant features with the ERM approach. Several other algorithms have been proposed to generalize models to unseen domains. Sagawa et al. (2019) develop a distributionally robust algorithm in which domain-wise losses are weighted inversely proportional to domain performance. Krueger et al. (2021) propose to minimize the variance of the loss across domains during training, and Arjovsky et al. (2020) penalize models whose performance varies among samples from the same domain.
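As a toy illustration of the contrast between ERM's averaged objective and a distributionally robust one, consider the following sketch; all loss values and platform names are invented for illustration:

```python
# Toy per-example losses from three platforms (illustrative values only).
losses = {
    "platform_a": [0.2, 0.3, 0.1],
    "platform_b": [0.9, 1.1, 1.0],   # a poorly served platform
    "platform_c": [0.4, 0.5, 0.3],
}

def erm_loss(domain_losses):
    """ERM: pool all examples from all platforms and average the loss."""
    pooled = [l for ls in domain_losses.values() for l in ls]
    return sum(pooled) / len(pooled)

def worst_domain_loss(domain_losses):
    """Distributionally robust flavor (cf. Sagawa et al., 2019):
    the objective tracks the worst-performing domain, not the average."""
    return max(sum(ls) / len(ls) for ls in domain_losses.values())
```

Here `erm_loss` reports a moderate value (about 0.53) that hides how badly platform_b is served, while `worst_domain_loss` exposes it (1.0), which is why average-loss training can mask a domain-specific failure.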

Contrastive Learning
Contrastive learning aims to learn effective embeddings by pulling semantically close neighbors together while pushing apart non-neighbors (Hadsell et al., 2006). This method uses a cross-entropy-based similarity objective to learn the embedding representation in the hyperspace (Chen et al., 2017; Henderson et al., 2017). In computer vision, Chen et al. (2020) propose a framework for contrastive learning of visual representations without specialized architectures or a memory bank. Khosla et al. (2020) show that a supervised contrastive loss can outperform the cross-entropy loss on ImageNet (Russakovsky et al., 2015). In NLP, similar methods have been explored in the context of sentence representation learning (Karpukhin et al., 2020; Gillick et al., 2019; Logeswaran and Lee, 2018). Among the most notable works, Gao et al. (2021) propose SimCSE, an unsupervised contrastive learning framework that predicts the input sentence itself, augmented with dropout as noise.
Abusive Language Detection
Although the inclusion of neural network architectures improves performance (Mitrović et al., 2019; Kshirsagar et al., 2018; Sigurbergsson and Derczynski, 2020), models still misclassify a large number of samples as false positives and false negatives when abusive language is intentionally manipulated (Gitari et al., 2015). Recently, Transformer-based (Vaswani et al., 2017) architectures like BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019b) have been introduced for the abusive language detection task (Liu et al., 2019a; Swamy et al., 2019). However, most prior work on abusive language detection focuses on a single platform due to the inaccessibility of multiple platforms (Vidgen and Derczynski, 2020) and thus does not scale well to other platforms (Schmidt and Wiegand, 2017). As a result, the models are not suitable for application to other platforms due to a lack of generalization (Karan and Šnajder, 2018; Gröndahl et al., 2018). In this work, we aim to address this challenge by introducing an augmented domain generalization method that captures task-oriented, domain-generalized features across multiple platforms.

Challenge & Proposed Solution
As shown in Figure 1, the nature of offensive language can vary from one platform to another. Therefore, it is important to design a model that can capture platform-generalized representations. This inspires us to adopt a domain generalization algorithm that can maximize feature generalization while avoiding dependence on domain-specific, spurious features. To learn platform-invariant features, we adopt Fish (Shi et al., 2021), a first-order approximation of Inter-domain Gradient Matching (IDGM) in the style of Model-Agnostic Meta-Learning (MAML) (Andrychowicz et al., 2016; Finn et al., 2017), which aims to reduce the sample complexity of new, unseen domains and increase domain-generalized feature selection across those domains. However, as Figure 2 shows, unlike in the typical domain generalization task, representations of abusive language across platforms are more overlapping and scattered. Thus, the model should also learn some platform-specific and overlapping features that help capture task-oriented representations. Therefore, we need to impose a constraint on the learning objective so that, in one direction, the model learns platform-invariant features for better generalization and, in the other direction, it learns only those task-oriented overlapping features that pass positive signals to the platform-generalized features for the abusive language detection task.
To learn task-oriented features, we introduce SCL-Fish, a method that integrates supervised contrastive learning (SCL) (Khosla et al., 2020) with Fish. The rationale behind integrating SCL is that we seek to find commonalities between the examples of each class (abusive/normal) irrespective of platform and to contrast them with examples from the other class.

SCL-Fish
Assume we have a training dataset for abusive language detection consisting of samples from two platforms P_1 and P_2, where P_k = {(X_k, Y_k)}. Given a model θ and a loss function ℓ, the empirical risk minimization (ERM) (Vapnik, 1991) objective is to minimize the average loss across the given platforms:

L_ERM(θ) = (1/2) Σ_{k∈{1,2}} E_{(x,y)∼P_k} [ ℓ((x, y); θ) ]

The expected gradients for these two platforms are

G_k = E_{(x,y)∼P_k} [ ∂ℓ((x, y); θ) / ∂θ ], k ∈ {1, 2}

If the directions of G_1 and G_2 agree (G_1 · G_2 > 0), the model is improving on both platforms. Therefore, the IDGM algorithm attempts to align the directions of the gradients G_1 and G_2 by maximizing their inner product. Hence, given the total number of training platforms S, the final IDGM objective is obtained by subtracting the gradient inner product (GIP) from the ERM loss:

L_IDGM(θ) = L_ERM(θ) − γ · GIP, where GIP = (2 / (S(S − 1))) Σ_{i ≠ j} G_i · G_j

Here, γ is a scaling term, and the GIP can be computed in linear time via the identity Σ_{i ≠ j} G_i · G_j = ‖Σ_k G_k‖² − Σ_k ‖G_k‖². However, the gradient of the GIP term, G_g = ∂GIP/∂θ, is computationally expensive to derive, as it requires differentiating a dot product of two gradients. Adopting ideas from Nichol et al. (2018), Shi et al. (2021) work around this issue by proposing a first-order version of IDGM, namely Fish. Shi et al. (2021) show that, given the ERM gradient Ḡ and a clone θ̃ of the original model updated by inner-loop SGD, the first-order direction G_f = (θ − θ̃)/α approximately contains both Ḡ and the GIP gradient. In other words, if we ignore the ERM term, we can substitute the second-order derivative G_g with the computationally cheaper G_f.
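The linear-time computation of the GIP rests on a simple algebraic identity, which the following sketch verifies numerically (the gradient values are random toy data, not real model gradients):

```python
import numpy as np

rng = np.random.default_rng(0)
S, d = 4, 10                      # 4 platforms, 10-dim gradient (toy sizes)
G = rng.normal(size=(S, d))       # G[i] = expected gradient on platform i

# Naive GIP sum: pairwise inner products over all i != j (quadratic in S).
naive = sum(G[i] @ G[j] for i in range(S) for j in range(S) if i != j)

# Linear-time identity:
# sum_{i != j} G_i . G_j = ||sum_i G_i||^2 - sum_i ||G_i||^2
fast = np.sum(G.sum(axis=0) ** 2) - np.sum(G ** 2)

assert np.isclose(naive, fast)
```

The identity lets the GIP be computed from a single pass over the per-platform gradients, which matters when S (and the parameter dimension) is large.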
Although this method exhibits impressive performance on the domain generalization task, as mentioned in Section 3.1, it may capture only platform-invariant features without much focus on task-relevant features. To overcome this issue, we augment Fish with a supervised contrastive learning (SCL) objective, which teaches the model to select features such that the representations of an abusive sample and a non-abusive sample lie far from each other in the hyperspace:

L_SCL = Σ_{i=1}^{N} −(1/|P(i)|) Σ_{p∈P(i)} log [ exp(f(x_i) · f(x_p)/τ) / Σ_{a≠i} exp(f(x_i) · f(x_a)/τ) ]

Here, f(·) is an encoder, N is the number of samples summed over all platforms, P(i) is the set of samples sharing the label of sample i, and τ is a temperature hyperparameter. Therefore, the model is encouraged to learn only those task-oriented features that are invariant across platforms and that distinguish abusive from non-abusive examples.
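As a minimal, illustrative sketch of the SCL objective (not our actual implementation), the toy embeddings and labels below are invented to show that class-clustered representations incur a lower loss than mixed ones:

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """Supervised contrastive loss (Khosla et al., 2020) over a batch.
    z: (N, d) embeddings f(x); labels: (N,) class ids."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize
    sim = z @ z.T / tau                               # pairwise similarities
    N = len(labels)
    loss = 0.0
    for i in range(N):
        mask = np.arange(N) != i                      # all candidates a != i
        denom = np.sum(np.exp(sim[i][mask]))
        positives = [p for p in range(N) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        loss += -np.mean([np.log(np.exp(sim[i, p]) / denom) for p in positives])
    return loss / N

# Embeddings that cluster by class should incur a lower loss than mixed ones.
labels = np.array([0, 0, 1, 1])
clustered = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
assert (supervised_contrastive_loss(clustered, labels)
        < supervised_contrastive_loss(mixed, labels))
```

This is exactly the pressure SCL exerts on the encoder: pull same-class samples together, push opposite-class samples apart.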
[Algorithm 1: SCL-Fish — full listing omitted; an inner loop over platforms P_1, …, P_S with minibatch updates, followed by an SCL update over minibatches p_scl ∼ P_scl.]

We present SCL-Fish in Algorithm 1. For each training platform, Fish performs inner-loop update steps (l3–l8) with learning rate α on a clone θ̃ of the original model θ, one minibatch at a time. Subsequently, the original model θ is updated by a weighted difference between the cloned model and the original model, θ̃ − θ. After performing the platform-generalized update, the samples trained on in this iteration (l12) are queued and sampled in minibatches to update θ with the supervised contrastive loss (l13–l18). (Table 1 note: the Twitter dataset is collected from Waseem and Hovy (2016), Davidson et al. (2017), and Jha and Mamidi (2017).)
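The Fish meta-update at the heart of Algorithm 1 can be sketched on a toy linear-regression problem as follows; the model, data, and hyperparameter values are illustrative stand-ins, and the supervised contrastive phase is omitted:

```python
import numpy as np

def grad(theta, X, y):
    """Gradient of mean squared error for a linear model (a toy stand-in
    for the abuse classifier's loss gradient)."""
    return 2 * X.T @ (X @ theta - y) / len(y)

def fish_step(theta, platforms, alpha=0.05, eps=0.5):
    """One Fish meta-step (first-order sketch after Shi et al., 2021):
    inner-loop SGD on a clone over each platform, then move the original
    weights toward the clone by a factor eps."""
    theta_clone = theta.copy()
    for X, y in platforms:                      # inner loop over platforms
        theta_clone -= alpha * grad(theta_clone, X, y)
    return theta + eps * (theta_clone - theta)  # outer (meta) update

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0])
platforms = []
for _ in range(3):                              # three toy "platforms"
    X = rng.normal(size=(20, 2))
    platforms.append((X, X @ true_w + 0.01 * rng.normal(size=20)))

theta = np.zeros(2)
for _ in range(200):
    theta = fish_step(theta, platforms)
assert np.allclose(theta, true_w, atol=0.1)
```

Because the shared signal (here `true_w`) is consistent across platforms, the per-platform gradients align and the sequential inner loop implicitly rewards directions on which all platforms agree.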

Experiments

Datasets
To evaluate the efficacy of SCL-Fish, we compile datasets from a wide range of platforms. We collect the sources of the datasets primarily from Risch et al. (2021) and Vidgen and Derczynski (2020). We provide meta-information on the datasets in Table 1. A description of each dataset is presented in Appendix F.

Methods Comparison
We compare the performance of SCL-Fish with Fish, also using ERM as a sensible baseline. We also conduct experiments on an SCL version of ERM (SCL-ERM). Additionally, we compare SCL-Fish with two benchmark models for abusive/hate speech detection, HateXplain (Mathew et al., 2021) and HateBERT (Caselli et al., 2021). HateXplain is finetuned on hate speech detection datasets collected from Twitter and Gab (https://gab.com) for a three-class classification task (hate, offensive, or normal). It incorporates human-annotated explainability with BERT to gain better performance by reducing unintended bias towards target communities. In our experiments, we treat both the hate and offensive classes as one category (abusive). HateBERT pre-trains BERT with a Masked Language Modeling (MLM) objective on more than one million offensive and hateful messages from banned Reddit communities. The result is a shifted BERT model that has learned language variety and hate polarity (e.g., hate, abuse). Finetuning on different abusive language detection tasks has shown that HateBERT achieves the best or comparable performance.

Experimental Setup
We train the models (ERM, SCL-ERM, Fish, and SCL-Fish) on the fb-yt, twitter, and wiki datasets (in-platform datasets) and use stormfront as the validation set. We use the same hyperparameters for all models for fair comparison. We present the list of hyperparameters in Appendix A. The remaining datasets from Table 1 are used for cross-platform evaluation. As evident from Table 1, the datasets are highly imbalanced. Hence, we report the F1-score for the abusive class (denoted positive-F1) and the macro-averaged F1-score. For completeness, we also report accuracy. We train and evaluate our models on an Nvidia A100 40GB GPU.
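For clarity, the metrics reported here can be sketched as follows; the toy labels are invented to show why positive-F1 is more informative than accuracy on imbalanced data:

```python
def f1(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def abusive_metrics(y_true, y_pred):
    """positive-F1 (abusive class = 1) and macro-F1 over both classes."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos_f1 = f1(tp, fp, fn)
    neg_f1 = f1(tn, fn, fp)   # for class 0, the roles of fp and fn swap
    return pos_f1, (pos_f1 + neg_f1) / 2

# Highly imbalanced toy labels: accuracy is 0.8, yet positive-F1 is only 0.5.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
pos, macro = abusive_metrics(y_true, y_pred)
```

With two of three abusive examples missed, accuracy still looks acceptable, which is why we report positive-F1 and macro-F1 rather than accuracy alone.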

Results on Cross-Platform Datasets
We show the results of our models on cross-platform performance in Table 2. We observe that SCL-Fish outperforms the other methods in macro-F1 and positive-F1 scores while maintaining comparable performance with the best method on the remaining datasets (reddit, hatecheck). In overall average performance, SCL-Fish achieves the best macro-F1 and positive-F1 scores. More specifically, on user comments from broadcasting media (Fox News), SCL-Fish achieves a gain of 3.2% positive-F1 and 0.5% macro-F1 over the other methods. On public forums (Youtube and Reddit), SCL-Fish achieves a total gain of 2.0% in positive-F1, but SCL-ERM outperforms SCL-Fish by 1.3% in macro-F1. On AI bot conversations (CarbonBot and ELIZA), SCL-Fish achieves a gain of 1.4% positive-F1 and 1.0% macro-F1 over the other methods. On the synthetically generated platform (HateCheck), ERM outperforms SCL-Fish by 1.2% in positive-F1 and Fish outperforms SCL-Fish by 0.1% in macro-F1. On Gab, all methods (ERM- and Fish-based, including SCL-Fish) achieve high positive-F1 scores because of the highly imbalanced dataset. Hence, for a fair comparison among all methods, we report performance on sampled balanced datasets in Appendix B. We also discuss performance on the in-platform datasets in Appendix C.
Most notably, HateBERT achieves the highest macro-F1 score on reddit, which is expected since HateBERT is pre-trained on Reddit and thus has an advantage over the other methods, which are trained on data from other platforms. However, all the models, including HateXplain and HateBERT, are trained on datasets from the Twitter platform. Hence, we analyze the performance of the models on the twi-fb dataset. Our rationale is that although twi-fb involves data from Twitter and Facebook, these data do not necessarily share the distribution of the data used to train the models. The distribution of datasets from the same platform can still differ due to variations in timestamps, topics, locations, and demographic attributes (e.g., age, race, gender, ethnicity). Although it is not possible to extract all this information from textual content alone, we provide a quantitative comparison between in-domain and out-of-domain datasets for Twitter in Appendix D. We refer readers to Koh et al. (2021) for more detailed analysis. We find that the performance of the models deteriorates significantly (under 56% macro-F1) even on datasets from overlapping platforms but with different distributions. This demonstrates the effect of distribution shift in the data, even when we train on data from the same platform. We further discuss possible rationales for this performance gap across platforms in Appendix E.

Analysis
In this section, we conduct qualitative and quantitative analyses of the experimental results.

Diversity over Quantity
It is worth noting that HateBERT has been pre-trained on 1,478,348 Reddit messages, almost five times more data than SCL-Fish is trained on. However, as Table 2 shows, HateBERT's performance on cross-platform datasets suffers significant drops, which is not the case for SCL-Fish. Even on the yt-reddit dataset, collected from Youtube and Reddit (the latter being the platform whose data HateBERT is trained on), HateBERT fails to outperform the baseline ERM method. This shows that, for the purpose of creating platform/domain-invariant models, it is more important to employ training data with different distributions than simply to use huge amounts of training data from the same platform, which may have limited distributional coverage.

Finetuning SCL-Fish
Since SCL-Fish exhibits better performance than the other methods on most of the cross-platform datasets, we further investigate whether the platform-generalization capability of SCL-Fish helps it improve performance on a specific platform (Twitter) upon finetuning. For this purpose, we use two benchmark datasets: OLID (Zampieri et al., 2019a) from SemEval-2019 Task 6 (Zampieri et al., 2019b) and AbusEval (Caselli et al., 2020). Note that the OLID dataset is part of the training data for our methods (Appendix F); here, we finetune on the same dataset for this experiment.
We present results for this set of experiments in Table 3. Although SCL-Fish obtains a lower score than NULI, it outperforms BERT and HateBERT in both positive-F1 and macro-F1. This is important because HateBERT uses five times more data from one specific platform (Reddit). This demonstrates that our proposed SCL-Fish is useful not only in the platform-generalized zero-shot setting but also for finetuning, and it underscores the importance of data diversity (which translates into varied distributions) over data size.
For the AbusEval dataset, SCL-Fish performs better than BERT and the prior work (Caselli et al., 2020), but it cannot outperform HateBERT. (Note that Caselli et al. (2021) report a positive-F1 of 59.9% for NULI, which is lower than the positive-F1 of SCL-Fish; however, the positive-F1 we compute from Liu et al. (2019a) differs from the one reported in Caselli et al. (2021), so we use our computed positive-F1 for NULI.) We suspect the reason lies in the different annotation processes followed during the earlier training phase of SCL-Fish and HateBERT. Although OLID and AbusEval contain identical tweets in the training and test sets, the annotation scheme of AbusEval differs from that of OLID. While Zampieri et al. (2019a) annotate OLID using the definition of offensive language as "Posts containing any form of non-acceptable language (profanity) or a targeted offense, which can be veiled or direct", Caselli et al. (2020) annotate AbusEval using the definition of abusive language as "hurtful language that a speaker uses to insult or offend another individual or a group of individuals based on their personal qualities, appearance, social status, opinions, statements, or actions". More concretely, AbusEval excludes any kind of untargeted message from the hate speech category. During the training phase of SCL-Fish, we consider any targeted or untargeted strong language as offensive. Therefore, finetuning on AbusEval causes misalignment with the earlier training phase of SCL-Fish and may result in performance deterioration.

Explainability with Attention Visualization

We investigate how platform generalization helps the model attend to the right context on out-of-platform datasets. For this purpose, we analyze the attention vectors of SCL-Fish, HateXplain, and HateBERT in an attempt to better understand their performance. We use BertViz (Vig, 2019) to compute and visualize the final-layer attention vectors from [CLS] to the other tokens. We select three out-of-platform datasets (fox, stormfront, and hateCheck) and randomly sample one abusive example from each where SCL-Fish correctly identifies the example as abusive but HateXplain and HateBERT misclassify it. Figure 3 shows the attention visualization for each of the examples. As we can see, in the example from Fox News user comments, although the text does not explicitly contain any strong or offensive words, it is seemingly offensive towards 'Muslims' and 'Merkel'. Hence, our models should attend to these two words with the highest priority, which SCL-Fish does. On the other hand, although HateXplain gives higher attention to 'Merkel', it fails to attend to the word 'Muslims'. Surprisingly, HateBERT does not assign priority to any context for the misclassified examples. On the example from StormFront, both SCL-Fish and HateXplain correctly assign priority to the words 'foreigners' and 'pegan', unlike HateBERT. However, HateXplain also confuses other words, e.g., 'The', as highly prioritized tokens. Finally, the example from the synthetically generated dataset hateCheck is challenging because of the kind of linguistic complexity (e.g., negations, hedging terms) language models typically struggle to address (Hossain et al., 2020; Ettinger, 2020; Kassner and Schütze, 2020). We observe that SCL-Fish highly prioritizes 'women' and also attends to the token 'not'. On the other hand, HateXplain mistakenly gives the highest attention to 'We must' and ignores the negation term 'not'.
Overall, our analysis shows that a model trained in the platform-generalized setting improves at identifying the targeted community and the right context in out-of-domain offensive text. By contrast, platform-specific models may not be able to attend to the targeted community on a different platform, because these models are trained on targets specific to particular platforms.

SCL Improves Fish
From Table 2 and Table 7, it is evident that integrating SCL with Fish empirically improves performance across platforms. We now substantiate this empirical result with visual evidence for Fish and SCL-Fish on different platforms. For every platform, we pass an equal number of abusive and non-abusive samples to the models and plot the [CLS] embeddings in Figure 4. We observe that SCL-Fish forms more compact clusters of abusive (majority orange samples) and non-abusive (majority blue samples) examples than Fish. Supervised contrastive learning attempts to learn task-oriented features that bring representations of the same class closer together while pushing representations of different classes further apart; as a result, distinct clusters form for each class in Figure 4. Therefore, incorporating SCL helps Fish reduce the confusion between abusive and non-abusive representations and improves the model's overall performance.
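The cluster-compactness claim can be quantified with a simple ratio; the embeddings below are invented toy points, not actual [CLS] vectors:

```python
import math

def compactness_ratio(embeddings, labels):
    """Mean intra-class distance divided by mean inter-class distance;
    lower values mean tighter, better-separated class clusters."""
    intra, inter = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = math.dist(embeddings[i], embeddings[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))

labels = [0, 0, 1, 1]
tight = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]   # SCL-Fish-like
loose = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.0), (5.5, 5.0)]   # Fish-like
assert compactness_ratio(tight, labels) < compactness_ratio(loose, labels)
```

A ratio like this gives a scalar counterpart to the visual comparison in Figure 4: the tighter the per-class clusters relative to the between-class separation, the lower the value.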

Limitations
Although SCL-Fish achieves an improvement over Fish, training SCL-Fish takes longer than training Fish. Empirically, we find that SCL-Fish is approximately 1.2x slower than Fish. Moreover, we believe that the subjective nature of abusive language (Sap et al., 2019) affects the annotation processes of different datasets and possibly impacts performance negatively. We conduct an error analysis in Appendix G.

Conclusion
In this work, we addressed the problem of cross-platform abusive language detection from the domain generalization perspective. We proposed SCL-Fish, a supervised contrastive learning augmented meta-learning method that learns generalized, task-driven features across platforms. We showed that SCL-Fish achieves better performance than other state-of-the-art models and models adopting ERM for cross-platform abusive language detection. Our analysis also reveals that SCL-Fish achieves comparable performance upon finetuning with much smaller data for cross-platform training than other data-intensive methods. Our work demonstrates progress on both platform and domain generalization in the context of abusive language detection, and we hope future research extends it to other areas of language understanding.

Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, and Ritesh Kumar. 2019b. SemEval-2019 Task 6: Identifying and categorizing offensive language in social media (OffensEval). In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 75–86, Minneapolis, Minnesota, USA. Association for Computational Linguistics.

A Hyperparameter Configuration
The detailed hyperparameter configuration for the training phase of the cross-platform experiments is shown in Table 4. We run each experiment three times and report the average performance of the models.

B Performance on Cross-Platform Balanced Datasets
We sample an equal number of examples from the abusive and normal classes for each dataset. The results are shown in Table 6.

C In-Platform Performance
Table 7 shows the performance of the methods on the in-platform datasets. Unsurprisingly, ERM-based methods outperform Fish-based methods on all datasets and in all metrics. The ERM method learns platform-specific features, while Fish-based methods tend to learn platform-invariant features. Therefore, evaluation on the in-platform datasets yields better performance for ERM-based methods. Notably, as the percentage of abusive speech decreases from the top row to the bottom row in Table 7, positive-F1 scores also drop accordingly. However, Fish-based methods suffer the least performance deterioration (a 10.1% drop from fb-yt to wiki for SCL-Fish and a 7.2% drop for Fish) compared with the other methods (a 12.3% drop from fb-yt to wiki for ERM and a 12.7% drop for SCL-ERM). This shows that domain generalization helps the methods learn more robust platform-invariant features, which in turn results in more accurate detection of abusive speech on cross-platform datasets.

D Quantitative Comparison for Twitter In-Domain and Out-Domain Datasets

We compare the twitter (in-domain) and twi-fb (out-of-domain) datasets based on linguistic features and sentiment analysis. For each dataset, we compute average sentiment scores and the average numbers of words and characters for both the abusive and non-abusive classes.
Table 8 reflects the differences in sentiment scores and linguistic features between the datasets. We see that the numbers of words and characters are higher for the out-of-domain (twi-fb) dataset than for the in-domain (twitter) dataset in both the abusive and non-abusive classes. Additionally, examples from the out-of-domain dataset carry more negative sentiment on average than examples from the in-domain dataset. These types of variation can shift the distributions of the datasets; as a result, the models may struggle to perform well on an out-of-domain dataset (Table 2).

E Rationale for Performance Gap across Platforms
To this end, we study the reasons for the performance gap of the models across different platforms through a qualitative analysis of linguistic variation. We sample abusive texts from the platforms and plot the word frequencies in Figure 5. We observe that the type of abusive text varies along with the linguistic features across platforms. For example, on social networks like Twitter, the most frequent words in abusive texts are 'f*cking', 'gun', and 'a*s', which mostly imply violence and personal attacks. Meanwhile, an extremist forum like Stormfront contains words like 'black', 'white', and 'jews', which indicate abusive comments towards a particular community or ethnicity. Linguistic features from a public forum like Reddit reveal that abusive comments on this platform are mostly targeted attacks and slang. Abusive conversations with AI bots mostly contain strong words in the form of personal attacks. On the other hand, user comments on broadcasting media like Fox News do not contain any strong words but rather implicit abuse focused on a particular race ('black'), person ('Obama'), or sexual orientation ('gay'). Finally, abusive texts on Wikipedia include both targeted and untargeted slang words towards a specific entity.
The variation of abuse across different platforms shows that training models on a specific platform is not enough to mitigate abusive language on another platform. This also underscores the importance of a platform-generalized study of abusive language detection.

F Datasets Description
In this section, we briefly describe the datasets we compile for our cross-platform experiments.
F.1 wiki

The wiki dataset represents the Wikipedia platform. We collect this dataset from Wulczyn et al. (2017). The corpus contains 63M comments from discussions.

F.2 twitter
We collect the twitter dataset from a variety of sources. Waseem and Hovy (2016) annotate around 16k tweets that contain sexist/racist language. Initially, the authors bootstrap the corpus based on common slurs and then manually annotate the whole corpus to identify tweets that are offensive but do not contain any slur. Similarly, Davidson et al. (2017) crawl tweets with a lexicon containing words and phrases identified by internet users as hate speech. Crowdsourcing is then performed to distinguish hate, offensive, and normal tweets, resulting in around 25k annotated tweets. Jha and Mamidi (2017) crawl Twitter with terms that generally exhibit positive sentiment but are sexist in nature (e.g., 'as good as a man', 'like a man', 'for a girl'). The authors also annotate tweets that are aggressively sexist. The final corpus contains around 10k implicitly/explicitly sexist and normal tweets. ElSherief et al. (2018) adopt a multi-step data collection process that includes collecting tweets based on lexicons, hashtags, and other existing works (Waseem and Hovy, 2016; Davidson et al., 2017). Crowdsourcing is then applied to annotate targeted and untargeted hate speech. Founta et al. (2018) build an annotated corpus of 80k tweets with seven classes (offensive, abusive, hateful speech, aggressive, cyberbullying, spam, and normal).

F.5 fox

The fox dataset represents user comments on the broadcasting platform Fox News. We collect this dataset from Gao and Huang (2017). The authors find that hateful comments are more implicit and creative, and that detecting such comments requires context dependency.

F.6 twi-fb
The twi-fb dataset contains user posts from Twitter and Facebook. We collect this dataset from Mandl et al. (2019). The authors initially collect the corpus by crawling keywords and hashtags. Later, they annotate the corpus into targeted/untargeted hate speech, offensive, and profane categories.

F.10 gab
We collect the gab dataset from Qian et al. (2019). Unlike other datasets, Qian et al. (2019) provide the full conversation, which helps models understand the context. We collect 15,926 examples from the original corpus, of which 15,270 are hate speech.
F.11 yt-reddit
The yt-reddit dataset is collected from Mollas et al. (2020). The authors develop the dataset, named ETHOS, by sampling YouTube and Reddit comments. The authors emphasize reducing any kind of bias (e.g., gender) in the annotation process and annotate the dataset into various forms of targeted hate speech (e.g., origin, race, disability). We sample an equal number of hate and normal speech examples from this dataset.
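The equal-sampling step above can be sketched as follows. This is only an illustrative downsampling of the majority class; the `examples` structure, label values, and helper name are hypothetical, not the authors' actual preprocessing code:

```python
import random

def balance_binary(examples, label_key="label", seed=42):
    """Downsample the majority class so both labels appear equally often."""
    rng = random.Random(seed)
    hate = [e for e in examples if e[label_key] == "hate"]
    normal = [e for e in examples if e[label_key] == "normal"]
    n = min(len(hate), len(normal))  # size of the smaller class
    balanced = rng.sample(hate, n) + rng.sample(normal, n)
    rng.shuffle(balanced)
    return balanced

# Toy corpus: 10 hate and 20 normal comments.
data = [{"text": f"comment {i}", "label": "hate" if i % 3 == 0 else "normal"}
        for i in range(30)]
balanced = balance_binary(data)  # 10 hate + 10 normal
```

Fixing the random seed keeps the sampled balanced subset reproducible across runs.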

G Error Analysis
We conduct an error analysis on the examples that SCL-Fish misclassifies. We randomly sample 50 misclassified examples and divide them into three categories:
False-abusive: Examples that are normal but SCL-Fish categorizes them as abusive.
Offensive: Examples that are degrading or harassing to an individual, or contain untargeted abuse or trolling, but SCL-Fish categorizes them as normal.
Hate: Examples that contain targeted attacks towards a particular group or identity but SCL-Fish categorizes them as normal.
We provide examples for each category in Table 9. Figure 6 shows that SCL-Fish misclassifies 32% of the normal examples as false-abusive. Most of the examples in this category contain some sort of slang that the model confuses as abusive. On the other hand, SCL-Fish misclassifies 28% of the offensive examples as normal. This is because the examples may contain some positive words (e.g., 'please') or no profanity at all; therefore, the model considers them normal speech. Lastly, around 40% of the hate speech is misclassified as normal by SCL-Fish. As with the offensive category, the model is confused by sarcastic positive words and the lack of expected profanity. This analysis shows that detecting implicit abusive language that does not contain direct profanity is still challenging and a direction to be explored in the future.
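The category breakdown above is a simple tally over the manually annotated errors. A minimal sketch follows; the `sample` list is a hypothetical stand-in whose 16/14/20 split matches the reported 32%/28%/40%:

```python
from collections import Counter

def error_percentages(categories):
    """Return the percentage share of each error category."""
    counts = Counter(categories)
    total = len(categories)
    return {cat: 100.0 * n / total for cat, n in counts.items()}

# Hypothetical annotations of the 50 sampled misclassified examples.
sample = ["false-abusive"] * 16 + ["offensive"] * 14 + ["hate"] * 20
shares = error_percentages(sample)
```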

Figure 1 :
Figure 1: Examples of abusive language on different platforms.

Figure 2 :
Figure 2: tSNE representations of platforms. We plot the embedding of the [CLS] token from pre-trained BERT.
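A projection like the one in Figure 2 can be sketched as below. Extracting real [CLS] embeddings would require running a pre-trained BERT over each comment (e.g., via HuggingFace transformers), so random vectors stand in for them here; the platform names and sizes are illustrative only:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for per-comment [CLS] embeddings: one 768-dim vector per comment,
# grouped by platform (real vectors would come from pre-trained BERT).
rng = np.random.default_rng(0)
platforms = {"wiki": 50, "twitter": 50, "gab": 50}
embeddings = np.vstack([rng.normal(loc=i, size=(n, 768))
                        for i, n in enumerate(platforms.values())])

# Project the 768-dim embeddings to 2-D for plotting.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points = tsne.fit_transform(embeddings)  # shape: (150, 2)
```

The 2-D `points` can then be scatter-plotted with one color per platform to inspect how separable the platforms are in embedding space.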

Figure 3 :
Figure 3: Attention visualization for different platforms. Deeper color indicates higher attention.

Figure 4 :
Figure 4: tSNE plot for Fish vs. SCL-Fish on Fox News Comment, Reddit, and StormFront. Abusive samples are presented in orange and non-abusive samples are presented in blue.

Figure 5 :
Figure 5: Top-20 normalized word frequency of abusive language for different platforms (ignoring stopwords and non-alphabetic characters).
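The frequency computation behind Figure 5 can be sketched as follows; the stopword list and example comments are hypothetical, and in practice a full stopword list (e.g., NLTK's) would be used:

```python
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a full one.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in"}

def top_words(comments, k=20):
    """Top-k normalized word frequencies, ignoring stopwords and
    non-alphabetic tokens."""
    tokens = [w for text in comments
              for w in text.lower().split()
              if w.isalpha() and w not in STOPWORDS]
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(w, n / total) for w, n in counts.most_common(k)]

comments = ["you are an idiot", "idiot take 2 seconds to think", "go away idiot"]
freqs = top_words(comments, k=3)  # most frequent word is "idiot"
```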
Mathur et al. (2018) annotate a corpus of around 3k tweets containing hate, abusive, and normal tweets. Basile et al. (2019) crawl 13k tweets containing abusive language against women and immigrants. The authors apply crowdsourcing to annotate whether the tweets contain individual/group hate speech or aggressiveness. Mandl et al. (2019) develop a corpus of 7k English examples with the categories of hate, offensive, and profanity. Ousidhoum et al. (2019) build a corpus of multilingual and multi-aspect hate speech. The English corpus (5,647 tweets) covers a wide range of hate speech categories including the level of directness, hostility, targeted theme, and targeted group. Zampieri et al. (2019a) develop an offensive corpus of 14,100 tweets based on hierarchical modeling, such as whether a tweet is offensive/targeted, and whether it is targeted towards a group or an individual. Our final twitter dataset contains 132,815 examples, of which 77,656 are abusive.

F.3 fb-yt
The fb-yt dataset represents both the Facebook and YouTube platforms. We collect this dataset from Salminen et al. (2018). Salminen et al. (2018) crawl the comments from Facebook and YouTube videos and annotate them into hateful and non-hateful categories. The authors also subcategorize hateful comments into 21 classes including accusation, promoting violence, and humiliation.

F.4 stormfront
The stormfront dataset is collected from de Gibert et al. (2018). The authors crawl around 10k examples from Stormfront and categorize them into hate/normal speech. The authors further investigate whether joining subsequent, seemingly normal sentences results in hate speech. Our final dataset contains 1,364 hateful speech examples from Stormfront.

F.7 reddit
The reddit dataset contains conversations from Reddit. Qian et al. (2019) compile a list of toxic subreddits and crawl user conversations from those subreddits. Additionally, the authors provide hate speech interventions, where the goal is to automatically generate responses that intervene in online conversations containing hate speech. The final dataset contains 2,511 examples of hate/abusive speech.

F.8 convAI
Cercas Curry et al. (2021) collect the convAI dataset from user conversations with an AI assistant, CarbonBot, hosted on Facebook Messenger and a

Figure 6 :
Figure 6: Percentage of error categories among 50 randomly sampled misclassified examples.

Table 2 :
Performance on cross-platform datasets. Bold font represents the best performance for a particular metric. Gray cells indicate performance on datasets from identical or overlapping platforms but different sources and distributions. *Although SCL-Fish exhibits comparable accuracy with other competitive models on this imbalanced dataset, it achieves better accuracy on the balanced dataset (Appendix B).

Table 3 :
Performance of models after finetuning. Bold and underline represent the best and second-best performance for a particular metric, respectively.

Table 5
presents the configuration of hyperparameters during the finetuning (Section 6.2).

Table 6 :
Performance on the balanced cross-platform datasets. Bold font represents the best performance for a particular metric. Gray cells indicate performance on datasets from identical or overlapping platforms but different sources and distributions.

Table 7 :
Performance on in-platform datasets. Bold font represents the best performance for a particular metric.

Table 9 :
Category | Example | Prediction
False-abusive | Once at work, I accidentally checked out some dragons f*cking cars. Didn't know that would be an issue. | Abusive
Offensive | The girl on the left looks like a Jewish pizza. | Normal
Hate | The Blacks and Browns are destroying Sweden and nobody does anything about it! | Normal
False-abusive | I've been out for years and it hasn't been easy, but I also love being a f*g. | Abusive
Offensive | Why do you not think so? Feck off. Please go on. You annoy me. | Normal
Hate | All we have is Blacks and a few Mestizos and all are pretty well behaved, we keep them that way around here. | Normal

Sample of misclassified examples from the False-abusive, Offensive, and Hate categories.