CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network

The tremendous growth of social media users interacting in online conversations has led to a significant rise in hate speech, affecting people across demographics. Most prior work focuses on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting implicit hate speech, which denotes hatred through indirect or coded language. In this paper, we present CoSyn, a context-synergized neural network that explicitly incorporates user and conversational context for detecting implicit hate speech in online conversations. CoSyn introduces novel ways to encode these external contexts and employs a novel context interaction mechanism that clearly captures the interplay between them, making independent assessments of the amount of information to be retrieved from these noisy contexts. Additionally, it carries out all these operations in the hyperbolic space to account for the scale-free dynamics of social media. We demonstrate the effectiveness of CoSyn on 6 hate speech datasets and show that CoSyn outperforms all our baselines in detecting implicit hate speech, with absolute improvements in the range of 1.24% - 57.8%.


Introduction
Hate speech is defined as the act of making utterances that can potentially offend, insult, or threaten a person or a community based on their caste, religion, sexual orientation, or gender (Schmidt and Wiegand, 2019). For social media companies like Twitter, Facebook, and Reddit, appropriately dealing with hate speech has been a crucial challenge (Tiku and Newton, 2015). Hate speech can take the form of overt abuse, also known as explicit hate speech (Schmidt and Wiegand, 2017), or can be uttered in coded or indirect language, also known as implicit hate speech (Jurgens et al., 2019). Fig. 1 illustrates a Twitter conversation where users convey hate implicitly through sarcasm and conversational context.

Figure 1: Illustration of a social media conversation tree with implicit hate speech. User 1 posts a factual statement about practices people follow in a certain location. In response, User 2 implies hate through a sarcastic statement, to which User 3 elaborates with a positive stance. Clearly, these utterances are difficult even for humans to classify as hate or not without the proper conversational context.

Societal Problem and Impact. Though a considerable amount of research has been done on detecting explicit hate speech (Schmidt and Wiegand, 2019), detecting implicit hate speech is a greatly understudied problem in the literature, despite the social importance and relevance of the task. In the past, extremist groups have used coded language to assemble groups for acts of aggression (Gubler and Kalmoe, 2015) and domestic terrorism (Piazza, 2020) while maintaining deniability for their actions (Dénigot and Burnett, 2020). Because it lacks clear lexical signals, implicit hate speech evades keyword-based detection systems (Wiegand et al., 2019), and even the most advanced neural architectures may not be effective in detecting such utterances (Caselli et al., 2020).
Current Challenges. Current state-of-the-art hate speech detection systems fail to effectively detect implicit and subtle hate (Ocampo et al., 2023). Detecting implicit hate speech is difficult for multiple reasons: (1) Linguistic nuance and diversity: implicit hate can be conveyed through sarcasm, humor (Waseem and Hovy, 2016), euphemisms (Magu and Luo, 2018), circumlocution (Gao and Huang, 2017), and other symbolic or metaphorical language (Qian et al., 2018). (2) Varying context: implicit hate can be conveyed through everything from dehumanizing comparisons (Leader Maynard and Benesch, 2016) and stereotypes (Warner and Hirschberg, 2012) to threats, intimidation, and incitement to violence (Sanguinetti et al., 2018; Fortuna and Nunes, 2018). (3) Lack of sufficient linguistic signals: unlike parent posts, which contain sufficient linguistic cues through the background knowledge provided by the user, replies or comments to the parent post are mostly short, context-less reactions. These factors make implicit hate speech difficult to detect and emphasize the need for better learning systems.
Why prior work is insufficient. ElSherief et al. (2021) define implicit hate speech as "coded or indirect language that disparages a person or group." They also propose the first dataset, Latent Hatred, to benchmark model performance on implicit hate speech classification and show that existing state-of-the-art classifiers fail to perform well on the benchmark. Though Latent Hatred builds on an exhaustive 6-class taxonomy, it ignores implicit hate speech that is conversational-context-sensitive, even though such speech accounts for a majority of implicit hate speech online (Modha et al., 2022; Hebert et al., 2022). Lin (2022) builds on Latent Hatred and proposes one of the first systems to classify implicit hate speech by leveraging world knowledge through knowledge graphs (KGs). However, beyond the fact that their system is restricted to English due to the unavailability of such KGs in other languages, it also fails to capture any kind of external context, which is vital for effective hate speech detection (Sheth et al., 2022). Thus, we first extend the definition of implicit hate speech to include utterances that convey hate only in the context of the conversational dialogue (example in Fig. 1). Next, we propose a novel neural learning system to solve this problem.
Main Contributions. In this paper, we propose CoSyn, a novel neural network architecture for detecting implicit hate speech that effectively incorporates external contexts, i.e., the conversational and user contexts. Our primary aim is to classify whether a target utterance (text only) implies hate, including utterances that signal hate implicitly. CoSyn jointly models the user's personal context (historical and social) and the conversational dialogue context in conversation trees. CoSyn has four main components: (1) To encode text utterances, we train a transformer sentence encoder and encourage it to learn bias-invariant representations. This helps us handle keyword bias, a long-standing problem in hate speech classification (Garg et al., 2022). (2) We start by modeling the user's personal historical context using a novel Hyperbolic Fourier Attention Network (HFAN). HFAN models the diverse and scale-free engagement of a user on social media by leveraging the Discrete Fourier Transform (Cooley and Tukey, 1965) and hyperbolic attention over past user utterances. (3) We next model the user's personal social context using a Hyperbolic Graph Convolutional Network (HGCN) (Chami et al., 2019). HGCN models the scale-free dynamics of social networks using hyperbolic learning, leveraging the social connections between users, which act as edges in the graph. (4) Finally, to jointly model a user's personal context and the conversational dialogue context, we propose a novel Context Synergized Hyperbolic Tree-LSTM (CSHT). CSHT effectively models the scale-free nature of conversation trees and clearly captures the interaction between these context representations in the hyperbolic learning framework. We describe all our components in detail in Section 2. To summarize, our main contributions are as follows:

• We introduce CoSyn, the first neural network architecture specifically built to detect implicit hate speech in online conversations. CoSyn leverages the strengths of existing research and introduces novel modules to explicitly take into account the user and conversational contexts integral to detecting implicit hate speech.
• Through extensive experimentation, we show that CoSyn outperforms all our baselines quantitatively on 6 hate speech datasets, with absolute improvements of 1.24% - 57.8%.
• We also perform extensive ablative experiments and qualitative comparisons to prove the efficacy of CoSyn.

Methodology
Fig. 3 provides a pictorial representation of our proposed model architecture (the algorithm is shown in Algorithm 1). We provide an overview of the various operations in Hyperbolic Geometry in Section 2.2 and request our readers to refer to it for more details. In this section, we describe the three constituent components of our proposed CoSyn, which model different aspects of context, namely, the author's personal historical context, the author's personal social context, and both the personal historical and social contexts jointly with the conversational context.
Let us suppose we have N distinct conversation trees, where each tree is denoted by $T_n = \{t_0^{T_n}, \ldots\}$. The primary aim of CoSyn is to classify each utterance $t$ into its label $y^{gt} \in \{0, 1\}$, where 1 indicates that the utterance is hateful and 0 that it is not. Additionally, each hateful utterance ($y^{gt}_i = 1$) is labeled with $y^{ip} \in \{0, 1\}$, where 1 indicates that the hateful utterance is implicitly hateful and 0 that it is explicitly hateful. $y^{ip}$ is used only for analysis. In the following subsections, "user" refers to the author of the utterance in a conversation tree that is to be assessed.
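To make the problem setup concrete, here is a toy rendering of a conversation tree with the two label types. This is purely illustrative; the class names, fields, and example texts are our own and not part of the paper's implementation.

```python
# Toy data structures for the problem setup: each utterance carries an author,
# a parent link (tree structure), a hate label y_gt, and, for hateful
# utterances, an implicit/explicit label y_ip (used only for analysis).
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class Utterance:
    text: str
    author: str
    parent: Optional[int] = None   # index of parent utterance; None for root
    y_gt: Optional[int] = None     # 1 = hateful, 0 = non-hateful
    y_ip: Optional[int] = None     # for hateful ones: 1 = implicit, 0 = explicit

@dataclass
class ConversationTree:
    utterances: List[Utterance] = field(default_factory=list)

    def children(self, idx: int) -> List[int]:
        """Indices of direct replies to utterance idx."""
        return [i for i, u in enumerate(self.utterances) if u.parent == idx]

tree = ConversationTree([
    Utterance("factual parent post", "user1"),
    Utterance("sarcastic reply", "user2", parent=0, y_gt=1, y_ip=1),
])
print(tree.children(0))  # [1]
```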

Hyperbolic Geometry: Background
Hyperbolic Geometry. A Riemannian manifold is a D-dimensional real and smooth manifold equipped with an inner product on the tangent space at each point, where the tangent space $T_x\mathcal{M}$ is a D-dimensional vector space representing the first-order local approximation of the manifold around a given point $x$. A Poincaré ball manifold is a hyperbolic space, i.e., a Riemannian manifold of constant negative curvature, denoted by $(\mathbb{H}^D, g)$, where the manifold is $\mathbb{H}^D = \{x \in \mathbb{R}^D : \|x\| < 1\}$ and the Riemannian metric is given by $g_x = \lambda_x^2 g^E$, where $g^E = I_D$ denotes the identity matrix, i.e., the Euclidean metric tensor, and $\lambda_x = \frac{2}{1 - \|x\|^2}$ is the conformal factor. To perform operations in the hyperbolic space, the exponential map $\exp_x : T_x\mathbb{H}^D \to \mathbb{H}^D$ and the logarithmic map $\log_x : \mathbb{H}^D \to T_x\mathbb{H}^D$ are used to project Euclidean vectors to the hyperbolic space and vice versa, respectively:

$$\exp_x(v) = x \oplus \Big(\tanh\Big(\frac{\lambda_x \|v\|}{2}\Big)\,\frac{v}{\|v\|}\Big), \qquad \log_x(y) = \frac{2}{\lambda_x}\tanh^{-1}\big(\|{-x} \oplus y\|\big)\,\frac{-x \oplus y}{\|{-x} \oplus y\|},$$

where $x, y \in \mathbb{H}^D$, $v \in T_x\mathbb{H}^D$. Further, the following basic hyperbolic operations are used throughout:

Möbius Addition $\oplus$ adds a pair of points $x, y \in \mathbb{H}^D$ as:

$$x \oplus y = \frac{(1 + 2\langle x, y\rangle + \|y\|^2)\,x + (1 - \|x\|^2)\,y}{1 + 2\langle x, y\rangle + \|x\|^2\|y\|^2}$$

Möbius Multiplication $\otimes$ multiplies a vector $x \in \mathbb{H}^D$ by a matrix $W \in \mathbb{R}^{D' \times D}$ as:

$$W \otimes x = \exp_0\big(W \log_0(x)\big)$$

Möbius Element-wise Multiplication $\odot$ performs element-wise multiplication on $x, y \in \mathbb{H}^D$:

$$x \odot y = \exp_0\big(\log_0(x) \circ \log_0(y)\big),$$

where $\circ$ denotes the Euclidean Hadamard product. The Hyperbolic Distance between points $x, y \in \mathbb{H}^D$ is given by:

$$d_{\mathbb{H}}(x, y) = 2\tanh^{-1}\big(\|{-x} \oplus y\|\big)$$

Fourier Transform. The Fourier transform breaks down a signal into its individual frequency components. When applied to a sequence $x_n$ with $n \in [0, N-1]$, the Discrete Fourier Transform (DFT) generates a new representation $X_k$ for each value of $k$, where $0 \le k \le N-1$. The DFT accomplishes this by computing a sum of the original input tokens $x_n$ multiplied by twiddle factors $e^{-2\pi i k n / N}$, where $n$ is the index of each token in the sequence. Thus, the DFT is expressed as:

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}, \quad 0 \le k \le N-1 \quad (7)$$
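The Möbius addition and hyperbolic distance in this section can be checked numerically with a minimal sketch (plain numpy, curvature -1 on the unit ball; the function names are ours, not the paper's):

```python
# Basic Poincare-ball operations (curvature -1): Mobius addition and the
# geodesic distance d(x, y) = 2 * artanh(|| -x (+) y ||).
import numpy as np

def mobius_add(x, y):
    """Mobius addition x (+) y on the unit Poincare ball."""
    xy = float(np.dot(x, y))
    x2 = float(np.dot(x, x))
    y2 = float(np.dot(y, y))
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    den = 1 + 2 * xy + x2 * y2
    return num / den

def hyp_dist(x, y):
    """Hyperbolic distance between two points inside the unit ball."""
    return 2 * np.arctanh(np.linalg.norm(mobius_add(-x, y)))

x = np.array([0.3, 0.0])
y = np.array([0.0, 0.4])
print(np.linalg.norm(mobius_add(x, y)))  # stays inside the unit ball (< 1)
print(hyp_dist(x, y), hyp_dist(y, x))    # equal: the distance is symmetric
```

Although Möbius addition itself is not commutative, the induced distance is symmetric, which the sketch verifies numerically.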

Bias-invariant Encoder Training
CoSyn, like other works in the literature, builds on the assumption that linguistic signals from text utterances can effectively enable the detection of hate speech in online conversations. Thus, our first aim is to learn an encoder $e(\cdot)$ that generates a vector representation in $\mathbb{R}^d$ for a text utterance, where $d$ is the dimension of the vector. Specifically, we fine-tune SentenceBERT (Reimers and Gurevych, 2019) and optimize an additional loss proposed in (Mathew et al., 2021) that uses self-attention maps and ground-truth hate spans. Keyword bias is a long-standing problem in hate speech classification, and optimizing this extra objective encourages the encoder to learn bias-invariant sentence representations.
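As an illustration of this two-part objective, here is a minimal numpy sketch. All names, the toy inputs, and the exact form of the attention-supervision term are our assumptions; the actual model fine-tunes SentenceBERT with the loss of Mathew et al. (2021):

```python
# Sketch of a two-part training objective: classification cross-entropy plus
# an attention-supervision term that pushes self-attention mass toward
# annotated hate spans, discouraging shortcut reliance on single keywords.
import numpy as np

def bias_invariant_loss(logits, label, attn, span_mask, lam=0.1):
    # Classification term: standard cross-entropy over {non-hate, hate}.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    ce = -np.log(probs[label] + 1e-12)
    # Attention term: cross-entropy between normalized attention weights and
    # the normalized ground-truth hate-span indicator per token.
    target = span_mask / span_mask.sum()
    attn = attn / attn.sum()
    attn_ce = -(target * np.log(attn + 1e-12)).sum()
    return ce + lam * attn_ce

logits = np.array([0.2, 1.5])           # toy utterance logits
attn = np.array([0.1, 0.6, 0.2, 0.1])   # toy self-attention over 4 tokens
span = np.array([0.0, 1.0, 1.0, 0.0])   # annotated hate span (tokens 1-2)
print(bias_invariant_loss(logits, 1, attn, span, lam=0.1))
```

Attention distributions that agree with the annotated span incur a lower loss than those concentrated on tokens outside it, which is the intended regularization effect.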

Modeling Personal User Context
Modeling personal user context, in the form of author profiling, has shown great success for hate speech classification in the past because hateful users (users prone to making hateful utterances) share common stereotypes and form communities around them. They exhibit strong degrees of homophily and have high reciprocity values (Mathew et al., 2019). We hypothesize that this will prove especially useful in classifying conversational implicit hate speech, which on its own lacks clear lexical signals or any form of background knowledge. Thus, our primary aim is to learn a vector representation $U_u \in \mathbb{R}^d$ for the user $u$ who authored the utterance $t$ to be assessed, where $d$ is the dimension of the vector. For our work, we also explore the importance of the personal historical context of a user to enable better author profiling.
Studies show that a user's past social engagement and the linguistic style of their utterances on social media platforms play an important role in assessing the user's ideology and emotions (Xiao et al., 2020). Thus, we propose a novel methodology for author profiling that is more intuitive, explainable, and effective for our task. Our primary aim is to model the personal context of the user (the author of the utterance to be assessed) and generate user representations $U_u$, which can then be used for hate speech classification. To achieve this, we first encode a user's historical utterances into $U_u^{hist}$ using our Hyperbolic Fourier Attention Network (HFAN), followed by modeling the user's social context by passing $U_u^{hist}$ through a Hyperbolic Graph Convolutional Network (HGCN). To be precise, learning the personal user context takes the form $e(\text{Historical Utterances}) \rightarrow \text{HFAN} \rightarrow \text{HGCN} \rightarrow U_u$. We next describe HFAN and HGCN in detail.

Hyperbolic Fourier Attention Network
User engagement on social media is often diverse and possesses scale-free properties (Sigerson and Cheng, 2018). Thus, to account for the natural irregularities and effectively model a user's personal historical context, we propose a novel Hyperbolic Fourier Attention Network (HFAN). The first step is to encode the historical utterances $H_u$ made by the user $u$ using our encoder $e(\cdot)$. Next, we apply a 2D DFT to the embeddings (a 1D DFT along the temporal dimension and a 1D DFT along the embedding dimension) to capture the most commonly occurring frequencies (ideologies, opinions, emotions, etc.) in the historical utterances made by the user:

$$U_u^{fourier} = \mathcal{F}_{emb}\big(\mathcal{F}_{temp}\big(e(H_u)\big)\big),$$

where $\mathcal{F}$ is the DFT operation. The 2D DFT operation helps highlight the most prominent frequencies, which signify a holistic understanding of the user's sociological behavior. Next, we hypothesize that the frequencies most relevant to a conversation can act as the most important clues to assessing how a user would react to other utterances in the conversation tree (e.g., a user's stance on the contribution of a particular political party to a recent riot under discussion). Additionally, these frequencies may change over time. To account for the latter factor first, we pass the embeddings through a Hyperbolic GRU (Zhang and Gao, 2021), which first projects the embeddings from the Euclidean space to the hyperbolic space using a Poincaré ball manifold and then effectively captures the sequential temporal context over time-aligned historical user utterances. The Hyperbolic GRU is a modified version of the conventional GRU that performs Möbius operations (addition, multiplication, and bias) on the Poincaré model; we refer our readers to (Zhang and Gao, 2021) for more details. Next, to account for the former factor, we perform Hyperbolic Attention (Zhang and Gao, 2021), which first linearly projects the utterance embeddings within the Poincaré space and then constructs the final user embedding $U_u^{hist}$ via the Einstein midpoint:

$$U_u^{hist} = \sum_{s} \frac{\alpha_s\, \gamma(h_{sK})}{\sum_{s'} \alpha_{s'}\, \gamma(h_{s'K})}\, h_{sK},$$

where $\gamma(\cdot)$ is the Lorentz factor, the attention weights $\alpha_s$ are computed from the distances between the centroid and the utterance encodings, and $h_{sL}$ and $h_{sK}$ are the encoding of utterance $h_s$ obtained from $U_u^{fourier}$, projected from the Poincaré space into the Lorentz space and the Klein space, respectively, for computational convenience (Zhang and Gao, 2021). $c_{sL}$ is the sentence centroid, which is randomly initialized and learnable.
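The data flow of HFAN's spectral step can be sketched as follows. This is a deliberately simplified numpy illustration: we keep the real part of the 2D DFT (FNet-style mixing) and replace the hyperbolic GRU, hyperbolic attention, and Einstein midpoint with a plain softmax pool, so the hyperbolic machinery itself is not reproduced here.

```python
# Simplified HFAN sketch: 2D DFT over (time x embedding), then a toy
# attention pool standing in for hyperbolic attention / Einstein midpoint.
import numpy as np

def hfan_sketch(H):
    """H: (num_historical_utterances, dim) utterance embeddings."""
    # 2D DFT: 1D along the temporal axis and 1D along the embedding axis.
    F = np.fft.fft2(H).real
    # Toy softmax attention pool over the spectrally mixed utterances.
    scores = F @ F.mean(axis=0)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ F  # user representation U_hist of shape (dim,)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))  # 5 historical utterances, 8-dim embeddings
u = hfan_sketch(H)
print(u.shape)  # (8,)
```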

Hyperbolic Graph Convolutional Network
After obtaining the user representations $U_u^{hist}$ for every user $u$ in the dataset, we model the social relationships between users as a graph $G = (V, E)$. Each edge $e = \{u_i, u_j\} \in E$ represents one of four types of relations: (1) user $u_i$ retweets a tweet posted by $u_j$, (2) user $u_i$ mentions user $u_j$ in a tweet, (3) user $u_i$ replies to a post by user $u_j$, or (4) user $u_i$ follows user $u_j$. Each vertex $v \in V$ holds the user representation $U_u$. HGCN (Chami et al., 2019) modifies the conventional GCN and performs neighbor aggregation using graph convolutions in the hyperbolic space to enrich a user's historical context representations learned through HFAN with social context. Social network connections between users on a platform often possess hierarchical and scale-free structural properties: the degree distribution of nodes follows the power law, as seen in Fig. 2, decaying polynomially with a few hub nodes having a large number of connections (Barabási and Bonabeau, 2003). To model such complex hierarchical representations, HGCN performs all operations in the Poincaré space. We first project our representations $U$ onto the hyperbolic space using a Poincaré ball manifold ($\exp^K(\cdot)$) with sectional curvature $-1/K$. Formally, the feature aggregation at the $i$-th HGCN layer is given by:

$$O^i = \sigma^{\otimes K_{i-1}, K_i}\Big(\mathrm{FM}\big(\tilde{A},\; W^i \otimes O^{i-1}\big)\Big),$$

where $-1/K_{i-1}$ and $-1/K_i$ are the hyperbolic curvatures at layers $i-1$ and $i$, respectively, $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the degree-normalized adjacency matrix, $W$ is a trainable network parameter, $O^i$ is the output of the $i$-th layer, and $\mathrm{FM}$ is the Fréchet mean operation. $\sigma^{\otimes K_{i-1}, K_i}$ is the hyperbolic nonlinear activation allowing a varying curvature at each layer.
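A simplified, tangent-space version of one such layer can be sketched as follows. Assumptions of the sketch: the curvature is fixed at -1, exp/log maps are taken at the origin of the Poincaré ball, and plain degree-normalized aggregation in the tangent space stands in for the curvature-aware Fréchet-mean aggregation of Chami et al. (2019).

```python
# One HGCN-style layer, approximated in the tangent space at the origin:
# map to the tangent space, aggregate with D^-1/2 A D^-1/2, map back.
import numpy as np

def exp0(v):
    """Tangent space at the origin -> Poincare ball (curvature -1)."""
    n = np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12
    return np.tanh(n) * v / n

def log0(x):
    """Poincare ball -> tangent space at the origin."""
    n = np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12
    return np.arctanh(np.clip(n, 0.0, 1 - 1e-7)) * x / n

def hgcn_layer(U, A, W):
    """U: (num_users, d) ball points; A: adjacency with self-loops; W: (d, d)."""
    deg = A.sum(axis=1)
    A_norm = A / np.sqrt(np.outer(deg, deg))   # D^-1/2 A D^-1/2
    H = A_norm @ (log0(U) @ W)                 # aggregate in the tangent space
    return exp0(np.tanh(H))                    # nonlinearity + project back

A = np.array([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])  # 3 users, self-loops
U = exp0(np.random.default_rng(1).normal(size=(3, 4)) * 0.1)
out = hgcn_layer(U, A, np.eye(4))
print((np.linalg.norm(out, axis=1) < 1.0).all())  # outputs stay in the ball
```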
Context Synergized Hyperbolic Tree-LSTM

CSHT presents several modifications and improvements over the Tree-LSTM (Tai et al., 2015), including (1) incorporating both the user's personal context and the conversational context while clearly capturing the interactions between them, and (2) operating in the hyperbolic space, unlike the original Tree-LSTM, which operates in the Euclidean space. Conversation trees on social media possess a hierarchical structure of message propagation, where certain nodes may have many replies, e.g., nodes that contain utterances from popular users. Such phenomena lead to the formation of hubs within the conversation tree, which indicates the scale-free and asymmetrical properties of the conversation tree (Avin et al., 2018). The conversation tree $T$ is represented as $T = (V, E)$, where each vertex $v \in V$ holds the encoded utterance $X_j = e(t_j)$ ($t_j$ is part of the conversation tree $T$) and each edge $e \in E$ represents one of three relations between utterances: (1) parent-comment, (2) comment-reply, or (3) reply-reply. CSHT is modeled as a Bi-Directional Tree-LSTM over this structure, and $U_u$ is the user representation, obtained from the HGCN, of the user $u$ who authored the utterance to be assessed.
To understand the signals in CSHT, consider a specific node $t_j$. We gather information from all input nodes $t_k$ with $\{t_k, t_j\} \in E$ and combine their hidden states $h_k$ using the Einstein midpoint to produce an aggregated hidden state $\tilde{h}_j$ for node $t_j$. We use the hyperbolic representation of the current post, denoted $X_j$, as well as the user embedding of the author of the post, denoted $U_u$, to define computational gates operating in the hyperbolic space for CSHT cells. As defined in Subsection 2.2, $\odot$, $\oplus$, and $\otimes$ represent Möbius element-wise multiplication, Möbius addition, and Möbius matrix multiplication, respectively.
Considering multiple input nodes $t_k$, $\{t_k, t_j\} \in E$, we implement multiple hyperbolic forget gates incorporating the outputs $h_k$ of the input nodes with the current node's combined user and post representation $r_j$:

$$f_{jk} = \sigma\big(W^{(f)} \otimes h_k \oplus U^{(f)} \otimes r_j \oplus b^{(f)}\big) \quad (12)$$

The input gate $i_j$ and the intermediate memory gate $u_j$ corresponding to the representation of the current utterance are given by:

$$i_j = \sigma\big(W^{(i)} \otimes X_j \oplus U^{(i)} \otimes \tilde{h}_j \oplus b^{(i)}\big) \quad (13)$$
$$u_j = \tanh\big(W^{(u)} \otimes X_j \oplus U^{(u)} \otimes \tilde{h}_j \oplus b^{(u)}\big)$$

The input gate $m_j$ and the intermediate memory gate $s_j$ corresponding to the user representation from the social graph are given by:

$$m_j = \sigma\big(W^{(m)} \otimes U_u \oplus U^{(m)} \otimes \tilde{h}_j \oplus b^{(m)}\big)$$
$$s_j = \tanh\big(W^{(s)} \otimes U_u \oplus U^{(s)} \otimes \tilde{h}_j \oplus b^{(s)}\big)$$

The output gate for the tree cell is then calculated as:

$$o_j = \sigma\big(W^{(o)} \otimes X_j \oplus U^{(o)} \otimes \tilde{h}_j \oplus b^{(o)}\big)$$

The parameters $W^{(w)}, U^{(w)}, b^{(w)}$ are learnable parameters of the respective gate $w$. We obtain the cell state $c_j$ for the current cell by combining the representations from the multiple forget gates $f_{jk}$, $\forall t_k$ s.t. $\{t_k, t_j\} \in E$, as well as the gates corresponding to the tweet and user representations:

$$c_j = i_j \odot u_j \;\oplus\; m_j \odot s_j \;\oplus\; \bigoplus_{k} f_{jk} \odot c_k$$

These equations are applied recursively, blending in tree representations at every level. The output of the current cell, $h_j$, is given by:

$$\mathrm{CSHT}(X_j, U_u) = h_j = o_j \odot \exp_0\big(\tanh(c_j)\big) \quad (19)$$

Final Prediction Layer. We concatenate the node output $h_j$ from CSHT, projected to the Euclidean space, with the Euclidean utterance features $X_j$ of the given tweet to form a robust representation incorporating features of the tweet along with the social and conversational context derived from CSHT. This concatenated representation is passed through a final classification layer to obtain the prediction $\hat{y}^{gt}_j = \mathrm{Softmax}\big(\mathrm{MLP}([\log_0(h_j);\, X_j])\big)$. For training, we optimize a binary cross-entropy loss.
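The final prediction step can be sketched as follows. This is a minimal numpy illustration under our own simplifications: a single linear layer stands in for the MLP, `log0` maps the CSHT output back to Euclidean space at the origin, and all inputs are toy values.

```python
# Sketch of the final prediction layer: project the hyperbolic CSHT output to
# Euclidean space, concatenate with the utterance features, and classify.
import numpy as np

def log0(x):
    """Poincare ball -> tangent (Euclidean) space at the origin."""
    n = np.linalg.norm(x) + 1e-12
    return np.arctanh(min(n, 1 - 1e-7)) * x / n

def predict(h_j, x_j, W, b):
    """h_j: hyperbolic CSHT node output; x_j: Euclidean utterance features."""
    z = np.concatenate([log0(h_j), x_j])  # [log_0(h_j); X_j]
    logits = W @ z + b
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # softmax over {non-hate, hate}

rng = np.random.default_rng(2)
h_j = rng.normal(size=4) * 0.1   # toy CSHT output inside the unit ball
x_j = rng.normal(size=8)         # toy utterance embedding
W, b = rng.normal(size=(2, 12)), np.zeros(2)
p = predict(h_j, x_j, W, b)
print(p.sum())                   # probabilities sum to 1
```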

Dataset
We evaluate CoSyn on 6 conversational hate speech datasets, namely Reddit (Qian et al., 2019), GAB (Qian et al., 2019), DIALOCONAN (Bonaldi et al., 2022), CAD (Vidgen et al., 2021), ICHCL (Modha et al., 2021), and Latent Hatred (ElSherief et al., 2021). Appendix A provides dataset statistics. Table 1 reports micro-F1 scores averaged across 3 runs with 3 different random seeds.

Implicit Hate Annotations. The original datasets do not have annotations indicating whether an utterance denotes implicit or explicit hate speech. Thus, for our work, we add extra annotations for evaluating how CoSyn fares against our baselines in understanding the context needed to detect implicit hate speech. Specifically, we use Amazon Mechanical Turk (MTurk) and ask three individual MTurk workers to annotate whether a given utterance conveys hate implicitly or explicitly; the primary criterion was whether conversational context is required to understand the conveyed hate. At stage (1), we provide them with the definition of hate speech and examples of explicit and implicit hate speech. At stage (2), we provide complete conversations and ask them to annotate implicit or explicit in a binary fashion. Cohen's Kappa scores for inter-annotator agreement are provided with the dataset details in Appendix A.

Baselines
Due to the acute lack of prior work in this space, beyond hate speech classification we also compare our method with systems proposed in the thriving fake news detection literature that consider conversational and user contexts to detect fake news. Some of these systems had to be adapted to our task, and we describe each baseline in detail in Appendix B. Specifically, we compare CoSyn with Sentence-BERT (Reimers and Gurevych, 2019), ConceptNet, HASOC (Farooqi et al., 2021), Conc-Perspective (Pavlopoulos et al., 2020), CSI (Ruchansky et al., 2017), GCAN (Lu and Li, 2020), HYPHEN (Grover et al., 2022), FinerFact (Jin et al., 2022), GraphNLI (Agarwal et al., 2022), DUCK (Tian et al., 2022), MRIL (Sawhney et al., 2022a), and Madhu (Madhu et al., 2023).

Hyperparameters
We choose the best hyperparameters for CoSyn based on validation set performance using grid search. The optimal hyperparameters are: batch size b = 32, HGCN output embedding dimension g = 512, hidden dimension of CSHT h = 768, latent dimension of HFAN l = 100, learning rate lr = 1.3e-2, weight decay β = 3.2e-4, and dropout rate δ = 0.41.
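The grid search itself is standard; it can be sketched as follows. The grid values and the stand-in scoring function below are illustrative only, not the actual search space or metric used in the paper.

```python
# Minimal grid search sketch: enumerate configurations, score each on a
# validation metric, keep the best. val_f1 here is a toy stand-in for
# training CoSyn and evaluating micro-F1 on the dev set.
import itertools

grid = {
    "batch_size": [16, 32],
    "lr": [1.3e-2, 1e-3],
    "dropout": [0.25, 0.41],
}

def val_f1(cfg):
    # Toy score peaking at the (hypothetical) optimum lr=1.3e-2, dropout=0.41.
    return -abs(cfg["lr"] - 1.3e-2) - abs(cfg["dropout"] - 0.41)

configs = [dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
best = max(configs, key=val_f1)
print(best)
```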

Results and Ablations
Table 1 compares the performance of CoSyn with all our baselines on 6 hate speech datasets. As we see, CoSyn outperforms all our baselines both on the entire dataset and on the implicit subset, showing its effectiveness for implicit hate speech detection. Despite the ambiguous nature of implicit hate speech, CoSyn achieves high precision scores on implicit hate speech, implying that it mitigates the problem of false positives better than the other baselines. One common observation is that our bias-invariant SentenceBERT approach emerges as the most competitive baseline to CoSyn, reinforcing that most prior work does not effectively leverage external context.
Table 1: Result comparison of CoSyn with our baselines on 6 hate speech datasets. We compare performance on both the overall dataset and the implicit subset. C.F1 and R.F1 indicate F1 scores measured on only comments and replies, respectively. CoSyn outperforms all our baselines with absolute improvements in the range of 3.4% - 45.0% when evaluated on the entire dataset and 1.2% - 57.9% when evaluated on only the implicit subset. "-" indicates that conversation trees in the dataset did not have replies.

Table 2 ablates the performance of CoSyn, removing one component at a time to show the significance of each. All results are averaged across all 6 datasets. Some major observations include: (1) CoSyn sees a steep drop in performance when used without user context (i.e., when we model only the conversation context with a vanilla Tree-LSTM in the hyperbolic space). This proves the effectiveness of the user's personal context. Additionally, modeling user context with HFAN and HGCN proves to be more effective than simply feeding mean-pooled historical utterance embeddings into CSHT.
(2) Modeling in the Euclidean space results in an overall F1 drop of 3.8%. We reinforce the effectiveness of modeling in the hyperbolic space through our findings in Fig. 2.

Results Analysis
In Fig. 4, we show test set predictions for 4 different conversation trees from the ICHCL dataset in an attempt to study the effect of conversational and author context on target utterance classification. We notice that hateful users possess high degrees of homophily and are predominantly hateful over time. The figure also reveals a limitation of CoSyn, where insufficient context from the parent leads to a false positive (the comment in the 4th example).

Related Work
Prior research on identifying hate speech on social media has proposed systems that can flag online hate speech with remarkable accuracy (Schmidt and Wiegand, 2019). However, as mentioned earlier, prior efforts have focused primarily on classifying overt abuse or explicit hate speech (Schmidt and Wiegand, 2019), with little or no work on classifying implicit hate speech, i.e., hate conveyed in coded language. The lack of research on this topic can also be attributed to existing datasets being skewed towards explicitly abusive text (Waseem and Hovy, 2016). Recently, this area of research has seen growing interest, with several datasets and benchmarks released for evaluating the performance of existing hate speech classifiers in identifying implicit hate speech (Caselli et al., 2020; ElSherief et al., 2021; Sap et al., 2019). One common observation is that most prior systems from the literature fail to identify implicit hate utterances (ElSherief et al., 2021). Lin (2022) proposes one of the first systems to classify implicit hate speech by leveraging world knowledge. However, they evaluate their performance only on Latent Hatred, which lacks conversational context. Additionally, acquiring world knowledge through knowledge graphs (KGs) requires language-specific KGs, and short utterances in conversation trees make the retrieval weak. In the past, to classify context-sensitive hate speech in existing open-source datasets, researchers have incorporated conversational context for hate speech classification (Gao and Huang, 2017; Pérez et al., 2022). However, these systems employ naive fusion and fail to leverage structural dependencies between utterances in the conversation tree. Another line of work explores author profiling using community-based social context (the connections a user has with other users) (Pavlopoulos et al., 2017; Mishra et al., 2018). However, the representations are not learned end-to-end and are combined through naive fusion.
Hyperbolic networks have been explored earlier for tasks that involve modeling user interactions on social media, such as suicide ideation detection (Sawhney et al., 2021b, 2022b), fake news detection (Grover et al., 2022), and online time stream modeling (Sawhney et al., 2021a). All these works show that modeling the hierarchical and scale-free nature of social networks and data generated online benefits from the hyperbolic space over the Euclidean space.

Conclusion
In this paper, we present CoSyn, a novel learning framework to detect implicit hate speech in online conversations. CoSyn jointly models the conversational context and the author's historical and social context in the hyperbolic space to classify whether a target utterance is hateful. Leveraging these contexts allows CoSyn to effectively detect the implicit hate speech that is ubiquitous in online social media. However, CoSyn's overall performance depends on the performance of all its components. Therefore, as a direction for future research, our focus will be on enhancing the performance of individual components.

Ethics Statement
Our institution's Institutional Review Board (IRB) has granted approval for this study. In the annotation process, we took precautions by including a cautionary note in the instructions, alerting annotators to potentially offensive or distressing content. Annotators were also allowed to discontinue the labeling process if they felt overwhelmed.
Additionally, in light of the rapid proliferation of offensive language on the internet, numerous AI-based frameworks have emerged for hate speech detection. Nevertheless, a significant drawback of many current hate speech detection models is their narrow focus on explicit or overt hate speech, thereby overlooking the identification of implicit expressions of hate that hold equal potential for harm. CoSyn could ideally identify implicit hate speech with remarkable accuracy, preventing targeted communities from experiencing increased harm online.

GAB. Each conversation consists of 2.86 posts on average, and the average length of each post is 35.6 tokens. 14,614 posts are labeled hate speech, and 19,162 are labeled non-hate speech. Nearly all the conversations, 11,169 (94.5%), contain hate speech. 31,487 intervention responses were originally collected for conversations with hate speech, i.e., 2.82 responses per conversation on average. The average length of the intervention responses is 17.27 tokens. User history was fetched using the GAB API (https://www.npmjs.com/package/gab-api). The metadata for each fetched post provides information like the author and timestamp of a post. Dataset statistics can be found in Table 6. The Cohen's Kappa scores for inter-annotator agreement for 3 pairs of annotators annotating for implicit hate speech were 0.85, 0.82, and 0.91.
DIALOCONAN. The DIALOCONAN dataset (Bonaldi et al., 2022) contains more than 3K dialogical interactions between two interlocutors, one acting as the hater and the other as the NGO operator, for a total of more than 16K turns. The previous tweets of a particular user were considered to be the history for that user. Dataset statistics can be found in Table 7. The Cohen's Kappa scores for inter-annotator agreement for 3 pairs of annotators annotating for implicit hate speech were 0.79, 0.78, and 0.79.
CAD. The CAD dataset (Vidgen et al., 2021) is an annotated dataset of ≈ 25,000 Reddit entries. The dataset is labeled with 6 conceptually different categories: Identity-directed, Person-directed, Affiliation-directed, Counter Speech, Non-hateful Slurs, and Neutral. The dataset is also annotated with salient subcategories, such as whether personal abuse is directed at a person in the conversation thread or someone outside it. This taxonomy offers greater coverage and granularity of abuse than previous work. Each entry can be assigned to multiple primary and/or secondary categories. The dataset is also annotated in context, i.e., each entry is annotated in the context of the conversational thread it is part of, and every annotation also has a label for whether contextual information was needed to make the annotation. User history, the timestamp for each post, and the username for each post were fetched using the Reddit API (https://www.reddit.com/dev/api/). Dataset statistics can be found in Table 5. The Cohen's Kappa scores for inter-annotator agreement for 3 pairs of annotators annotating for implicit hate speech were …

[Algorithm 1: CoSyn implicit hate speech detection. Given: N distinct conversation trees indexed by T_n; L distinct users, each indexed by u; historical utterances H_u of each user u; each tree T_n = {t_0^{T_n}, …}.]

ICHCL. The ICHCL dataset (Modha et al., 2021) (Identification of Conversational Hate Speech in Code-Mixed Languages) consists of Hindi-English code-switched Twitter conversations. The primary task is to identify comments and replies that can be considered acceptable when considered alone but may appear hateful, profane, or offensive when the context of the parent tweet is considered. Binary classification of such contextual posts was considered in this subtask. Around 7,000 code-mixed postings in English and Hindi were downloaded from Twitter and annotated in-house by the authors. User history was
fetched using the Twitter API. Dataset statistics can be found in Table 8. The Cohen's Kappa scores for inter-annotator agreement for 3 pairs of annotators annotating for implicit hate speech were 0.72, 0.81, and 0.74.
Latent Hatred. The Latent Hatred dataset (ElSherief et al., 2021) is a hate speech dataset of 27K messages annotated according to a 6-class taxonomy that includes White Grievance, Incitement to Violence, Inferiority Language, Irony, Stereotypes and Misinformation, and Threatening and Intimidation. Comments for every post, each user's history, and the timestamp for each post were fetched using the Twitter API. Dataset statistics can be found in Table 4. The Cohen's Kappa scores for inter-annotator agreement for 3 pairs of annotators annotating for implicit hate speech were 0.91, 0.96, and 0.89.

B Baseline Descriptions
Sentence-BERT w/ Classification Head. We use our bias-invariant Utterance Encoder trained and …

The detection system may inadvertently amplify or reinforce existing societal biases if these biases are not adequately addressed. For example, if the training data is biased towards specific demographics or ideologies, the model might exhibit unfair treatment or misclassification of certain groups, leading to potential discrimination or harm. Understanding the contextual nuances of language is a complex task, and CoSyn might also be prone to over-generalization, potentially resulting in false positives or misclassification of non-hateful speech. The risk is that legitimate expressions of opinion or controversial yet non-hateful statements may be mistakenly flagged as hate speech. This could inadvertently lead to censorship or suppression of freedom of speech, limiting open dialogue and critical discussions on important social issues.

Figure 2 :
Figure 2: Various properties of the social graph and conversation trees, averaged across datasets. The fit of the node degree distribution to a power law (γ ∈ [2, 3]) (Choromański et al., 2013) and the low hyperbolicity δ (Barabási and Bonabeau, 2003) indicate the scale-free nature of conversations and social graphs.
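The power-law check summarized in this caption can be sketched as follows. The sketch uses the continuous maximum-likelihood estimator for the exponent on synthetic data; real degree sequences are integer-valued, for which the discrete estimator of Clauset et al. would be the appropriate tool.

```python
# Continuous MLE for the power-law exponent: gamma = 1 + n / sum(ln(k / k_min))
# over all degrees k >= k_min. A gamma in [2, 3] is the usual scale-free range.
import numpy as np

def powerlaw_gamma(degrees, k_min=1.0):
    k = np.asarray([d for d in degrees if d >= k_min], dtype=float)
    return 1.0 + len(k) / np.log(k / k_min).sum()

# Synthetic sample with true tail exponent 2.5 (Pareto with shape 1.5):
rng = np.random.default_rng(3)
deg = rng.pareto(1.5, size=5000) + 1.0
print(powerlaw_gamma(deg))  # approaches 2.5 as the sample grows
```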

Figure 3 :
Figure 3: Illustration of CoSyn. CoSyn detects whether a target utterance, authored by a social media user, implies hate, leveraging three main components: (1) HFAN, which models the user's personal historical context by employing the Fourier transform and hyperbolic attention on the user's historical tweets; (2) HGCN, which models the user's social context through hyperbolic graph learning on the user's relations with other users; and (3) CSHT, which jointly models the user's personal context and the conversational context to finally classify whether the utterance is hateful.

Figure 4 :
Figure 4: We study 4 conversation trees from the ICHCL dataset, including the predictions of different classifiers on the utterance to be assessed, the historical engagement of the author of the utterance, and the social relations between the different authors.