HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning

With the proliferation of social media, accurate detection of hate speech has become critical to ensure safety online. To combat nuanced forms of hate speech, it is important to identify and thoroughly explain hate speech to help users understand its harmful effects. Recent benchmarks have attempted to tackle this issue by training generative models on free-text annotations of implications in hateful text. However, we find significant reasoning gaps in the existing annotation schemes, which may hinder the supervision of detection models. In this paper, we introduce a hate speech detection framework, HARE, which harnesses the reasoning capabilities of large language models (LLMs) to fill these gaps in explanations of hate speech, thus enabling effective supervision of detection models. Experiments on the SBIC and Implicit Hate benchmarks show that our method, which uses model-generated data, consistently outperforms baselines that use existing free-text human annotations. Analysis demonstrates that our method enhances the explanation quality of trained models and improves generalization to unseen datasets. Our code is available at https://github.com/joonkeekim/hare-hate-speech.git.


Introduction
The increase in the use of online media has intensified the exposure to hate speech, prompting the need for effective detection systems (Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018). While early works have been limited to the classification of explicit hate speech (Caselli et al., 2020; Mathew et al., 2021), recent works have drawn our attention to implicit forms of hate speech, which are more prevalent yet subtle (Jurgens et al., 2019).
To tackle these nuanced forms of hate speech, it is important for systems to not only identify hate speech but also provide interpretable explanations (Liu et al., 2019). This can help mitigate distributional biases inherent in simple classification, allowing people to understand and reason about the potential harms of hateful text (Sap et al., 2019b). Explanations can also improve the transparency of content moderation on social media (Gillespie, 2018).

Figure 1: HARE uses large language models (LLMs) to generate hate speech explanations step-by-step. (a) Recent benchmarks on understanding hate speech provide free-text annotations on the implications of hate speech, but gaps in reasoning hinder the supervision of generative detection models. (b) We propose the use of LLMs to fill in the gaps and enable detection models to understand and explain hate speech.
Recent works on hate speech understanding (Sap et al., 2019b; ElSherief et al., 2021; Huang et al., 2022) have considered training autoregressive language models to generate underlying explanations of hate speech. The models are trained on human-written free-text rationales such as implied statements and targeted groups. However, despite the use of novel benchmark datasets, i.e., SBIC (Sap et al., 2019b) and Implicit Hate (ElSherief et al., 2021), the trained models struggle to generate detailed and comprehensive explanations. Moreover, we observe that the provided rationales give marginal improvement to detection performance under joint training.
A potential cause of the limited supervision provided by existing annotations on understanding and explaining hate speech may be the existence of critical gaps in reasoning. For example, as shown in Figure 1, the implied statement of the post "How dark is my humour? It picks cotton" is annotated as "black folks are slaves" in SBIC. To understand this implication, one must understand that "dark" implies "black folks", and that the phrase "picks cotton" relates to the historical background of African Americans. While this may be obvious to human annotators, language models are known to lack the societal knowledge and commonsense reasoning skills needed to understand these nuances (Talmor et al., 2019; Li et al., 2022; Choi et al., 2023). This leaves a significant gap between the training objectives of classification and generating annotated implications, which may harm supervision (Wiegreffe et al., 2021b; Wang et al., 2023a).
Drawing inspiration from the reasoning capabilities of large language models (LLMs) improved with chain-of-thought (CoT) reasoning (Wei et al., 2022), we present our novel approach "Explainable HAte Speech Detection with Step-by-Step REasoning (HARE)". We leverage LLM-generated free-text rationales using CoT prompts to fill in the gaps of reasoning in existing hate speech annotations and enhance supervision of generative detection models. To create these rationales, we propose two approaches: (1) adopt CoT prompts to create comprehensive rationales that align with the given texts and (2) incorporate existing human annotations from benchmarks in the CoT prompts to bridge the logical gap between the input text and human annotations. When tested on the challenging SBIC and Implicit Hate datasets, our approach outperforms standard fine-tuning with given human annotations and provides enhanced explanations behind the detection results.

Preliminaries
The task of hate speech detection can be framed as a generative task that takes the text P as input and outputs a prediction class C, formulated as p(C|P), indicating whether the speech is classified as "hate" or "not hate". Furthermore, by incorporating human-written rationales of the target groups T and implied statements I of hate speech from the SBIC and Implicit Hate datasets, the task can also be formulated to sequentially output C, T, and then I, as p([C; T; I]|P). Hence, the model trained with human-written annotations is designed to generate C and then provide explanations using annotations T and I. However, due to a logical gap between the speech P and the annotations T and I, training a model with these annotated rationales does not significantly enhance the model's ability to comprehend hate speech.
Chain-of-thought prompting (Wei et al., 2022) refers to a prompting method that generates a chain of reasoning to derive answers. Kojima et al. (2022) introduce a method of including the phrase "Let's think step by step" at the end of the prompt to generate reasoning in a zero-shot setting.
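As a minimal sketch of the two output formats described above, the following Python snippet serializes a sample into either the classification-only target C or the joint target [C; T; I]. The function name and the field prefixes ("Class:", "Target:", "Implied statement:") are illustrative assumptions, not the serialization used in the released code.

```python
from typing import Optional


def build_target(label: str,
                 target_group: Optional[str] = None,
                 implied_statement: Optional[str] = None) -> str:
    """Serialize the output sequence for p(C|P) or p([C; T; I]|P).

    The field prefixes are hypothetical; only the ordering C, then T,
    then I follows the formulation in the text.
    """
    parts = [f"Class: {label}"]
    if target_group:
        parts.append(f"Target: {target_group}")
    if implied_statement:
        parts.append(f"Implied statement: {implied_statement}")
    return " ".join(parts)


# Classification-only baseline target (C).
print(build_target("hate"))
# Joint target with human-written rationales (C+T+I).
print(build_target("hate", "black folks", "black folks are slaves"))
```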

Fr-HARE & Co-HARE
In this paper, we propose HARE, a framework that fine-tunes models using rationales generated by LLMs through CoT prompts. By leveraging these rationales, the model is trained to discern whether a text is classified as hate speech or not. This approach of utilizing the LLM-generated rationales not only improves model performance, as in previous works (Ho et al., 2022; Wang et al., 2023a), but also enables the model to generate more plausible explanations. Subsequently, we introduce two variants that employ different methods for extracting rationales from LLMs.

Fr-HARE Our first strategy, Free-HARE (Fr-HARE), extracts rationales from the LLM without referencing the annotated rationales. In line with Kojima et al. (2022), we employ zero-shot prompts to prompt the LLM to extract rationales, as follows:

Determine whether the following post is offensive, and explain why.
Post: P
Answer: Let's explain step by step.

From the given prompt above, both the rationale R without any logical leap and the classification result C are generated. Then, we utilize the generated rationale R by training the model to sequentially generate C and R, represented as p([C; R]|P). Utilizing these rationales enhances the model's ability to comprehend the context of hate speech, thus leading to explanations that are more intelligible to humans.
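The Fr-HARE prompt above can be assembled with a few lines of Python. This is only a sketch: the template text is taken from the prompt quoted above, while the constant and function names, and the use of newlines between the fields, are assumptions made for illustration.

```python
# Template text follows the Fr-HARE prompt quoted in the paper; the line
# breaks between fields are an assumption about the exact layout.
FR_HARE_TEMPLATE = (
    "Determine whether the following post is offensive, and explain why.\n"
    "Post: {post}\n"
    "Answer: Let's explain step by step."
)


def build_fr_hare_prompt(post: str) -> str:
    """Fill the zero-shot CoT template with the post P (hypothetical helper)."""
    return FR_HARE_TEMPLATE.format(post=post)


print(build_fr_hare_prompt("an example post"))
```

The resulting string would then be sent to the LLM, whose completion contains both the rationale R and the class C.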

Co-HARE
We also propose a second strategy, Conditioned-HARE (Co-HARE), which utilizes human-written rationales to formulate CoT prompts. Fr-HARE generates complete rationales entirely from scratch, disregarding human annotations included in the datasets. Although Fr-HARE offers the advantage of being universally applicable to any hate speech dataset, the LLM-generated rationales might lack consistency due to wide variation in reasoning sequences. To address this inconsistency, Co-HARE integrates human-written rationales about target groups T and implied statements I into the CoT prompt as follows:

Determine whether the following post is offensive, and explain why. Provide a brief, step-by-step explanation of how the post targets the specified group and how it leads to the implied statement provided.
Post: P
Target: T
Implied statement: I
Answer: Let's explain step by step.

Then, we train the model using extracted rationales, as in Fr-HARE. For samples labeled as "not hate" that do not include human-written rationales, we apply the prompt used in Fr-HARE. While Co-HARE requires human-written rationales, it generates rationales that are more tailored to the specific requirements and features of the dataset, due to its guided nature. Therefore, the model trained with Co-HARE can provide explanations that align more closely with the forms of rationales that humans construct.
Details of HARE Once we have extracted the rationales from the LLMs, we follow the approach of Kojima et al. (2022) to have the LLMs predict the class. Specifically, we employ a two-stage extraction process. In the first stage, we extract both the class C and the rationale R from the LLMs using our HARE method, represented as p([C; R]|P), as previously outlined. In the second stage, we prompt the LLMs again, this time to predict the class C given the extracted rationale R and the post P, denoted as p(C|R, P). During fine-tuning on hate speech datasets, if the class predicted by the LLM coincides with the true label C, we concatenate C with the extracted rationale R as the training target. If the predicted label is incorrect, the model is trained solely to predict the class C. Furthermore, following the findings of Ho et al. (2022), we generate multiple distinct rationales to facilitate the learning process.
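A hedged sketch of how the Co-HARE prompt and its fallback to the Fr-HARE prompt might be assembled is shown below. The template wording comes from the prompts quoted in the text; the function name, the newline layout, and the rule of falling back whenever either annotation is missing are assumptions for illustration.

```python
from typing import Optional

# Both templates reuse the prompt wording quoted in the paper; the field
# layout (one field per line) is an assumption.
FR_HARE_TEMPLATE = (
    "Determine whether the following post is offensive, and explain why.\n"
    "Post: {post}\n"
    "Answer: Let's explain step by step."
)
CO_HARE_TEMPLATE = (
    "Determine whether the following post is offensive, and explain why. "
    "Provide a brief, step-by-step explanation of how the post targets the "
    "specified group and how it leads to the implied statement provided.\n"
    "Post: {post}\n"
    "Target: {target}\n"
    "Implied statement: {implied}\n"
    "Answer: Let's explain step by step."
)


def build_co_hare_prompt(post: str,
                         target: Optional[str] = None,
                         implied: Optional[str] = None) -> str:
    """Condition on human annotations T and I when they exist.

    Samples without human-written rationales (e.g. "not hate" posts)
    fall back to the Fr-HARE prompt, as described in the text.
    """
    if not target or not implied:
        return FR_HARE_TEMPLATE.format(post=post)
    return CO_HARE_TEMPLATE.format(post=post, target=target, implied=implied)


print(build_co_hare_prompt("an example post", "a group", "an implied statement"))
```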
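The label-consistency rule for building fine-tuning targets can be sketched as follows. This is an illustration under stated assumptions: the function name and the way the class and rationale are joined into one string are hypothetical, and the predicted labels are assumed to come from the second-stage prompt p(C|R, P).

```python
from typing import List


def build_training_targets(true_label: str,
                           predicted_labels: List[str],
                           rationales: List[str]) -> List[str]:
    """Build one target sequence per extracted rationale.

    If the LLM's predicted class matches the true label, the target is the
    class followed by the rationale ([C; R]); otherwise the rationale is
    discarded and the target is the class C alone.
    """
    if len(predicted_labels) != len(rationales):
        raise ValueError("need one predicted label per rationale")
    targets = []
    for predicted, rationale in zip(predicted_labels, rationales):
        if predicted == true_label:
            targets.append(f"{true_label} {rationale}")
        else:
            targets.append(true_label)
    return targets


targets = build_training_targets(
    "(A) Offensive",
    ["(A) Offensive", "(B) Not offensive"],
    ["1. The post ...", "1. The post ..."],
)
print(targets)
```

Because multiple rationales are generated per sample, a single post can contribute several target sequences, some with rationales and some with the class only.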

Experimental Setup
We utilize the SBIC and Implicit Hate datasets for our fine-tuning experiments. Our models are trained to classify the offensiveness and hatefulness of posts, using SBIC and Implicit Hate, respectively. It is noteworthy that in our Implicit Hate experiments, we combine both the explicit and implicit hate classes into a single "hate" category. We set up baselines with two families of models: C, a model trained exclusively for classification, and C+T+I, a model trained using human-written rationales. For Fr-HARE and Co-HARE, using gpt-3.5-turbo-0613, which is known for its reasoning capabilities (Ouyang et al., 2022), we extract four and eight different rationales per sample in SBIC and Implicit Hate, respectively, following the hyperparameter setting of Ho et al. (2022). Subsequently, we fine-tune the model, setting the LLM-generated rationales R and class C as the target sequence. For performance evaluation, we measure detection accuracy and compute the F1 score of classification, regarding "hate" as the positive class. We make use of Flan-T5 (Wei et al., 2021) with different model configurations: small, base and large. We also conduct experiments using the large models of T5 (Raffel et al., 2020) and GPT-2 (Radford et al., 2019). A more detailed explanation of our experimental setup can be found in Appendix B.
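For concreteness, the two evaluation metrics can be computed as in the sketch below, which treats "hate" as the positive class for F1. The function name and the label strings are assumptions; an equivalent result could be obtained with a standard metrics library.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(gold: Sequence[str],
                    pred: Sequence[str],
                    positive: str = "hate") -> Tuple[float, float]:
    """Return (accuracy, F1) with `positive` as the positive class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    accuracy = correct / len(pairs) if pairs else 0.0
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


gold = ["hate", "hate", "not hate", "not hate"]
pred = ["hate", "not hate", "hate", "not hate"]
print(accuracy_and_f1(gold, pred))
```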

Results and Discussions
Do LLM-generated rationales improve detection performance? Table 1 presents the performance of hate speech detection according to different methods on the SBIC and Implicit Hate datasets. Our strategies Fr-HARE and Co-HARE consistently exhibit superior performance over other baseline methods, regardless of the model size. This suggests that even though the baseline method is trained using human-written rationales, the more detailed and logically sequenced LLM-generated rationales of HARE can further aid the model in understanding the input text and accurately classifying it as hate speech. Therefore, the results demonstrate that the quality of rationales has a strong impact on classification. Furthermore, the performance of our method consistently improves as the model size increases, in contrast to baselines. This suggests that diverse reasoning becomes increasingly beneficial as scale grows. This notable improvement with HARE is achieved using only $40 for each method in our approach, demonstrating that the ability to reason can be effectively trained with rationales from LLMs.
Additionally, while Fr-HARE and Co-HARE exhibit similar performance, Co-HARE has a slight edge in most cases. This is because Co-HARE is guided by human-written annotations, which results in better alignment with the setting of the datasets, as we mentioned in Section 2.2. It is also noteworthy that all the fine-tuned models surpass both the Zero-Shot (ZS) and Zero-Shot CoT (ZS-CoT; Kojima et al., 2022) classification performance of GPT-3.5-turbo, indicating that merely employing an LLM with CoT prompts is not sufficient to tackle this task.
Are HARE models more generalizable? To assess the ability of our methods to generalize across different datasets, we evaluate the models fine-tuned on the SBIC dataset using each method on two distinct datasets, HateXplain (Mathew et al., 2021) and DynaHate (Vidgen et al., 2020). Both datasets encompass forms of explicit and implicit hate. On both datasets, our methods Fr-HARE and Co-HARE both outperform baseline methods, indicating that our methods enhance the generalizability of the models by improving their reasoning ability. Moreover, the comparable performance of Fr-HARE and Co-HARE suggests that both techniques of bridging logical gaps are effective in tackling hate speech detection.
Does HARE improve the quality of generated explanations? To evaluate the quality of the generated explanations, we conduct automated evaluation using GPT-4 (OpenAI, 2023), which has demonstrated high alignment with human evaluation (Liu et al., 2023; Zheng et al., 2023).

C+T+I
The answer is : (B) Not offensive

Fr-HARE
The answer is : (A) Offensive 1. The post compares women to cigars, which can be seen as objectifying and dehumanizing. 2. It also makes a reference to a specific age group, which can be seen as inappropriate and offensive. 3. (…)

Co-HARE
The answer is : (A) Offensive 1. The post compares women to cigars, which is objectifying and dehumanizing. (…) 2. The post specifically mentions that the cigars should be "7 years old," which is highly inappropriate and offensive as it promotes child exploitation and sexual abuse. (…) 3. The post also mentions that the cigars should be "coming from Cuba in a burlap sack," which is a clear reference to the sex trafficking scandal that occurred in the United States in the past. (…)

Post
"I think a lot, but I don't say much." ~ Anne Frank But lately she's been proven to have not said anything for almost a century

Label (A) Offensive
Annotation jewish people were rounded up and killed in the holocaust / jewish people were victims of a genocide.

C+T+I
The answer is : (B) Not offensive

Fr-HARE
The answer is : (A) Offensive 1. The post starts with a quote from Anne Frank, a Jewish girl who died during the Holocaust. 2. The quote itself is a statement about her lack of speaking much.

As Co-HARE is designed to create rationales that align better with human-written rationales, we also conduct a pairwise comparison between Co-HARE and C+T+I to determine which method aligns better with human-written labels. Figure 2 (b) clearly shows that the rationales generated by the model trained using Co-HARE are more closely aligned with actual human-written rationales than the ones generated by the model trained directly with human-written rationales. This also highlights the value of Co-HARE, as it aligns better with natural intuition, indicating that both Fr-HARE and Co-HARE can be utilized for different purposes.
How does HARE detect and contextualize hate speech? Figure 3 presents model outputs using HARE and the baseline method. A more detailed qualitative study of randomly selected samples is available in Appendix D. In the first sample, the post makes light of harming a young girl using phrases like "like I like cigar" and "burlap sack". While our model, Fr-HARE, does not explicitly connect the "burlap sack" to the idea of a kidnapped girl, it does recognize the harmful implications towards the girl. Furthermore, Co-HARE understands the historical context behind the term "burlap sack". The presence of the positive term "like" and the absence of overtly derogatory words might lead some models to classify the statement as non-offensive. However, our approach appropriately identifies the underlying hateful context.

Post
Why do so many white people own pets? Because they can't own people anymore.

Label (A) Offensive
Annotation black people are inferior / black people should be slaves

LLM rationale
1. The post makes a generalization about white people and their ownership of pets. 2. It suggests that white people historically owned people, implying slavery. 3. It implies that white people now own pets as a replacement for owning people. This post can be considered offensive because it perpetuates a harmful stereotype about white people and makes light of a dark and painful history of slavery. It reduces the complex issue of slavery to a simplistic and offensive joke. While humor can be subjective, this post crosses the line by making inappropriate and insensitive comparisons.
In the second sample, the post contains hate speech targeting Jewish victims of the Holocaust by referencing Anne Frank.Our model accurately recognizes the historical background of Anne Frank as a Holocaust victim.While the baseline seems to overlook the historical significance associated with Anne Frank, our method correctly identifies her and assumes that the reference constitutes harassment against a Jewish victim, even though there is a slight misunderstanding about the context of "lack of speaking" in Fr-HARE.
Is GPT-3.5 a qualified teacher? Since our framework is based on distillation of generated rationales from GPT-3.5 to smaller models, it is crucial to verify whether the teacher is qualified. Figure 4 displays rationales produced by GPT-3.5-turbo, which is employed to train the student model. This example illustrates that the LLM not only discerns the hateful nuances towards both white and black individuals, but also offers more detailed explanations compared to rationales written by humans. Notably, it accurately correlates the historical context, associating the word "slaves" with "pets". More analysis of rationales from GPT-3.5-turbo can be found in Appendix D.2.

Conclusion
In this paper, we present the HARE framework to improve the ability of language models to understand hate speech and provide clearer explanations for their decisions. We propose utilizing CoT reasoning extracted from LLMs in two variants to overcome the logical gaps in human-annotated rationales. When fine-tuned on the SBIC and Implicit Hate datasets, our methods achieve superior detection performance and higher-quality explanations.

Limitations
While we assess the quality of explanations generated by HARE using GPT-4, we do not conduct human evaluations, which are crucial for tasks requiring human-readable explanations. The primary reasons for this omission are that the hate speech content and its respective explanations could be excessively offensive for annotators, and that GPT-4 already aligns with the level of inter-human agreement. In addition, the "verbosity bias", characterized by GPT-4's preference for longer text as indicated by Liu et al. (2023), may also serve as a limitation in our evaluation process.

Ethics Statement
Predicting whether an online post contains hate speech is both technically and socially challenging. While methods for automating hate speech detection have utility in an online platform, it is critical that these are tuned and used appropriately.
False-positive errors have potential to censor online speech, further marginalizing specific user groups, for example: use of n***** in AAVE English may be flagged.It is critical to understand specific reasoning behind a classification including deeply social reasons.While language models act as a mechanism to generate reasonable explanations, it is critical that they are used appropriately to prevent them from inadvertently educating users on how to craft more subtle and toxic language.We used automated evaluation metrics in this paper to prevent exposure of toxic language to human annotators.However, real-world usage would require validation that deeply rooted social issues are expressed correctly by these models.
It is also important to note that there might be concerns about the inherent bias in the GPT-3.5 model. While not flawless, GPT-3.5 has demonstrated its impartiality regarding gender, race, ethnicity, and religion by achieving the highest grade on the Harmfulness metric within the FLASK evaluation framework (Ye et al., 2023). Crucially, we only select rationales that align with the ground truth label for training, thereby mitigating biases not in sync with human annotators. Analysis of GPT-3.5-turbo can be found in Section 3 and Appendix D.2.

Tommaso Caselli, Valerio Basile, Jelena Mitrović, and Michael Granitzer. 2020. HateBERT: Retraining BERT for abusive language detection in English. arXiv preprint arXiv:2010.12472.

A Related Work
Hate Speech Detection Hate speech (Waseem et al., 2017) is a form of language designed to offend a particular individual or group. In this study, we expand this definition by incorporating the broader concept of offensive language, as in Burnap and Williams (2016) and Ribeiro et al. (2018). Numerous recent works on hate speech detection have delved into providing underlying explanations of predictions on hate speech (Sap et al., 2019a,b; Mathew et al., 2021; ElSherief et al., 2021; Lin, 2022). One line of research focuses on keyword-based explanations (Sap et al., 2019a; Davidson et al., 2019; Mathew et al., 2021; Kim et al., 2022), but this approach often fails to capture implicit hatefulness that is not explicitly present in the text. Another approach involves explanations utilizing external knowledge sources (Sridhar and Yang, 2022; Lin, 2022), but these methods aim solely to improve classification performance. Other studies train generative models with human-written free-text rationales (Sap et al., 2019b; ElSherief et al., 2021; Huang et al., 2022) present in multiple benchmarks (Sap et al., 2019b; ElSherief et al., 2021). Nevertheless, due to the existence of logical gaps in these human-annotated rationales (Aggarwal et al., 2021; Sun et al., 2022), relying solely on these rationales results in sub-optimal detection and explanation quality. Our proposed HARE shows its effectiveness by incorporating LLM-generated rationales, which offer logical completeness and abundant explanatory power and are extracted with our CoT prompting.
Self-Rationalization Self-rationalization, a technique where models provide explanations for their predictions, has been extensively studied to make models more understandable and transparent (Marasović et al., 2021;Wiegreffe et al., 2021a,b).

B Implementation Details
Training Details All the experiments utilize an Adafactor (Shazeer and Stern, 2018) optimizer with a batch size of 32 and learning rate ∈ {5e-3, 5e-4, 5e-5} without learning rate scheduling. We train all models for 10 epochs and utilize the checkpoint with the highest validation F1 score to evaluate on the test set. Top-k sampling with k=20 is adopted as the decoding method. For the dataset split, we use the official split for SBIC and a random split with a 6:2:2 ratio for the train, validation, and test sets for Implicit Hate. All our experiments are conducted using a single NVIDIA A6000 GPU.
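A random 6:2:2 split such as the one used for Implicit Hate can be sketched as follows. The function name, the seed, and the rounding rule (integer division, with any remainder going to the test set) are assumptions; the paper does not specify them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_622(samples: Sequence[T],
              seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed and split into train/validation/test (6:2:2)."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(samples) * 6 // 10
    n_val = len(samples) * 2 // 10
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test


train, val, test = split_622(list(range(100)), seed=1)
print(len(train), len(val), len(test))
```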

C Details on GPT-4 Evaluation
To conduct GPT-4 evaluation, we utilize the prompts proposed by Zheng et al. (2023). For single answer grading, the prompt format for GPT-4 is written as follows:

[Instruction] Please act as an impartial judge and evaluate the quality of the response provided by an AI assistant to the user question displayed below. Your evaluation should consider factors such as the helpfulness, relevance, accuracy, depth, creativity, and level of detail of the response. Begin your evaluation by providing a short explanation. Be as objective as possible. After providing your explanation, please rate the response on a scale of 1 to 10 by strictly following this format:

To compare the alignment of generated rationales with human-written rationales, we conduct a pairwise comparison of explanations against ground-truth rationales. These ground-truth rationales encompass the annotated target groups and implied statements from the instruction, and we compare the results. Additionally, to minimize bias from the order of candidate answers, we adopt the approach of Zheng et al. (2023), considering both original and swapped orders of predicted explanations. If C+T+I and Co-HARE are chosen alternately, it is deemed a tie. Should one method be selected following a tie, that method is considered as the chosen one. The prompt format for the pairwise comparison is provided below:

[Instruction] Please act as an impartial judge and evaluate the quality of the responses provided by two AI assistants to the user question displayed below. You should choose the assistant that follows the user's instructions and answers the user's question more accurate. When choosing the assistant, please consider the true answers below: Target: T Implied Statement: I Your evaluation should consider which response is more similar to the true answers. Begin your evaluation by comparing the two responses and provide a short explanation. Avoid any positional biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible. After providing your explanation, output your final verdict by strictly following this format:

Figures 5, 6, 7, and 8 showcase results generated by the fine-tuned Flan-T5-large model using HARE and C+T+I, based on test samples from SBIC. Although a brief explanation is provided in Section 3.2, we delve deeper with an extended analysis of the 20 examples from our qualitative study. These 20 samples were randomly chosen in proportion to their correct and incorrect predictions across the different methods.
When comparing human-written annotations with HARE, it becomes evident that the annotated rationales in SBIC often take the form of implied statements, following a simple Hearst-like pattern (Sap et al., 2019b). Learning from such rationales, which are closely tied to the conclusion, creates a logical gap for the model and makes interpretation challenging for humans. For instance, without background knowledge of references such as 'burlap sack', it can be difficult to see the connection between the statement "girls are not worthy of equal life" and the provided sentence. Figures 5 and 6 showcase successful cases where models have attempted to bridge this reasoning gap through HARE, offering more detailed rationales that encompass the context. Furthermore, these models exhibit capabilities not seen in previous research, such as detecting terms with historical significance (e.g., 'burlap sack' or 'Anne Frank') or common words that may carry hateful connotations (e.g., 'reds'), thus enhancing the intermediate reasoning process.
However, when examining the failure cases in Figures 7 and 8, the results show that HARE sometimes fails due to increased sensitivity to potentially harmful terms, thereby classifying them as offensive. While this increased sensitivity can be viewed as a drawback, there are instances, such as with the Alzheimer example, where an expression might be interpreted as hateful depending on the individual. This suggests that HARE aims to classify a post as hateful if it could be considered offensive to certain groups. Moreover, as the David Bread Katz example shows, it is also challenging for HARE to decide whether a post is offensive when the post relies on background that it has not encountered, possibly due to a lack of background knowledge regarding the implied shooting incident. This illustrates a limitation of LLM distillation.

D.2 Qualitative Study on GPT-3.5 rationales
When comparing annotations with rationales generated by GPT-3.5, we observe that human-written rationales from SBIC use implied statements that follow simple Hearst-like patterns (Sap et al., 2019b). In contrast, LLMs such as GPT-3.5 tend to provide detailed, step-by-step explanations, often complemented by relevant social background information, which is immensely beneficial. For example, while earlier rationales might omit mentioning Bill Cosby's conviction of sexual assault, GPT-3.5 explicitly informs us of this fact, greatly enhancing comprehension. A particularly striking example is the "pet" case. While a human annotator perceived it as hate speech targeted at black individuals, GPT-3.5 points out that it could also be used derogatorily against white individuals, thereby emphasizing the potential biases in hate speech detection.
When rationales are categorized and structured, as seen in SBIC, instead of being tailored to individual posts, they may not be sufficient for learning implications. This could explain why, as suggested by Table 2, there is a decrease in generalization for C+T+I. Our approach offers aligned rationales for each post at minimal cost, enabling the learning of diverse reasons for potential hate, which in turn leads to enhanced generalization.

Figure 2: The result of GPT-4 evaluation following Zheng et al. (2023). The bar and line represent the average scores that range from 1 to 10 and the 95% confidence interval, respectively. We utilize Flan-T5-large fine-tuned on SBIC using each method.

Figure 3: Model outputs using baseline methods and our framework HARE. The samples are from the SBIC test set. Note that the answers are abbreviated by (...), and the full context is reported in Appendix D.

Figure 4: A sample of an LLM rationale generated by GPT-3.5-turbo using Fr-HARE from the SBIC train set.

Table 1: The performance of fine-tuning on the SBIC and Implicit Hate datasets with various models and sizes.

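Table 2: Cross Evaluation results on HateXplain (Mathew et al., 2021) and DynaHate (Vidgen et al., 2020). We utilize Flan-T5-large fine-tuned on SBIC using each method.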