Understanding and Countering Stereotypes: A Computational Approach to the Stereotype Content Model

Stereotypical language expresses widely-held beliefs about different social categories. Many stereotypes are overtly negative, while others may appear positive on the surface, but still lead to negative consequences. In this work, we present a computational approach to interpreting stereotypes in text through the Stereotype Content Model (SCM), a comprehensive causal theory from social psychology. The SCM proposes that stereotypes can be understood along two primary dimensions: warmth and competence. We present a method for defining warmth and competence axes in semantic embedding space, and show that the four quadrants defined by this subspace accurately represent the warmth and competence concepts, according to annotated lexicons. We then apply our computational SCM model to textual stereotype data and show that it compares favourably with survey-based studies in the psychological literature. Furthermore, we explore various strategies to counter stereotypical beliefs with anti-stereotypes. It is known that countering stereotypes with anti-stereotypical examples is one of the most effective ways to reduce biased thinking, yet the problem of generating anti-stereotypes has not been previously studied. Thus, a better understanding of how to generate realistic and effective anti-stereotypes can contribute to addressing pressing societal concerns of stereotyping, prejudice, and discrimination.


Introduction
Stereotypes are widely-held beliefs about traits or characteristics of groups of people. While we tend to think of stereotypes as expressing negative views of groups, some stereotypes actually express positive views (e.g. all women are nurturing). However, even so-called 'positive' stereotypes can be harmful, as they dictate particular roles that individuals are expected to fulfill, regardless of whether they have the ability or desire to do so (Kay et al., 2013).
The existence of stereotypes in our society, including in entertainment, the workplace, public discourse, and even legal policy, can lead to a number of harms. Timmer (2011) organizes these harms into three main categories: (1) Misrecognition effects: harms caused by denying members of particular groups an equal place in society, diminishing their human dignity, or other forms of marginalization.
(2) Distribution effects: harms resulting from unfair allocation of resources, either by increasing the burden placed on a group, or decreasing a group's access to a benefit. (3) Psychological effects: the distress and unhappiness caused by an awareness and internalization of the stereotyped biases against one's identity group. Additionally, the internalization of these negative stereotypes can lead to anxiety and underachievement. To reduce these harms and promote a more egalitarian society, we must identify and counter stereotypical language when it occurs.
Evidence from the psychological literature suggests that one of the most effective methods for reducing stereotypical thinking is through exposure to counter-stereotypes, or anti-stereotypes. Finnegan et al. (2015) showed participants stereotypical and anti-stereotypical images of highly socially-gendered professions (e.g., a surgeon is stereotypically male, and a nurse is stereotypically female; the genders were reversed in the anti-stereotypical images), and then measured their gender bias in a judgement task. Exposure to anti-stereotypical images significantly reduced gender bias on the task. Blair et al. (2001) used a mental imagery task and reported that participants in the anti-stereotypical condition subsequently showed significantly weaker effects on the Implicit Association Test (IAT). Dasgupta and Greenwald (2001) showed a similar effect by exposing participants to anti-stereotypical exemplars (e.g. admired Black celebrities, and disliked white individuals). When Lai et al. (2014) compared 17 interventions aimed at reducing stereotypical thinking, methods involving anti-stereotypes were most successful overall.
Thus, creating technology that enables users to identify stereotypical language when it occurs, and then counter it with anti-stereotypes, could help to reduce biased thinking. However, the idea of what constitutes an anti-stereotype remains ill-defined. Is an anti-stereotype simply the semantic opposite of a stereotype? Or can anything that is not a stereotype serve as an anti-stereotype? If two groups are stereotyped similarly, do they have an identical anti-stereotype? Can an anti-stereotype actually reflect an equally harmful view of a target group (e.g. the cold-hearted career woman as an anti-stereotype to the nurturing housewife)?
Here, we begin to untangle some of these questions using the StereoSet dataset (Nadeem et al., 2020), starting with an analysis of the stereotypes expressed in this dataset. One widely-accepted model of stereotypes, prejudice, and inter-group relationships from social psychology is the "Stereotype Content Model" or SCM (Fiske et al., 2002). The SCM proposes two fundamental and universal dimensions of social stereotypes: warmth and competence. By defining the warm-cold and competent-incompetent axes in the semantic embedding space, we are able to cluster and interpret stereotypes with respect to those axes. We can then examine the associated anti-stereotypes and their relation to both the stereotyped description and the target group. Thus, our contributions are as follows:
• To develop a computational method for automatically mapping textual information to the warmth-competence plane as proposed in the Stereotype Content Model.
• To validate the computational method and optimize the choice of word embedding model using a lexicon of words known to be associated with positive and negative warmth and competence.
• To compare the stereotypes in StereoSet with those reported in the survey-based social psychology literature.
• To analyze human-generated anti-stereotypes as a first step towards automatically generating antistereotypes, as a method of countering stereotypes in text with constructive, alternative perspectives.

Related Work
We provide more details on the Stereotype Content Model and its practical implications, and then briefly review the NLP research on computational analysis of stereotypical and abusive content.
Stereotype Content Model: Stereotypes, and the related concepts of prejudice and discrimination, have been extensively studied by psychologists for over a century (Dovidio et al., 2010). Conceptual frameworks have emerged which emphasize two principal dimensions of social cognition. The Stereotype Content Model (SCM) refers to these two dimensions as warmth (encompassing sociability and morality) and competence (encompassing ability and agency) (Fiske et al., 2002). When forming a cognitive representation of a social group to anticipate probable behaviors and traits, people are predominantly concerned with the others' intent: are they friends or foes? This intent is captured in the primary dimension of warmth. The competence dimension determines whether the others are capable of enacting that intent. A key finding of the SCM has been that, in contrast to previous views of prejudice as a uniformly negative attitude towards a group, many stereotypes are actually ambivalent; that is, they are high on one dimension and low on the other. Further, the SCM proposes a comprehensive causal theory, linking stereotypes with social structure, emotions, and discrimination (Fiske, 2015). According to this theory, stereotypes are affected by a perceived social structure of interdependence (cooperation versus competition), corresponding to the warmth dimension, and status (prestige and power), determining competence. Stereotypes then predict emotional response or prejudices. For example, groups perceived as unfriendly and incompetent (e.g., homeless people, drug addicts) evoke disgust and contempt, groups allegedly high in warmth but low in competence (e.g., older people, people with disabilities) evoke pity, and groups perceived as cold and capable (e.g., rich people, businesspeople) elicit envy.
Finally, the emotions regulate the actions (active or passive help or harm). Thus, low warmth-low competence groups often elicit active harm and passive neglect, whereas low warmth-high competence groups may include envied out-groups who are subjects of passive help in peace times but can become targets of attack during social unrest (Cuddy et al., 2007).
The SCM has been supported by extensive quantitative and qualitative analyses across cultures and time (Fiske, 2015; Fiske and Durante, 2016). To our knowledge, the current work presents the first computational model of the SCM.
Stereotypes in Language Models: An active line of NLP research is dedicated to quantifying and mitigating stereotypical biases in language models. Early works focused on gender and racial bias and revealed stereotypical associations and common prejudices present in word embeddings through association tests (Bolukbasi et al., 2016; Caliskan et al., 2017; Manzini et al., 2019). To discover stereotypical associations in contextualized word embeddings, May et al. (2019) and Kurita et al. (2019) used pre-defined sentence templates. Similarly, Bartl et al. (2020) built a template-based corpus to quantify bias in neural language models, whereas Nadeem et al. (2020) and Nangia et al. (2020) used crowd-sourced stereotypical and anti-stereotypical sentences for the same purpose. In contrast to these studies, while we do use word embeddings to represent our data, we aim to identify and categorize stereotypical views expressed in text, not in word embeddings or language models.
Abusive Content Detection: Stereotyping, explicitly or implicitly expressed in communication, can have a detrimental effect on its target, and can be considered a form of abusive behavior. Online abuse, including hate speech, cyber-bullying, online harassment, and other types of offensive and toxic behaviors, has been a focus of substantial research effort in the NLP community in the past decade (e.g. see surveys by Schmidt and Wiegand (2017); Fortuna and Nunes (2018); Vidgen et al. (2019)). Most of the successes in identifying abusive content have been reported on text containing explicitly obscene expressions; only recently has work started on identifying more subtly expressed abuse, such as stereotyping and micro-aggressions (Breitfeller et al., 2019). For example, Fersini et al. (2018) and Chiril et al. (2020) examined gender-related stereotypes as a sub-category of sexist language, and Price et al. (2020) annotated 'unfair generalizations' as one attribute of unhealthy online conversation. Sap et al. (2020) employed large-scale language models in an attempt to automatically reconstruct stereotypes implicitly expressed in abusive social media posts. Their work showed that while current models can accurately predict whether an online post is offensive or not, they struggle to reproduce human-written statements of the implied meaning.
Counter-narrative: Counter-narrative (or counter-speech) has been shown to be effective in confronting online abuse (Benesch et al., 2016). Counter-narrative is a non-aggressive response to abusive content that aims to deconstruct and delegitimize the harmful beliefs and misinformation with thoughtful reasoning and fact-bound arguments. Several datasets of counter-narratives, spontaneously written by regular users or carefully crafted by experts, have been collected and analyzed to discover common intervention strategies (Mathew et al., 2018; Chung et al., 2019). Preliminary experiments in automatic generation of counter-narrative demonstrated the inadequacy of current large-scale language models for generating effective responses and the need for a human-in-the-loop approach (Qian et al., 2019; Tekiroglu et al., 2020). Countering stereotypes through exposure to anti-stereotypical exemplars is based on a similar idea of deconstructing harmful beliefs with counter-facts.

Data and Methods
We develop our computational SCM using labelled data from Nicolas et al. (2020) and the POLAR framework for interpretable word embeddings (Mathew et al., 2020), and then apply it to stereotype and anti-stereotype data from StereoSet (Nadeem et al., 2020). Details are provided in the following sections.

Warmth-Competence Lexicons
To construct and validate our model, we make use of the supplementary data from Nicolas et al. (2020) (https://osf.io/yx45f/). They provide a list of English seed words, captured from the psychological literature, associated with the warmth and competence dimensions; specifically, associated with sociability and morality (warmth), and ability and agency (competence). They then use WordNet to generate an extended lexicon of English words either positively or negatively associated with aspects of warmth and competence. Some examples from the seed data and extended lexicon are given in Table 1.

StereoSet
For human-generated stereotype and anti-stereotype data, we use the publicly-available portion of the StereoSet dataset (Nadeem et al., 2020). This English-language dataset was constructed to test language model bias, and part of the data is kept hidden as the test set for a leaderboard on language model fairness (https://stereoset.mit.edu/). Instead, we use the development set, which contains stereotype data for 79 target groups across four broad demographic domains: gender, race or nationality, profession, and religion.
In StereoSet, there are two experimental conditions: intra-sentence and inter-sentence. Here, we focus on the intra-sentence data only. The data was collected from crowd-workers as follows (see Nadeem et al. (2020) for more detail): Given a target group label, the annotator is asked to generate a stereotypical word associated with that group, as well as an anti-stereotypical word and an unrelated word. They then construct a context sentence containing the target group label, and a blank which can be filled with the stereotypical or anti-stereotypical word. For example, if the target group was women, the annotator might come up with emotional and rational as the stereotype and anti-stereotype words respectively, and then construct a sentence like Women are known for being overly BLANK. For our current analysis, we consider only the stereotype and anti-stereotype words, and discard the context sentence. We also exclude any targets that do not directly refer to groups of people (e.g., we discard Norway but keep Norwegian). This results in 58 target groups with an average of 25 stereotype and anti-stereotype word pairs each.

Constructing Warmth and Competence Dimensions
We consider several possible representations for the words in our dataset, including GloVe (Pennington et al., 2014), word2vec (Mikolov et al., 2013), and FastText (Mikolov et al., 2018). In all cases, the key question is how to project the higher-dimensional word embedding onto the warmth-competence plane. Rather than using an unsupervised approach such as PCA, we choose the POLAR framework introduced by Mathew et al. (2020). This framework seeks to improve the interpretability of word embeddings by leveraging the concept of 'semantic differentials,' a psychological rating scale which contrasts bipolar adjectives, e.g. hot-cold or good-bad. Given word embeddings that define these polar opposites for a set of concepts, all other word embeddings in the space are projected onto the 'polar embedding space,' where each dimension is clearly associated with a concept.
For our purposes, the polar opposites are warmth-coldness and competence-incompetence, as defined by the sets of seed words from Nicolas et al. (2020). To reduce the dimensionality of the space to 2D, we average the word vectors for all seed words associated with each dimension and polarity. That is, to define the warmth direction, we take the mean of all words in the seed dictionary which are positively associated with warmth. Given vector definitions for warmth, coldness, competence, and incompetence, we can then use a simple matrix transformation to project any word embedding to the 2D subspace defined by these basis vectors (mathematical details are given in Appendix A).
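The averaging and projection steps described above can be sketched as follows. This is a minimal numpy illustration under our own naming and with toy dimensionality; the function names are not from the paper, and the full mathematical details are in Appendix A.

```python
import numpy as np

def polar_axes(warm, cold, comp, incomp):
    """Build the two polar direction vectors from averaged seed embeddings.

    Each argument is an (n_words, dim) array of seed-word vectors; each
    axis direction is the difference between the positive- and
    negative-pole means (warmth = warm - cold, competence = comp - incomp).
    """
    d_warmth = warm.mean(axis=0) - cold.mean(axis=0)
    d_competence = comp.mean(axis=0) - incomp.mean(axis=0)
    return np.stack([d_warmth, d_competence])  # shape (2, dim)

def project(vec, axes):
    """Project an embedding onto the warmth-competence plane.

    Solves vec ~ w * d_warmth + c * d_competence in the least-squares
    sense, returning the 2D polar coordinates (w, c).
    """
    coords, *_ = np.linalg.lstsq(axes.T, vec, rcond=None)
    return coords
```

With orthogonal toy axes this reduces to a signed coordinate along each dimension; with real 300-dimensional embeddings the least-squares solve plays the role of the matrix transformation described in Appendix A.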

Model Validation
We first evaluate the model's ability to accurately place individual words from the lexicons along the warmth and competence dimensions. We then explore whether we can reproduce findings describing where certain target groups are typically located in the warmth-competence plane, based on the previous survey-based social psychology literature.

Table 2: Accuracy of the word embedding models on predicting the correct labels for the extended lexicon.

Comparison with Existing Lexicons
As described above, we use the extended lexicon from Nicolas et al. (2020) to validate our model. We remove any words in the lexicon which appear in the seed dictionary and any words which do not have representations in all the pretrained embedding models, leaving a total of 3,159 words for validation.
In the extended lexicon, the words are annotated with either +1 or -1 to indicate a positive or negative association with the given dimension. We pass the same words through our system, and observe whether the model labels the word as being positively or negatively associated with the relevant dimension. Our evaluation metric is accuracy; i.e. the proportion of times our system agrees with the lexicon. Note that all words are associated with either warmth or competence, and therefore we can only evaluate one dimension at a time.
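The sign-agreement evaluation can be sketched as follows. This is a toy sketch, not the paper's code: the function names are ours, and `embed` and `project` stand in for an embedding lookup and the polar projection described earlier.

```python
import numpy as np

def lexicon_accuracy(lexicon, embed, project):
    """Fraction of lexicon entries whose projected sign matches the
    annotated +1/-1 label on the relevant dimension.

    lexicon: iterable of (word, dim, label), where dim is 0 for warmth,
             1 for competence, and label is +1 or -1;
    embed:   maps a word to its embedding vector;
    project: maps an embedding to (warmth, competence) coordinates.
    """
    hits, total = 0, 0
    for word, dim, label in lexicon:
        w, c = project(embed(word))
        # a word is scored only on the dimension it is annotated for
        pred = np.sign((w, c)[dim])
        hits += int(pred == label)
        total += 1
    return hits / total if total else 0.0
```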
We evaluate a number of pre-trained word embeddings in the gensim library (Řehůřek and Sojka, 2010), with the results given in Table 2. The FastText embeddings generally outperform the other embeddings on this task, with the 2M word model trained on 600B tokens of Common Crawl data leading to the highest accuracy. Therefore, we use this embedding model in the analysis that follows.

Comparison with Psychological Surveys
We now address the question of whether our model, in conjunction with the StereoSet data, is able to reproduce findings from psychological surveys. We project stereotypes from the StereoSet data onto the warmth-competence space for the 24 target groups that meet both of the following criteria: (1) they are included in the publicly available portion of the StereoSet data, and (2) they have been previously studied for stereotyping in the psychological literature. Based on the findings from psychological surveys, we expect these target groups will be mapped to the following quadrants:
• Warm-Competent: nurse, psychologist ('healthcare professions') (Brambilla et al., 2010), researcher ('professor') (Eckes, 2002).
• Cold-Incompetent: African, Ethiopian, Ghanaian, Eritrean, Hispanic (Lee and , Arab .
To locate each target group on the plane, we generate word embeddings for each of the stereotype words associated with the target group, find the mean, and project the mean to the polar embedding space. As we aim to identify commonly-held stereotypes, we use a simple cosine distance filter to remove outliers, heuristically defined here as any words which are greater than a distance of 0.6 from the mean of the set of words. We also remove words which directly reference a demographic group (e.g., black, white), as these words are vulnerable to racial bias in the embedding model and complicate the interpretation. A complete list of the words in each stereotype cluster can be found in Appendix B.
Figure 1 confirms many of the findings predicted by the literature. Most (67%) of the stereotypes lie in the predicted quadrant, including grandfather and schoolgirl in the paternalistic warm-incompetent quadrant; nurse and psychologist in the admired warm-competent quadrant; manager and male in the envied cold-competent quadrant; and African and Hispanic in the cold-incompetent quadrant.
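The mean-and-filter procedure for locating a target group can be sketched as below. The function name and toy dimensionality are ours; `project` stands in for the polar projection, and 0.6 is the heuristic cosine-distance threshold from the text.

```python
import numpy as np

def group_location(word_vecs, project, max_dist=0.6):
    """Locate a target group on the warmth-competence plane.

    word_vecs: (n_words, dim) array of stereotype-word embeddings.
    Words whose cosine distance from the cluster mean exceeds max_dist
    are dropped as outliers before the filtered mean is projected.
    """
    mean = word_vecs.mean(axis=0)
    # cosine distance of each word from the unfiltered cluster mean
    norms = np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(mean)
    dist = 1.0 - word_vecs @ mean / norms
    kept = word_vecs[dist <= max_dist]
    return project(kept.mean(axis=0))
```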
Other stereotypes lie in locations which seem reasonable on examination of the underlying data. For example, while men are typically stereotyped as being competent yet cold in the psychological literature, the specific keyword gentlemen evokes a certain subset of men (described with words such as polite, respectful, and considerate), which ranks higher on the warmth dimension than the target word male (example words: dominant, aggressive).
We also observe that while children have generally been labelled as warm-incompetent in previous work (Fiske, 2018), this dataset distinguishes between male and female schoolchildren, and, as expected based on studies of gender, schoolboys are ranked as lower warmth than schoolgirls. The words used to describe schoolboys include references to the 'naughty' schoolboy stereotype, while the words describing schoolgirls focus on their innocence and naivety.
It is also notable that Arab, predicted to lie in the cold-incompetent quadrant, is here mapped to the cold-competent quadrant instead. We hypothesize that this is due to the use of stereotype words like dangerous and violent, which suggest a certain degree of agency and the ability to carry out goals. In contrast, the target group African as well as those associated with African countries are stereotyped as poor and uneducated, and thus low on the competence dimension.
In general, we conclude that in most cases the computational approach is successful in mapping stereotyped groups onto the predicted areas of the warmth-competence plane, and that the cases which diverge from findings in the previous literature do appear to be reasonable, based on an examination of the text data. Having validated the model, we can now apply it to the rest of the stereotype data in StereoSet, as well as the anti-stereotypes.

Stereotypes and Anti-Stereotypes
The SCM presents a concise theory to explain stereotypes and resulting prejudiced behaviour; however, it does not generate any predictions about anti-stereotypes. Here, we explore the anti-stereotypes in StereoSet within the context of the SCM, first at the level of individual annotators, and then at the level of target groups (combining words from multiple annotators). We then discuss how we might use information about warmth and competence to generate anti-stereotypes with the specific goal of reducing biased thinking.

Anti-Stereotypes in StereoSet
In this section, we investigate the question: What do human annotators come up with when asked to produce an anti-stereotype? One possibility is that they simply produce the antonym of their stereotype word. To test this hypothesis, for all 58 groups and each pair of stereotype and anti-stereotype words, we obtain a list of antonyms for the stereotype word using the Python library PyDictionary. We additionally search all the synonyms for the stereotype word, and add all of their antonyms to the list of antonyms as well. Then, if the lemma of the anti-stereotype matches the lemma of any of the retrieved antonyms, we consider it a match.
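The lemma-level matching logic can be sketched as follows. This is a toy sketch of the matching step only: the dictionaries here stand in for PyDictionary lookups, and the `lemma` function stands in for a proper lemmatizer; none of these names come from the paper.

```python
def is_direct_antonym(stereo, anti, antonyms, synonyms, lemma):
    """Check whether an anti-stereotype word is a direct antonym of the
    stereotype word.

    antonyms, synonyms: word -> set of words (toy stand-ins for
    PyDictionary lookups); lemma: word -> lemma. The candidate set is
    the antonyms of the stereotype word plus the antonyms of each of
    its synonyms; matching is done on lemmas.
    """
    candidates = set(antonyms.get(stereo, ()))
    for syn in synonyms.get(stereo, ()):
        candidates |= set(antonyms.get(syn, ()))
    return lemma(anti) in {lemma(w) for w in candidates}
```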
However, as seen in Table 3, the strategy of simply producing a direct antonym is only used 23% of the time. We consider four other broad possibilities: (1) the annotator generates an anti-stereotype word that lies in the opposite quadrant from the stereotype word, e.g., if the stereotype word is low-competence, low-warmth (LC-LW), then the anti-stereotype word should be high-competence, high-warmth (HC-HW); (2) the annotator chooses a word with the opposite warmth polarity (i.e., flips warmth), while keeping the competence polarity the same; (3) the annotator chooses a word with the opposite competence polarity (i.e., flips competence), while keeping the warmth polarity the same; (4) the annotator chooses a word that lies in the same quadrant as the stereotype word. We report the proportion of times that each strategy is observed; first overall, then for each quadrant individually. The choice of whether to modify warmth or competence might also depend on which of those dimensions is most salient for a given word, and so we consider separately words for which the absolute value of competence is greater than the absolute value of warmth, and vice versa.

Table 3: The percentage of times each of the hypothesized strategies of anti-stereotype generation is used for stereotypes, overall and in each quadrant. Quadrants are labelled as HC-HW, LC-HW, LC-LW, and HC-LW, where HC/LC denotes high/low competence, and HW/LW denotes high/low warmth. We also consider separately those stereotypes which have competence as the most salient dimension (|C| > |W|), and those which have warmth as the most salient dimension (|W| > |C|).

The results are given in Table 3. While no single strategy dominates, we can make a few observations. In general, it is more likely that people select an anti-stereotype which is not a direct antonym, but which lies in the opposite quadrant in the warmth-competence plane. Flipping only one axis is less frequent, although we see in the last two columns that it is more likely that the competence will be flipped when competence is the salient dimension for a word, and similarly for warmth. Finally, choosing another word in the same quadrant is rare, but more common in the ambivalent quadrants.
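The quadrant-based strategy labels can be computed mechanically from the two signed coordinates. A minimal sketch (the function name and labels are ours):

```python
def flip_strategy(stereo, anti):
    """Classify how an anti-stereotype relates to its stereotype on the
    warmth-competence plane.

    stereo, anti: (warmth, competence) coordinates. Returns one of
    'opposite quadrant', 'flip warmth', 'flip competence',
    'same quadrant'.
    """
    warmth_flipped = (stereo[0] > 0) != (anti[0] > 0)
    competence_flipped = (stereo[1] > 0) != (anti[1] > 0)
    if warmth_flipped and competence_flipped:
        return 'opposite quadrant'
    if warmth_flipped:
        return 'flip warmth'
    if competence_flipped:
        return 'flip competence'
    return 'same quadrant'
```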
While it is not possible to know what thought process the annotators followed to produce anti-stereotypes, we consider the following possible explanation. Just as we have here conceptualized a stereotype as being defined not by a single word, but by a set of words, perhaps each annotator also mentally represents each stereotype as a set of words or ideas. Then, the anti-stereotype word they produce sometimes reflects a different component of their mental image than the initial stereotype word. To give a concrete example from the data, one annotator stereotypes Hispanic people as aggressive, but then comes up with hygienic as an anti-stereotype, suggesting that unhygienic is also part of their multi-dimensional stereotype concept. The choice of whether to select a direct antonym, or whether to negate some other component of the stereotype, may depend on the availability of a familiar lexical antonym, the context sentence, or any number of other factors. In short, it appears that the process by which human annotators generate pairs of stereotype and anti-stereotype words is complex and not easily predicted by the SCM.
We then examine how these pairs of stereotype and anti-stereotype words combine to produce an overall anti-stereotype for the target group in question. Taking the same approach as in the previous section, we average the anti-stereotype word vectors to determine the location of the anti-stereotype in the warmth-competence plane. For each target group, we then select the word closest to the mean for both the stereotype and anti-stereotype clusters.
As when we look at individual word pairs, in 22% of cases the mean of the anti-stereotype is the direct antonym of the stereotype mean. In the other cases, 45% of the anti-stereotype means lie in the opposite quadrant to the stereotypes, in 16% of cases the warmth polarity is flipped, in 10% of cases the competence polarity is flipped, and in only 7% of cases (4 target groups) the anti-stereotype lies in the same quadrant as the stereotype.
In Table 4, we offer a few examples of cases where the anti-stereotype means agree and disagree with the direct antonyms of the stereotypes. As in the pairwise analysis, in many cases the anti-stereotypes appear to be emphasizing a supposed characteristic of the target group which is not captured by the stereotype mean; for example, the anti-stereotype for 'dumb football player' is not smart, but weak, demonstrating that strength is also part of the football player stereotype. This is also seen clearly in the fact that two target groups with the same stereotype mean are not always assigned the same anti-stereotype: for example, both Africans and Hispanics are stereotyped as poor, but Africans are assigned the straightforward anti-stereotype rich, while Hispanics are assigned hard-working (perhaps implying that their poverty is due to laziness rather than circumstance).
The general conclusion from these experiments is that stereotypes are indeed multi-dimensional, and the anti-stereotypes must be, also. Hence it is not enough to generate an anti-stereotype simply by taking the antonym of the most representative word, nor is it sufficient to identify the most salient dimension of the stereotype and only adjust that. When generating anti-stereotypes, annotators (individually, in the pairwise comparison, and on average) tend to invert both the warmth and competence dimensions, taking into account multiple stereotypical characteristics of the target group.

Anti-Stereotypes for Social Good
The anti-stereotypes in StereoSet were generated with the goal of evaluating language model bias. Ultimately, our goal is quite different: to reduce biased thinking in humans. In particular, we want to generate anti-stereotypes that emphasize the positive aspects of the target groups.
As underscored by Cuddy et al. (2008), many stereotypes are ambivalent: they take the form 'X but Y'. Women are nurturing but weak, scientists are intelligent but anti-social. When we simply take the antonym of the mean, we focus on the single most-representative word; i.e., the X. However, in these examples it is actually what comes after the 'but' that is the problem. Therefore, in generating anti-stereotypes for these ambivalent stereotypes, we hypothesize that a better approach is not to take the antonym of the primary stereotype (i.e., women are uncaring, scientists are stupid), but rather to challenge the secondary stereotype (women can be nurturing and strong, scientists can be intelligent and social).
As a first step towards generating anti-stereotypes for such ambivalent stereotypes, we propose the following approach: first identify the most positive aspect of the stereotype (e.g., if the stereotype mean lies in the incompetent-warm quadrant, the word expressing the highest warmth), then identify the most negative aspect of the stereotype in the other dimension (in this example, the word expressing the lowest competence). The stereotype can then be phrased in the 'X but Y' construction, where X is the positive aspect and Y is the negative aspect. (A similar method can be used for warm-competent and cold-incompetent stereotypes, although if all words are positive, an anti-stereotype may not be needed, and if all words are negative, an antonym may be more appropriate.) To generate a positive anti-stereotype which challenges stereotypical thinking while not promoting a negative view of the target group, we take the antonym only of the negative aspect. Some examples are given in Table 5. A formal evaluation of these anti-stereotypes would involve carrying out a controlled psychological study in which the anti-stereotypes are embedded in an implicit bias task to see which formulations are most effective at reducing bias; for now, we simply present them as a possible way forward.
As shown in the table, taking into account the ambivalent aspects of stereotypes can result in more realistic anti-stereotypes than either taking the mean of the crowd-sourced anti-stereotypes, or simply generating the semantic opposite of the stereotype. For example, the group grandfather is mostly stereotyped as old, and then counterintuitively anti-stereotyped as young. It is more useful in terms of countering ageism to combat the underlying stereotype that grandfathers are feeble rather than denying that they are often old. Similarly, it does not seem helpful to oppose biased thinking by insisting that entrepreneurs can be lazy, engineers and developers can be dumb, and mothers can be uncaring. Rather, by countering only the negative dimension of ambivalent stereotypes, we can create realistic and positive anti-stereotypes.
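The selection heuristic for ambivalent stereotypes can be sketched as below. This is a toy sketch under our own naming: `scored_words` stands in for the projected stereotype cluster, and the antonym lookup is a placeholder for a real thesaurus.

```python
def positive_anti_stereotype(scored_words, antonym):
    """From an ambivalent stereotype, build a positive anti-stereotype.

    scored_words: list of (word, warmth, competence) for the cluster;
    antonym: word -> antonym (a toy lookup standing in for a thesaurus).
    Picks the most positive aspect X (highest score on the group's
    stronger dimension) and the most negative aspect Y on the other
    dimension, then negates only Y, yielding 'X and not-Y'.
    """
    mean_w = sum(w for _, w, _ in scored_words) / len(scored_words)
    mean_c = sum(c for _, _, c in scored_words) / len(scored_words)
    # index 1 is the warmth score, index 2 the competence score
    pos_dim, neg_dim = (1, 2) if mean_w >= mean_c else (2, 1)
    x = max(scored_words, key=lambda t: t[pos_dim])[0]
    y = min(scored_words, key=lambda t: t[neg_dim])[0]
    return f"{x} and {antonym(y)}"
```

For a warm-incompetent cluster like the 'nurturing but weak' example, this heuristic keeps the positive warmth word and negates only the low-competence word, rather than inverting both dimensions.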

Discussion and Future Work
Despite their prevalence, stereotypes can be hard to recognize and understand. We tend to think about other people on a group level rather than on an individual level because social categorization, although harmful, simplifies the world for us and leads to cognitive ease. However, psychologists have shown that we can overcome such ways of thinking with exposure to information that contradicts those biases. In this exploratory study, we present a computational implementation of the Stereotype Content Model to better understand and counter stereotypes in text.
A computational SCM-based framework can be a promising tool for large-scale analysis of stereotypes, by mapping a disparate set of stereotypes to the 2D semantic space of warmth and competence. We described here our first steps towards developing and validating this framework, on a highly constrained dataset: in StereoSet, the annotators were explicitly instructed to produce stereotypical ideas, the target groups are clearly specified, and every stereotype has an associated anti-stereotype generated by the same annotator. In future work, this method should be further assessed by using different datasets and scenarios. For example, it may be possible to collect stereotypical descriptions of target groups 'in the wild' by searching large corpora from social media or other sources. We plan to extend this framework to analyze stereotypes on the sentence-level and consider the larger context of the conversations. Working with real social media texts will introduce a number of challenges, but will offer the possibility of exploring a wider range of marginalized groups and cultural viewpoints. Related to this, we reiterate that only a portion of the StereoSet dataset is publicly available. Therefore, the data does not include the full set of common stereotypical beliefs for social groups frequently targeted by stereotyping. In fact, some of the most affected communities (e.g., North American Indigenous people, LGBTQ+ community, people with disabilities, etc.) are completely missing from the dataset. In this work, we use this dataset only for illustration purposes and preliminary evaluation of the proposed methodology. Future work should examine data from a wide variety of subpopulations differing in language, ethnicity, cultural background, geographical location, and other characteristics.
From a technical perspective, with larger datasets it will be possible to implement a cluster analysis within each target group to reveal the different ways in which a given group can be stereotyped. A classification model may additionally improve the accuracy of the warmth-competence categorization, although we have chosen the PO-LAR framework here for its interpretability and ease of visualization.
We also examined how we might leverage the developed computational model to challenge stereotypical thinking. Our analysis did not reveal a simple, intuitive explanation for the anti-stereotypes produced by the annotators, suggesting they exploited additional information beyond what was stated in the stereotype word. This extra information may not be captured in a single pair of stereotype-anti-stereotype words, but by considering sets of words, we can better characterize stereotypes as multi-dimensional and often ambivalent concepts, consistent with the established view in psychology. This also allows us to suggest anti-stereotypes which maintain positive beliefs about a group, while challenging negative beliefs.
We propose that this methodology may potentially contribute to technology that assists human professionals, such as psychologists, educators, human rights activists, etc., in identifying, tracking, analyzing, and countering stereotypes at large scale in various communication channels. There are a number of ways in which counter-stereotypes can be introduced to users (e.g., through mentions of counter-stereotypical members of the group or facts countering the common beliefs) with the goal of priming users to look at others as individuals and not as stereotypical group representatives. An SCM-based approach can provide the psychological basis and the interpretation of automatic suggestions to users.
Since our methodology is intended to be part of a technology-in-the-loop approach, where the final decision on which anti-stereotypes to use and in what way will be made by human professionals, we anticipate few instances where incorrect (i.e., unrelated, unrealistic, or ineffective) automatically generated anti-stereotypes would be disseminated. In most such cases, since anti-stereotypes are designed to be positive, no harm is expected to be incurred by the affected group. However, it is possible that a positive, seemingly harmless anti-stereotypical description can have a detrimental effect on the target group, or possibly even introduce previously absent biases into the discourse. Further work should investigate the effectiveness and potential harms of such approaches in real-life social settings.

Ethical Considerations
Data: We present a method for mapping a set of words that represent a stereotypical view of a social category held by a given subpopulation onto the two-dimensional space of warmth and competence. The Stereotype Content Model, on which the methodology is based, has been shown to be applicable across cultures, sub-populations, and time (Fiske, 2015; Fiske and Durante, 2016). Therefore, the methodology is not specific to any subpopulation or any target social group.
In the current work, we employ the publicly available portion of the StereoSet dataset (Nadeem et al., 2020). This English-only dataset has been created through crowd-sourcing US workers on Amazon Mechanical Turk. Since Mechanical Turk US workers tend to be younger and have on average lower household income than the general US population (Difallah et al., 2018), the collected data may not represent the stereotypical views of the wider population. Populations from other parts of the world, and even sub-populations in the US, may have different stereotypical views of the same social groups. Furthermore, as discussed in Section 6, the StereoSet dataset does not include stereotype data for a large number of historically marginalized groups. Future work should examine data both referring to, and produced by, a wider range of social and cultural groups.

Potential Applications: As discussed previously, the automatically proposed anti-stereotypes can be utilized by human professionals in a variety of ways, e.g., searching for or creating anti-stereotypical images, writing counter-narratives, creating educational resources, etc. One potential concern which has not received attention in the related literature is the possibility that the process of generating counter-stereotypes may itself introduce new biases into the discourse, particularly if these counter-stereotypes are generated automatically, perhaps even in response to adversarial data. We emphasize the importance of using counter-stereotypes not to define new, prescriptive boxes into which groups of people must fit (e.g., from Table 3, that all software developers should be intelligent and healthy, or that all entrepreneurs must be inventive and compassionate). Rather, counter-stereotypes should weaken common stereotypical associations by emphasizing that any social group is not actually homogeneous, but a group of individuals with distinct traits and characteristics.
In most cases, the algorithm-in-the-loop approach (with automatic suggestions assisting human users) should be adopted to reduce the risk of algorithmic biases being introduced into the public discourse.
Often, harmful stereotyping is applied to minority groups. Work on identifying and analyzing stereotypes might propagate the harmful beliefs further, and it is possible that collections of stereotypical descriptions could be misused as information sources for targeted campaigns against vulnerable populations. However, this same information is needed to understand and counter stereotypical views of society. We also note that although we take advantage of word embedding models in our approach, we do not use the representations of target group names. Previous work has shown that biased thinking is encoded in these models, and using them to represent groups can be harmful to specific demographics.
Identifying Demographic Characteristics: The proposed methodology deals with societal-level stereotypical and anti-stereotypical representations of groups of people and does not attempt to identify individual user/writer demographic characteristics. However, work on stereotyping and anti-stereotyping entails, by definition, naming and defining social categories of people. Labeling groups not only defines the category boundaries, but also positions them in a hierarchical social-category taxonomy (Beukeboom and Burgers, 2019). We emphasize that our goal is not to maintain and reproduce existing social hierarchies, as cautioned by Blodgett et al. (2020), but rather to help dismantle this kind of categorical thinking through the use of anti-stereotypes.
Energy Resources: The proposed SCM-based method is computationally low-cost, and all experiments were performed on a single CPU. Once the pretrained vectors are loaded, the projection and analysis is completed in less than a minute.

A Constructing POLAR dimensions
In contrast to the standard POLAR framework introduced by Mathew et al. (2020), we do not have a set of polar opposite word pairs, each representing a different interpretable dimension, but rather a set of words for each of the concepts warmth, coldness, competence, and incompetence from Nicolas et al. (2020). Therefore, we use a slightly different formulation to obtain the polar directions associated with warmth and competence.4 Let $W^a \in \mathbb{R}^{V \times d}$ denote the matrix of pretrained $d$-dimensional embedding vectors, trained with algorithm $a$, where $V$ is the size of the vocabulary and $\vec{W}^a_i$ is a unit vector representing the $i$-th word in the vocabulary.
In this work, we use four sets of seed words: a set of $N_1$ words associated with positive warmth, $P_{w+} = \{p^1_{w+}, p^2_{w+}, \ldots, p^{N_1}_{w+}\}$; a set of $N_2$ words associated with negative warmth, $P_{w-} = \{p^1_{w-}, p^2_{w-}, \ldots, p^{N_2}_{w-}\}$; a set of $N_3$ words associated with positive competence, $P_{c+} = \{p^1_{c+}, p^2_{c+}, \ldots, p^{N_3}_{c+}\}$; and a set of $N_4$ words associated with negative competence, $P_{c-} = \{p^1_{c-}, p^2_{c-}, \ldots, p^{N_4}_{c-}\}$. In order to find the two polar directions, we obtain:

$$\overrightarrow{\mathrm{dir}}_w = \frac{1}{N_1}\sum_{i=1}^{N_1} \vec{W}^a_{p^i_{w+}} - \frac{1}{N_2}\sum_{i=1}^{N_2} \vec{W}^a_{p^i_{w-}}, \qquad \overrightarrow{\mathrm{dir}}_c = \frac{1}{N_3}\sum_{i=1}^{N_3} \vec{W}^a_{p^i_{c+}} - \frac{1}{N_4}\sum_{i=1}^{N_4} \vec{W}^a_{p^i_{c-}},$$

where $\vec{W}^a_v$ represents the vector of the word $v$. The two direction vectors are stacked to form $\mathrm{dir} \in \mathbb{R}^{2 \times d}$, which represents the change-of-basis matrix for the new two-dimensional embedding subspace $E$. In the new subspace, a word $v$ is represented by $\vec{E}_v$, which is calculated using the linear transformation

$$\vec{E}_v = \mathrm{dir} \cdot \vec{W}^a_v.$$

In our experiments, we showed that, as expected, the two dimensions of $E$ are associated with warmth and competence.
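This construction can be sketched in NumPy. The sketch assumes, as in the formulation above, that each polar direction is the difference between the mean vectors of the positive and negative seed sets; `emb` is a hypothetical dictionary mapping words to pretrained (unit) vectors, e.g., loaded from GloVe or word2vec.

```python
import numpy as np

def polar_directions(emb, P_wpos, P_wneg, P_cpos, P_cneg):
    """Build the 2 x d change-of-basis matrix dir from four seed-word sets.

    emb: dict mapping word -> d-dimensional embedding vector.
    Each P_* is a list of seed words for one pole (warmth+/-, competence+/-).
    """
    def centroid(words):
        # Mean vector of the seed words that are present in the vocabulary.
        return np.mean([emb[w] for w in words if w in emb], axis=0)

    dir_w = centroid(P_wpos) - centroid(P_wneg)   # warmth direction
    dir_c = centroid(P_cpos) - centroid(P_cneg)   # competence direction
    return np.vstack([dir_w, dir_c])              # dir in R^{2 x d}

def project(dir_matrix, word_vec):
    """Linear transformation of a word vector into the 2D
    warmth-competence subspace: E_v = dir . W_v."""
    return dir_matrix @ word_vec
```

Each projected word is then a point on the warmth-competence plane, with the first coordinate read as warmth and the second as competence.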

B Stereotype Data
Here we present all of the words contributing to each stereotype for each target group. In addition to 37 tokens which did not have vector representations in the pre-trained embeddings, there were two reasons why words were discarded from the analysis, as described in the paper. First, if a word directly referenced another demographic category, it was discarded. This was to avoid, as much as possible, including effects of language model bias in our model. For example, a number of annotators used the word black to describe prisoners; however, if the language model has some racial bias involving the word black, then it would affect the placement of the word prisoner on the warmth-competence plane. While we acknowledge that stereotypical associations between groups are problematic and worthy of study in their own right (including this disturbing example involving race and incarceration), they are beyond the scope of the current analysis.
Additional words were discarded in a filtering step, where words greater than a cosine distance of 0.6 from the mean or centroid of the group were discarded. As people's views towards different groups naturally vary, this was done to prevent outlier words from impacting the analysis, which is focused here on the most widespread or prevalent stereotypes of a given group. While heuristically chosen, the threshold value appears to be acceptable in many cases (see Table B.1). However, other times a large number of words are discarded, which appear in some cases to represent a second, coherent cluster of stereotype words for a given group (see, for example, the words for policeman, which split into two clusters alternately characterizing the group as corrupt and racist, or strong and heroic). As mentioned in the discussion, future work will examine how we can identify clusters of stereotypes in larger datasets.
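The thresholding step can be sketched as follows. This is a minimal NumPy version under the assumptions stated above: words are represented by their embedding vectors, and any word farther than a cosine distance of 0.6 from the group centroid is discarded as an outlier.

```python
import numpy as np

def filter_outliers(vectors, threshold=0.6):
    """Split word vectors into kept and discarded indices based on
    cosine distance from the group centroid (heuristic cutoff: 0.6)."""
    V = np.asarray(vectors, dtype=float)
    centroid = V.mean(axis=0)
    c_unit = centroid / np.linalg.norm(centroid)

    kept, discarded = [], []
    for i, v in enumerate(V):
        cos_sim = (v @ c_unit) / np.linalg.norm(v)
        # Cosine distance = 1 - cosine similarity.
        if 1.0 - cos_sim <= threshold:
            kept.append(i)
        else:
            discarded.append(i)
    return kept, discarded
```

Words whose vectors point in a markedly different direction from the centroid (e.g., a second, distinct cluster of stereotype words, as in the policeman example) end up in the discarded list.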
In the following table, for each target group we present three lists of words:
• Included: the words included in the analysis. This list is ranked according to distance to the mean, and thus the most 'representative' words occur first. Words which occur more than once have their frequency given in parentheses.
• Discarded: the words discarded as outliers by the thresholding step.
• Demographic: the words discarded as referring primarily to a demographic characteristic.

Table B.1: Target groups and associated stereotype words in StereoSet. Words which occur more than once for a given group have their frequency indicated in parentheses. Words that are included in the analysis are ranked by closeness to the cluster mean; thus the first words in the list are most representative of the stereotype for that group.