"A Tale of Two Movements": Identifying and Comparing Perspectives in #BlackLivesMatter and #BlueLivesMatter Movements-related Tweets using Weakly Supervised Graph-based Structured Prediction

Social media has become a major driver of social change by facilitating the formation of online social movements. Automatically understanding the perspectives driving a movement and the voices opposing it is challenging, as annotated data is difficult to obtain. We propose a weakly supervised graph-based approach that explicitly models perspectives in #BlackLivesMatter-related tweets. Our approach utilizes a social-linguistic representation of the data: we convert the text to a graph by breaking it into structured elements, connect it with the social network of authors, and perform structured prediction over the elements to identify perspectives. Our approach requires only a small seed set of labeled examples. We experiment with large language models for generating artificial training examples, compare them to manual annotation, and find that they achieve comparable performance. We perform quantitative and qualitative analyses using a human-annotated test set. Our model outperforms multitask baselines by a large margin, successfully characterizing the perspectives supporting and opposing #BLM.


Introduction
Social media platforms have given a powerful voice to groups and populations demanding change and have helped spark social justice movements. The platforms provide the means for activists to share their perspectives and form an agenda, often resulting in real-world actions. Such movements can also lead to the formation of reactionary movements, aiming to counter the call for change. In this paper we suggest a computational framework for analyzing such instances of public discourse, specifically looking at the interaction between #BlackLivesMatter and its reaction, #BlueLivesMatter.
The first movement was formed in 2013 in response to the acquittal of George Zimmerman for the killing of Trayvon Martin, an unarmed Black teenager. While support for the movement fluctuated (Parker et al., 2020) and varied across demographic groups (Horowitz and Livingston, 2016), since May 2020, following the murder of George Floyd, support has increased across all U.S. states and demographic groups (Kishi and Jones, 2020; Parker et al., 2020), changing public discourse (Dunivin et al., 2022) and resulting in widespread protest activities (Putnam et al., 2020). At the same time, these activities have attracted counter-movements (Gallagher et al., 2018; Blevins et al., 2019). While #BlackLivesMatter protested against police violence towards Black individuals, the counter-movement, referred to by its most prominent hashtag, #BlueLivesMatter, emphasized the positive role of and need for law enforcement, often painting protesters as violating the law.
To account for the differences in perspectives between the two movements in a nuanced way, we suggest a structured representation for analyzing online content, corresponding to multiple tasks, such as capturing differences in stances, disambiguating the entities each side focuses on, as well as their role and the sentiment expressed towards them. We discuss our analysis in Sec. 2.2. From a technical perspective, our main challenge is to model the dependency among these tasks. For example, consider the use of the entity "thugs" in Fig. 1 by different authors. Understanding the perspectives underlying the texts' stance towards #BlackLivesMatter requires disambiguating it as referring either to police officers or to BlackLivesMatter supporters. We model such interactions among tasks using a graph-based representation, in which authors, their posts, and the posts' analyses are represented as nodes. We use a graph neural network (Schlichtkrull et al., 2018) to model the interaction among these elements by creating graph-contextualized node representations, which alleviate the difficulty of text analysis. We discuss our graph representation in Sec. 3.1 and its embedding in Sec. 3.2. Social movements are dynamic; as a result, using human-annotated data for training is costly. Hence, we explore learning from weak supervision, initialized with artificially generated data produced by Large Language Models (LLMs) (Brown et al., 2020) and amplified using self-training, exploiting the graph structure to impose output consistency. We discuss self-learning in Sec. 3.5. To evaluate our model and the quality of the artificial data, we human annotate ∼3k tweets from both sides for all analysis tasks. We compare our graph-based approach with multitask and discrete baselines in two settings: direct supervision (training with human-annotated data) and weak supervision (training with artificial data). Our results on the human-annotated test set consistently demonstrate the importance of social representations for this task and the potential of artificially crafted data. We also present an aggregate analysis of the main perspectives in 400k tweets related to the George Floyd protests.

BLM Data Collection and Analysis
First, we describe the data collection process for the #BLM movement. Then, we define the structured perspective representation used for analyzing it.

Dataset
For studying perspectives on the BLM movement, we use the dataset collected by Giorgi et al., 2022. This dataset contains tweets in various languages on the BLM protests and the counter-protests. For dimensions without direct supervision, we rely on external tools and report results at an aggregate level.
Entity Disambiguation. The focus on specific entities is central to the movements' messaging. However, identifying the intended entity directly from text can be challenging (e.g., in Fig. 1, 'thugs' maps to either 'police' or 'BLM Supporters' depending on the speaker). To alleviate this difficulty, we identify the key abstract entities discussed by both sides using human-in-the-loop data analysis at an aggregate level (listed in Tab. 2). The description of the abstract entities and the procedure for identifying them can be found in Appx. A. We define an entity disambiguation task, mapping the raw entities that appear in tweets to the intended abstract entity.

Weakly Supervised Identification of Perspectives
In this section, we propose a self-supervised modeling approach for a holistic identification of perspectives and stances on #BlackLivesMatter.

Structured Representation of Text
To jointly learn perspectives and identify stances on #BLM, we define perspectives as how entities are portrayed by an author, as described in Sec. 2. To identify perspectives towards an entity in a tweet we need to identify (1) the abstract entity the targeted entity maps to, and (2) the role assigned and sentiment expressed towards the entity. Our insight is that the entity mapping, entity role/sentiment, and author stance are all interdependent decisions. For example, if an entity maps to the abstract entity "police" and the author's stance is "pro-BlackLM", then the sentiment towards the entity will most likely be "negative" and the entity role will be "actor". Similarly, if an author is identified as repeatedly mentioning "police" as a negative actor, then an unknown entity mentioned by the same author and identified as a "negative actor" will most likely refer to "police".
To explicitly model these dependencies between authors' stances and perspectives, we first convert the large unlabeled tweet corpus (in Tab. 1) to a graph consisting of textual elements (text, entity mentions, hashtags).Then we connect the textual graph to the social context (author retweet graph).
A unit representation of our multi-relational graph is shown in Fig. 2 and various nodes and relations among them in this graph are described below.
(1) Author-tweets-Tweet: An author node is connected to the tweet nodes they tweeted.
(2) Author-retweets-Author: Author nodes can be connected to each other with a retweet relationship.
(3) Author-uses-Keyword: An author node can be connected to a keyword node that is mentioned in their profile description. Keywords are meaningful hashtags and hashtag-like phrases. For example, "Black Lives Matter" is a phrase, and "#blacklivesmatter" is a similar hashtag.
(4) Hashtag-used-in-Tweet: A hashtag node is connected to the tweet node it is used in.
(5) Entity-mentioned-in-Tweet: An entity node is connected to the tweet node it is mentioned in.
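The five relations above can be sketched as a small graph-construction routine. This is a minimal illustration, not the paper's implementation; the input field names (`author`, `tweet_id`, `hashtags`, etc.) and the relation labels are assumptions chosen to mirror the list above.

```python
from collections import defaultdict

def build_graph(tweets):
    """Build per-relation edge lists from tweet records.

    tweets: list of dicts with hypothetical fields: author, tweet_id,
    retweeted_authors, profile_keywords, hashtags, entities.
    Returns a dict mapping relation name -> list of (src, dst) edges.
    """
    edges = defaultdict(list)
    for t in tweets:
        # (1) Author-tweets-Tweet
        edges["author-tweets-tweet"].append((t["author"], t["tweet_id"]))
        # (2) Author-retweets-Author
        for other in t.get("retweeted_authors", []):
            edges["author-retweets-author"].append((t["author"], other))
        # (3) Author-uses-Keyword (from the profile description)
        for kw in t.get("profile_keywords", []):
            edges["author-uses-keyword"].append((t["author"], kw))
        # (4) Hashtag-used-in-Tweet
        for h in t.get("hashtags", []):
            edges["hashtag-used-in-tweet"].append((h, t["tweet_id"]))
        # (5) Entity-mentioned-in-Tweet
        for e in t.get("entities", []):
            edges["entity-mentioned-in-tweet"].append((e, t["tweet_id"]))
    return dict(edges)
```

The resulting per-relation edge lists can then be fed to any multi-relational GNN library.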

Learning Representation using GCN
After converting the text to a graph and connecting it with the social network of authors, we learn the representation of the graph elements using a multi-relational Graph Convolutional Network (R-GCN) (Schlichtkrull et al., 2018), an adaptation of the traditional GCN (Kipf and Welling), where activations from neighboring nodes are gathered and transformed for each relation type separately. The representations from all relation types are gathered in a normalized sum and passed through an activation function (ReLU) to get the final representation of the node. A 2-layer R-GCN is sufficient in our case to capture all dependencies, resulting in a rich composite representation for each node.
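A minimal NumPy sketch of one R-GCN propagation step, assuming dense adjacency matrices and mean normalization over neighbors; the exact normalization and weight-sharing scheme in the paper may differ, and the relation names are placeholders.

```python
import numpy as np

def rgcn_layer(H, adj_by_rel, W_rel, W_self):
    """One R-GCN layer (sketch).

    H: (n, d) node features.
    adj_by_rel: dict relation -> (n, n) float adjacency matrix.
    W_rel: dict relation -> (d, d_out) per-relation weight.
    W_self: (d, d_out) self-loop weight.
    """
    out = H @ W_self  # self-connection
    for rel, A in adj_by_rel.items():
        deg = A.sum(axis=1, keepdims=True)  # neighbor count per node
        deg[deg == 0] = 1.0                 # avoid division by zero
        # normalized sum of transformed neighbor activations
        out += (A / deg) @ (H @ W_rel[rel])
    return np.maximum(out, 0.0)  # ReLU
```

Stacking two such layers, as in the paper, lets information flow two hops, e.g., from an author's other tweets to an entity node.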

Predictions on Learned Representation
We define a set of prediction tasks on the learned representations of the nodes using R-GCN. These prediction tasks help infer various labels for the graph elements and at the same time maintain consistency among various types of decisions. In the following objectives, E represents the learnable representation of a node using R-GCN, H represents trainable fully connected layers, σ represents Softmax activation, and ⊕ means concatenation.
Tweet stance prediction: Intuitively, the stance in a tweet depends on its author. Hence, for a tweet t and its author a, we define the classification task of tweet stance (pro-BlackLM/pro-BlueLM) as ŷ_tweet = σ(H_t(E_t ⊕ E_a)), which ensures consistency between author stances and the stances of their tweets. Entity sentiment, role, and abstract entity label prediction: The sentiment and role of a given entity depend on the tweet text it is mentioned in. Given an entity e and the corresponding tweet t, we define the classification tasks of entity sentiment (pos./neg.) and role (actor/target) analogously, conditioning on E_e ⊕ E_t.
Intuitively, the abstract entity label does not directly depend on the tweet text; rather, it depends mostly on the entity's textual properties and indirectly on the sentiment/role assigned to it. We define the abstract entity label prediction (over the 11 entities in Tab. 2) by conditioning only on the entity representation.
Additionally, to maintain consistency between stance and perspectives towards entities (e.g., pro-BlackLM and "police neg. actor" are consistent), we define a prediction task of stance on the learned representations of entities and tweets as follows.
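The prediction heads described above can be sketched as softmax layers over concatenated node embeddings. The head names and weight shapes are hypothetical, chosen only to mirror the stated dependencies (tweet stance on tweet ⊕ author, entity sentiment/role on entity ⊕ tweet, abstract entity on the entity alone).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def head(W, x):
    """A single fully connected layer H followed by σ (sketch)."""
    return softmax(x @ W)

def predict(E_t, E_a, E_e, W):
    """E_t, E_a, E_e: tweet/author/entity embeddings from the R-GCN.
    W: dict of hypothetical head weights."""
    y_tweet = head(W["tweet"], np.concatenate([E_t, E_a]))  # stance: pro-BlackLM / pro-BlueLM
    y_sent  = head(W["sent"],  np.concatenate([E_e, E_t]))  # entity sentiment: pos / neg
    y_role  = head(W["role"],  np.concatenate([E_e, E_t]))  # entity role: actor / target
    y_abs   = head(W["abs"],   E_e)                         # abstract entity: 11 classes
    return y_tweet, y_sent, y_role, y_abs
```

Because the heads share the underlying R-GCN embeddings, gradients from each task shape all of the node representations jointly.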

Entity Role and Sentiment Priors
The "mentioned-in" relationship between an entity and the tweet it is mentioned in may have multiple types, e.g., pos-actor, neg-actor, and pos-target, based on the sentiment expressed and the role assigned to the entity. We initialize the priors of this relationship type in our graph using two off-the-shelf classifiers, C_sent and C_role, for sentiment and role classification, respectively, trained on out-of-domain (OOD) data. They are defined as follows.
We align predictions from C_sent and C_role (on the large unlabeled corpus) to the "mentioned-in" edges using the following loss functions.
Here, E⁰_e is the non-trainable input representation of an entity e and m represents the L1 loss. These learning objectives set up the priors for entity sentiment and role in our framework. In the course of training, these priors get updated as the parameters of the classifiers C_sent and C_role are also updated.
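Since the alignment loss equations are not reproduced here, the following is a hedged sketch of the idea only: an L1 penalty pulling the model's "mentioned-in" edge-type scores toward the out-of-domain classifier predictions, matching the L1 loss m mentioned above.

```python
import numpy as np

def l1_alignment_loss(edge_scores, clf_probs):
    """Sketch of the prior-alignment objective.

    edge_scores: model scores for the sentiment (or role) type of each
    "mentioned-in" edge, shape (n_edges, n_classes).
    clf_probs: predictions of the OOD classifier (C_sent or C_role) on
    the same entity-tweet pairs, same shape.
    """
    return np.abs(edge_scores - clf_probs).mean()
```

In training, one such term per classifier (the L_sa and L_ra of the final objective) would be added to the prediction losses.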
We define the final loss function L as the sum of the alignment losses, L_sa and L_ra, and all prediction losses described above, generally denoted as P(ŷ, y), where ŷ is the predicted label, y is the gold label, and P is the CE loss. We optimize L by updating all trainable parameters, including those of the R-GCN.

Self Learning Loop
We propose a self-learning procedure where our model checks consistency among all predictions after every k learning steps, which we name the "inference step". The model adds elements (e.g., tweets, entities) with consistent labels found in this step to its training set for the next iterations. In this manner, our model is able to start with only a few annotated examples as a training set and keep annotating data and increasing training examples in a continuous learning loop. We permit k epochs after each inference step, I_j, so that the model parameters are learned based on the new training examples inferred at the previous inference step, I_{j−1}. The model identifies consistent labels using the following consistency checks, applied sequentially.
(1) Label confidence: A label is reliable if predicted with a minimum confidence of c ∈ [0, 1]. Only reliable labels are used in the next checks.
(2) Tweet consistency: A tweet is consistent if the predicted stance of the tweet is consistent with the perspectives identified towards all mentioned entities in the tweet (determined from Table 2). For example, a tweet stance of pro-BlackLM is consistent with a mentioned entity identified as "police-neg-actor" and inconsistent with an entity predicted as "police-pos-target".
(3) Author consistency: An author is consistent if all of their consistent tweets (identified in (2)) are labeled to have the same stance (pro-BlackLM or pro-BlueLM) and they have at least t such tweets.
Tweets from the consistent authors and the corresponding consistent entities are added to the training set for the next learning steps. We also keep training the off-the-shelf sentiment and role classifiers (C_sent, C_role) at each step, using the training set inferred by our model, so that they stay up-to-date. We keep running our model until it does not generate more than y% new training data for x consecutive inference steps.
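The three sequential checks can be sketched as a filtering routine. The record layout and the compatibility table `compat` (standing in for Table 2) are assumed representations for illustration, not the paper's implementation.

```python
def consistent_examples(preds, compat, c=0.9, t=2):
    """Select self-training examples via the three checks (sketch).

    preds: list of dicts {"author", "stance", "conf",
           "entities": [(abstract_entity, sentiment, role, conf), ...]}.
    compat: set of (stance, entity, sentiment, role) tuples judged
            consistent (derived from Table 2).
    c: minimum label confidence; t: minimum consistent tweets per author.
    """
    # (1) Label confidence + (2) tweet consistency
    good = []
    for p in preds:
        if p["conf"] < c:
            continue
        ents = [e for e in p["entities"] if e[3] >= c]
        if all((p["stance"], ent, s, r) in compat for ent, s, r, _ in ents):
            good.append(p)
    # (3) Author consistency: one stance across >= t consistent tweets
    by_author = {}
    for p in good:
        by_author.setdefault(p["author"], []).append(p)
    out = []
    for tweets in by_author.values():
        if len({p["stance"] for p in tweets}) == 1 and len(tweets) >= t:
            out.extend(tweets)
    return out
```

The returned tweets (and their entity labels) would be appended to the training set before the next k epochs.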

Training and Test Data Collection
Detecting perspectives in social media texts using supervised models requires costly human-annotated training data. Hence, inspired by recent advances in generative Large Language Models (LLMs) (Min et al., 2021), we artificially craft training examples using LLMs. To evaluate the quality of the crafted training data and our model's performance on real test data, we human annotate a subset of real data.
Artificial Training Data Generation: For generating tweets containing specific perspectives, we prompt LLMs (Brown et al., 2020) in a way that all of the structured elements for perspectives are present in the generated tweets. For example, to generate tweets that are "pro-BlackLM" and portray "Black Americans" as a "positive target", we prompt the LLM to generate N tweets that meet three conditions: (1) explicitly mention one or more entities from ('Black man', 'Black people', 'Black woman', etc.) and portray them as positive targets, (2) support the "Black Lives Matter" movement, and (3) use one or more hashtags from (#blacklivesmatter, #blm, etc.). We follow this prompt structure for each (stance, abstract entity, perspective) tuple in Tab. 2. We find that the LLM-generated tweets are quite realistic. We convert these artificial training examples to graph structures as before. For each (stance, abstract entity, perspective) pair we construct an imaginary author node in the graph whose embedding is initialized by averaging the corresponding artificial tweets. In this manner, we get the same unit structure for the artificial tweets as for the real tweets. We experiment with two LLMs, GPT-3 (Brown et al., 2020) and GPT-J-6B (Wang and Komatsuzaki, 2021). To evaluate our model on real data, we human annotate a subset of it. We sample 200 authors from the corpus described in Tab. 1 and human annotate these authors and their tweets for stance (pro-BlackLM/pro-BlueLM). We also human annotate the entities they mention in their tweets for the sentiment towards them (pos./neg.), their role (actor/target), and their mapping to the abstract entities (one of the 11 abstract entities in Tab. 2). Each data point is annotated by two human annotators, and we find substantial to almost perfect agreement on all annotation tasks. We resolve disagreements between the two annotators in all annotation steps by discussion. We observe that supporters of the movements often "hijack" (Gallagher et al., 2018) the opponent's hashtags to troll or criticize them. We annotate such tweets as "Ambiguous Tweets"; they are identified by looking at keyword usage (e.g., when a pro-BlackLM tweet uses the keyword "bluelivesmatter").
Baselines: Our task of jointly identifying perspectives, in terms of sentiment toward the main actors, their roles, their disambiguation, and stance identification, is unique, and to the best of our knowledge no other work has studied unified perspective detection in such a setting. Hence, the closest baseline to our model is the multitask modeling approach (Collobert and Weston, 2008). We compare our model with a RoBERTa-based multitask approach where a shared RoBERTa encoder is used and fine-tuned for the identification of entity sentiment, role and mapping, and tweet stances.
To compare with our proposed model, which incorporates author social-network interactions, we enhance the multitask RoBERTa baseline by concatenating social-network-enhanced author embeddings with the text representations. We also compare with discrete RoBERTa-based text classifiers, where all tasks are learned separately by fine-tuning a separate RoBERTa encoder for each task, and with the keyword-matching-based baseline for stance classification (Giorgi et al., 2022). For a fair comparison, we pretrain all sentiment and role classification baselines on the same out-of-domain data by Roy et al., 2021 that we use for pretraining C_sent and C_role. Details of the baselines and their hyperparameters are in Appx. D.4.

Results
We present all individual classification results in Tab. 4. In the multitask setup, adding author embeddings slightly improves performance in some tasks.
We study our model's performance in four settings (more details in Appx. D.5). We observe that just converting text to a graph yields an improvement over discrete text classifiers built on the initial text embeddings (from RoBERTa-tapt). This demonstrates the advantage of studying texts as graphs consisting of structured elements and performing joint structured prediction over them. Next, we obtain a large gain in tweet stance classification performance when the author network is added to the text-only graph. Finally, adding the self-learning loop substantially improves performance in all tasks and outperforms all baselines, demonstrating the effectiveness of our approach.
The trends are mostly the same in the weak supervision setup, which achieves performance overall comparable to the direct supervision setup.
In this setup, adding author information does not help in ambiguous tweet stance detection because the authors are not real but imitated, and their embeddings are just the average of the generated tweets. Also, ambiguous tweet examples are not present in the LLM-generated training set. We present the combined perspective identification results in Tab. 5 and observe that our model outperforms the multitask baseline on almost all perspectives. We observe low F1 scores for a few perspectives, such as "Black Americans neg actor", "BLM pos target", and "Republicans pos actor". In the direct supervision setup, we find that "Black Americans neg actor" (Prec.: 60%, Recall: 2.7%) and "Republicans pos actor" (Prec.: 58%, Recall: 12%) get low average recalls overall. We conjecture that being a less popular perspective (as shown in the support column) is the reason for the low recalls. We find that 48% of the time "Republicans" wrongly maps to the abstract entity "Government". In the timeline of this dataset, a Republican president was in power in the US; hence, Republicans were probably framed as government figures. In the case of "BLM", 70% of the time it is mapped to a related abstract entity, "Black Americans".
Finally, we evaluate how much GPT-3-generated or human-annotated data is required for effective training of the models and find that our model is less sensitive to the number of training examples in both cases compared to the multitask baselines. The learning curves can be found in Appx. D.8.

Qualitative Evaluations
We infer perspectives and stances over the whole corpus (in Tab. 1) using our model and the multitask baseline and perform the following qualitative evaluations (more details are in Appx. E). Correlation of author behavior with stance: We examine the correlation between authors' following and sharing behavior on Twitter and their stance on the #BLM movement. We find positive correlations between being pro-BlackLM and following Democrats, and between being pro-BlueLM and following Republicans on Twitter. We find similar correlations between authors' sharing behavior from left- and right-biased media outlets and supporting BlackLM and BlueLM, respectively. These correlations are consistent with these political camps' stances on #BLM. However, this information is not used in the learning phase; hence, it serves as an extrinsic evaluation of our model on the whole corpus. We observe that these correlations increase using our model's labels compared to the baselines, as shown in Fig. 3, validating our model's predictions. Entity mapping analysis: Depending on the author's stance, the same phrases are often used to address different entities, as shown in Tab. 6. Pro-BlackLM addresses police as "thugs" while pro-BlueLM addresses Antifa as "thugs". When pro-BlackLM tweets mention "lives", they mean Black lives, while pro-BlueLM means the lives of police officers. These patterns are better captured by our model compared to the baselines (comparisons and example tweets are shown in Appx. E.2). Discourse of the movements: In Tab.
7, we summarize some high-PMI perspectives identified in each of the movements, the corresponding moral foundations used in the context of these perspectives, and the other perspectives that frequently appear in the same tweet (full list in Appx. E.3). We infer moral foundation labels in tweets using an off-the-shelf classifier. We find that the perspectives can explain the discourse of the movements. For example, in pro-BlackLM tweets, when police are portrayed as negative actors, Black Americans are portrayed as positive targets and the BLM movement as a positive actor; the moral foundations fairness/cheating and authority/subversion are used in this context. In contrast, in pro-BlueLM tweets, police are portrayed as positive actors and targets, and in the same context, Antifa is portrayed as a negative actor; the moral foundation of loyalty/betrayal is used in context. However, the high-level moral foundations in the two camps are sometimes similar (e.g., care/harm is frequently used with all perspectives), and entity perspectives resolve the ambiguity in those cases.
Temporal trend: Using our model (direct supervision), we find that 20% of the tweets are identified as pro-BlueLM and the rest as pro-BlackLM. We find that the first responders following George Floyd's killing were in the pro-BlackLM camp. In contrast, the percentage of pro-BlueLM tweets per day slowly increased over time (Fig. 4a). When the protests started (on May 26) following George Floyd's death, the pro-BlueLM camp initially portrayed BLM and Antifa as negative actors (Fig. 4b) and communities as positive targets, or sufferers of the BLM movement (Appx.: Fig. 8a). As the movement progressed (after June 2), Democrats were increasingly portrayed as negative actors as well. Additional trends can be found in Appx. E.4.

Related Works
The discourse of the #BLM movement has mostly been studied in computational social science (CSS) using keyword-based analyses (De Choudhury et al., 2016; Gallagher et al., 2018; Blevins et al., 2019; Giorgi et al., 2022). However, it is understudied in NLP. Early works studied the problem of identifying the names of civilians killed by police in news articles using EM-based (Keith et al., 2017) and deep learning approaches (Nguyen and Nguyen, 2018). Recently, Ziems and Yang, 2021 introduced a news corpus covering 7k police killings to study entity-centric framing of victims, defined by their demographics and other status (e.g., armed/unarmed).
A shared task was proposed for identifying BLM-centric events from large unstructured data sources (Giorgi et al., 2021b), and Giorgi et al., 2022 introduced a large #BLM tweet corpus, paving the way for more studies in this area. In this paper, we propose a holistic learning framework for understanding such social movements.

Conclusion
In this paper, we propose a weakly supervised, self-learned, graph-based structured prediction approach for characterizing the perspectives and discourses of the #BlackLivesMatter and #BlueLivesMatter movements on social media. We evaluate our model's performance on a human-annotated test set and find a significant improvement over all baselines. Finally, using our model, we successfully analyze and compare perspectives expressed in both movements.

Limitations
For the artificially crafted training data generation, we mostly use GPT-3, which is not open source and is available as a paid service. Although our model depends on only a few GPT-3-generated texts (costing us below $3 USD combined), generating examples at a very large scale would be expensive using GPT-3. Experimenting with the increasing number of open-source LLMs is costly in a different way, as they require advanced computing resources to mount and run. Hence, we leave the study of differences in training data generated by various LLMs as future work.
Our main focus in this paper is to develop a holistic framework that can be applied to different events related to social movements for characterizing perspectives. As a result, we focus on one significant event related to the #BLM movement: the outrage right after George Floyd's killing. However, our model can be applied to other similar social movements that went viral on social media, such as the #MeToo movement. We study the two opposing movements #BlackLivesMatter and #BlueLivesMatter in this paper. Extending this study to another relevant and potentially more ambiguous slogan, #AllLivesMatter, could be interesting.
Our model depends on pre-identified abstract entities and perspectives as priors. This pre-identification of abstract entities and perspectives is a semi-automated process with a human in the loop, or requires world knowledge. Making this step fully automatic, or extracting them from an existing database, is interesting future work.
Our model does not automatically identify new abstract entities. However, in real life, new abstract entities may appear over time. This limitation does not affect our study because it is conducted over a short time frame of one month, and the emergence of new abstract entities in such a short time frame is unlikely. Extending the model to identify new abstract entities in a temporal fashion is our intended future work.
Use of LLMs: Ethical concerns have been raised about Large Language Models (LLMs) in recent works (Brown et al., 2020; Blodgett et al., 2020). As LLMs are pretrained on large corpora of human-generated data, they may contain human biases. We want to clarify that, in this paper, we use LLMs to generate biased texts that contain specific perspectives and stances. Hence, concerns regarding different types of bias (e.g., racial, national, gender, and so on) are not applicable in the case of our study. We prompt LLMs to generate only a few texts that have specific structured properties (e.g., stance, sentiment towards entities), and, as described in Appendix B, one author of this paper manually went through the generated examples to detect inconsistencies or unexpected biases in the generation; no unexpected bias was observed. We believe prompting LLMs in the structured way that we propose in this paper is effective in avoiding inconsistencies that may additionally be introduced by the pre-trained models.
Human Annotation: We performed human annotation of the data using in-house annotators (aged over 21), and the annotators were notified that the texts may contain sensitive phrases. The detailed annotation process and inter-annotator agreement scores are discussed in Appendix C.
Bias and Ethics: In this paper, we carefully addressed all communities of people and movements that appear in the dataset and made sure that every entity is addressed with due respect. Any additional suggestions from the reviewers on this issue will be much appreciated. All of the sentiments, perspectives, and trends reported in this paper are outcomes of the models we developed and implemented and in no way represent the authors' or the funding agencies' opinions on this issue.
Datasets Used: All of the datasets used in this paper are publicly available for research and we cited them adequately.
Code and Dataset Release: We submit a randomly sampled small subset of the GPT-3-generated and human-annotated datasets and the model implementation scripts with this manuscript for review. We will publish the whole set and all implementation code upon acceptance. We are aware of the Twitter privacy policy for publishing data and will publicly release the datasets in accordance with it.

A Identification of Abstract Entities and Corresponding Perspectives
To identify the most common high-level abstract entities, or the main actors, in the pro-BlackLM and pro-BlueLM tweets, we follow the steps below.
• We first determine the hashtags that are mostly used in the context of the keywords "blacklivesmatter" and "bluelivesmatter", and we assign soft labels of "pro-blacklivesmatter" and "pro-bluelivesmatter" to the tweets containing these hashtags, respectively. The identified "pro-blacklivesmatter" and "pro-bluelivesmatter" hashtags are shown in Table 8.
• We extract noun phrases in these tweets using SpaCy and treat these noun phrases as entities. We remove all entities that are pronouns for this analysis.
• We build two separate RoBERTa-based (Liu et al., 2019) classifiers to identify the sentiment towards entities (positive/negative) and the role of the entities (actor/target). These RoBERTa classifiers are trained using out-of-domain data. We use the annotated dataset by Roy et al., 2021, where entities are labeled for sentiments and roles. We consider the entity labels ["target of care/harm", "target of fairness/cheating", "target of loyalty/betrayal", "target of sanctity/degradation", "authority failing over", "authority justified over"] in this dataset as "targets" and the rest as "actors". We obtain the contextualized embedding of an entity in a tweet from the last layer of RoBERTa, and then use a fully connected layer to identify the sentiment or role of the entity. We use 80% of the data for training and the rest as a validation set.
We stop training a model when its validation accuracy does not improve for 3 consecutive epochs. We use these trained classifiers to infer the sentiment and role of entities in the #BLM corpus. The validation accuracies of both classifiers on the out-of-domain validation sets were > 92%.
• After obtaining entities, their corresponding roles, and the sentiments towards them, we construct perspectives as "entity_sentiment_role", such as "police_positive_actor". We then compute the Pointwise Mutual Information (PMI) score (Church and Hanks, 1990) of each perspective x with each stance s ("pro-blacklivesmatter" or "pro-bluelivesmatter") as
I(x, s) = log( P(x|s) / P(x) )
where P(x|s) is computed by taking all perspectives used in tweets with stance s and computing count(x)/count(all perspectives), and P(x) is computed by counting perspective x over all of the tweets. We discard all perspectives that appear less than 0.5% of the time in the whole corpus.
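The PMI computation above can be sketched directly from counts; here `min_frac` implements the 0.5% cutoff, and the input layout is an illustrative assumption.

```python
import math
from collections import Counter

def pmi(tweets, min_frac=0.005):
    """tweets: list of (stance, [perspectives]) pairs.
    Returns {(perspective, stance): I(x, s) = log(P(x|s) / P(x))},
    discarding perspectives rarer than min_frac of all mentions."""
    all_counts, by_stance, stance_totals = Counter(), {}, Counter()
    for stance, persp in tweets:
        for x in persp:
            all_counts[x] += 1
            by_stance.setdefault(stance, Counter())[x] += 1
            stance_totals[stance] += 1
    total = sum(all_counts.values())
    scores = {}
    for x, n in all_counts.items():
        if n / total < min_frac:
            continue  # the 0.5% frequency cutoff
        for s, cnt in by_stance.items():
            if cnt[x]:
                p_x_given_s = cnt[x] / stance_totals[s]  # count over stance-s perspectives
                scores[(x, s)] = math.log(p_x_given_s / (n / total))
    return scores
```

Perspectives with high positive PMI for a stance are the ones disproportionately used by that camp.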
• We manually go through the high-PMI perspectives for each stance and cluster them to form abstract entities and perspectives. The high-PMI perspectives are shown in Table 8.
We cluster entities and perspectives that are consistent and directed at the same set of entities. For example, in pro-bluelivesmatter tweets, "a riot_neg_actor" and "# blacklivesmatter # protests_neg_actor" are directed at the same high-level entity "BLM Movement" and express the same perspective. Hence, we merge these entities into the abstract entity "BLM Movement" and identify the relevant perspective towards this entity in pro-bluelivesmatter as "Negative Actor". The descriptions of all identified abstract entities can be found in Table 9.

B Prompting LLMs to Generate Few Training Tweets
In this section, we describe the details of generating a few training examples using Large Language Models (LLMs). We conduct our initial experiments with two LLMs, GPT-3 (175B parameters) (Brown et al., 2020) and GPT-J (6B parameters) (Wang and Komatsuzaki, 2021), and find that GPT-J generates many repetitive examples. We conjecture that GPT-J being a much smaller LLM in terms of parameters is the main reason for the less diverse generation. This is in line with recent findings where larger LLMs were found to perform better than smaller LLMs on various tasks. Hence, we retain only the GPT-3 generations in this paper. The generation process using GPT-3 is described below.

B.1 Generation
For generating a few tweets containing the stances and corresponding entity perspectives (as described in Table 2), we prompt the GPT-3 model (Brown et al., 2020) using the prompt structure shown in Figure 5. We prompt GPT-3 such that all of the structured elements for a stance and corresponding perspective are present in the generated tweets. To ensure that, as shown in Figure 5, we instruct the GPT-3 model to generate 30 different tweets that must fulfill the following three conditions.
(1) Explicitly mention one or more entities from the following list <entity list> and portray them as <entity perspective>.
(2) Be in the support of the <movement name> movement.
(3) Use one or more hashtags from the following list <hashtag list>.
We take the <entity list> and <hashtag list> from Table 8. We perform a separate prompting step for each (stance, abstract entity, entity perspective) tuple in Table 2. For example, to generate tweets that are "pro-BlackLM" and portray "Black Americans" as a "positive target", we prompt GPT-3 using the prompt shown in Figure 6. We find that GPT-3 rarely generates repetitive tweets. We prompt GPT-3 multiple times to generate at least 20 unique examples per (stance, abstract entity, entity perspective) tuple. One author of this paper skims through the GPT-3-generated examples to detect any inconsistency or unwanted bias and discards a generation if any is found. We observe that the GPT-3 generations are mostly clean. We use the OpenAI interface2 to generate the tweets using the text-davinci-003 (largest available as of January 2023) version of GPT-3 with the following hyperparameters: top-p=1, frequency penalty=1, presence penalty=1, temperature=1, max len=500. Note that on the OpenAI console, usage of GPT-3 is a paid service. All of the generations used in this paper cost us in total ∼$3 US Dollars.
Figure 5: Prompt structure ("Write 30 different tweets that must: 1. Explicitly mention one or more entities from the following list <entity list> and portray them as <entity perspective>. 2. Be in support of the <movement name> movement. ...").
The same entity may be addressed differently in different tweets. For example, "police" can be addressed as "cops" or "law enforcement". To identify the spans in generated tweets that map to an abstract entity, we first run the SpaCy noun phrase extractor on these tweets and automatically group the extracted entities based on keyword match; for example, "police" and "police force" are grouped together because of the common keyword "police". An author of this paper then looks at the entity groups and discards the groups that are not related to the target abstract entity. We consider the remaining entity groups and annotate them as gold data for abstract entity mapping. The statistics of the generated data are shown in Table 10. Two examples of GPT-3-generated tweets for each perspective are shown in Table 11. We submit a randomly selected subset (20 tweets) of the GPT-3-generated annotated data with this manuscript for review. Upon acceptance of this paper, we will release the whole set.
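The keyword-based grouping step can be sketched as a union-find over shared words; the names here are ours and the actual implementation may differ (e.g., in stopword handling).

```python
def group_entities(entities):
    """Group noun-phrase strings that share at least one word, e.g.
    "police" and "police force" end up in the same group.
    Implemented as a simple union-find keyed on words."""
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    word_to_idx = {}
    for i, ent in enumerate(entities):
        for w in ent.lower().split():
            if w in word_to_idx:
                union(i, word_to_idx[w])    # shares a word with an earlier entity
            else:
                word_to_idx[w] = i
    groups = {}
    for i, ent in enumerate(entities):
        groups.setdefault(find(i), []).append(ent)
    return list(groups.values())
```

In practice, common function words ("the", "a") would need to be filtered out first, otherwise unrelated phrases sharing a stopword would be merged.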

C Real Data Annotation Procedure
We human-annotate a subset of the whole #BLM corpus as described in Table 1 for the following two purposes.
• To evaluate the performance of our proposed model on real data and compare it with the baselines.
Figure 6: Prompt example for GPT-3 to generate tweets having "pro-BlackLM" stance and portraying "Black Americans" as "positive target".The black-colored text is the input text in the prompt and the blue-colored texts are generated tweets by GPT-3.
Table 9: Abstract entities and their descriptions.
• To evaluate the quality of the artificially crafted training data (using LLMs) by comparing the performance of our proposed model and the baselines that are trained with the artificially crafted data vs. the real annotated data.
The human annotation of real data occurs in the following three steps.
Step-1: From the whole #BLM corpus summarized in Table 1, we rank the authors who are most consistent in their usage of the keywords blacklivesmatter and bluelivesmatter. The most consistent users use only one of the two keywords 100% of the time. We then randomly sample 100 users from the top 500 consistent users of blacklivesmatter and 100 from the top 500 consistent users of bluelivesmatter, resulting in 200 users. We annotate these users for their stance (pro-blacklivesmatter or pro-bluelivesmatter). We present two human annotators with each author's id, their profile description, and all of the tweets they shared. Note that the author names were hidden and the author ids were simply the numeric ids assigned by Twitter. We then ask the annotators to label each author's stance based on the profile description and the shared tweets, and to annotate 'none' if the stance cannot be inferred from them. We find an average inter-annotator agreement of 0.824 (almost perfect agreement) using Cohen's Kappa. Disagreements between the two annotators are resolved by discussion. After annotating authors for stance, we readily obtain stance labels for the tweets they wrote. We find that supporters of one movement often use the keywords/hashtags mostly used by the counter-movement to troll or criticize the opponent. This is in line with previous findings in computational social science (Gallagher et al., 2018). We annotate these tweets as "Ambiguous Tweets". Ambiguous tweets are intuitively harder for models to disambiguate because they use the keywords and hashtags frequently used by the opponents. We label a tweet as ambiguous if it is annotated by the human annotators as "pro-blacklivesmatter" but uses the keyword "bluelivesmatter", and vice versa for "pro-bluelivesmatter". Examples of ambiguous tweets are presented in Table 12.
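The ambiguity rule at the end of Step-1 reduces to a small check; this is an illustrative sketch with names of our choosing.

```python
def is_ambiguous(stance, text):
    """A tweet is 'ambiguous' when its annotated stance supports one movement
    but the text uses the other movement's signature keyword."""
    text = text.lower()
    if stance == "pro-blacklivesmatter" and "bluelivesmatter" in text:
        return True
    if stance == "pro-bluelivesmatter" and "blacklivesmatter" in text:
        return True
    return False
```

For example, a tweet annotated pro-blacklivesmatter that quotes "#BlueLivesMatter" to criticize it would be flagged, while a tweet using only its own movement's hashtag would not.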
Step-2: After annotating tweets and authors for stance, we extract entities from these tweets using the SpaCy noun phrase extractor and, in this step, annotate them with abstract entity labels. We present two human annotators with a tweet, its stance (annotated in the previous step), and an entity mentioned in the tweet. We then ask the annotators to assign the entity an abstract entity label from the list in Table 2; they may select multiple abstract entity labels or "none". We find an average Cohen's Kappa inter-annotator agreement score of 0.697 (substantial agreement) for this task. Disagreements are resolved by discussion. We find that 78% of the entities map to at least one abstract entity and a small portion (<5%) map to multiple abstract entities. This implies that the abstract entities summarized in Table 2 are the main actors related to the movements and have very good coverage of the whole data. Based on feedback from the human annotators, the unmapped entities are mostly cases where it is difficult to determine the referent abstract entity without more context, and a small portion are erroneous entity detections by the SpaCy noun phrase extractor.
Step-3: In this step, we annotate the entities labeled in Step-2 for the sentiment toward them (positive/negative) and their assigned role (actor/target). We present two human annotators with the tweet text, its stance, an entity mentioned in the tweet, and the entity's abstract entity label (all determined in the previous two steps). We then ask them to annotate the entity's role and sentiment. For role identification, the annotators are instructed to select "none" if the role is unclear from the tweet text, or "both" if the entity is portrayed as both an actor and a target. For example, in the tweet "Police keep us safe. We should defend our Police.", the entity "Police" is portrayed as both a positive actor and a positive target. We find Cohen's Kappa inter-annotator agreement scores of 0.976 and 0.815 for entity sentiment and entity role annotation, respectively, indicating almost perfect agreement. We resolve disagreements by discussion. In <1% of cases, an entity has both the actor and target roles in a tweet.
The per-label agreement scores for each annotation task can be found in Table 13, and the final annotated data statistics in Tables 14 and 15.
Both of the human annotators were graduate students (age above 21) and were awarded research credits for this annotation task. They were sufficiently briefed on the tasks.
We collected the data using Twarc API calls3. Before using the #BLM corpus collected from (Giorgi et al., 2022), we removed all non-ASCII characters and URLs from the tweet text. We used the SpaCy noun phrase extractor to extract entities from the tweet text.
To extract keywords from the author profile descriptions, we first identify all hashtags used in the description and add them to the keyword list; we then extract n-grams (1 ≤ n ≤ 3) from the residual text. We merge each n-gram into a single word and check whether it matches a hashtag (e.g., the merged n-gram "Black lives matter" matches "#blacklivesmatter"); if so, we add that n-gram to the keyword list. For the remaining n-grams, we examine the most frequent ones and manually discard those that do not convey any meaningful message. We add the rest of the n-grams to the keyword list.
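The automatic part of this heuristic (hashtags plus n-grams whose space-stripped form matches a known hashtag) can be sketched as follows; the manual filtering of frequent residual n-grams is omitted, and the names are ours.

```python
import re

def profile_keywords(description, known_hashtags):
    """Extract keywords from a profile description: all hashtags, plus
    n-grams (n <= 3) whose merged form matches a known hashtag.
    `known_hashtags` holds lowercase tags without the leading '#'."""
    hashtags = [h.lower() for h in re.findall(r"#(\w+)", description)]
    residual = re.sub(r"#\w+", " ", description).lower()   # drop hashtags
    tokens = re.findall(r"[a-z']+", residual)
    keywords = set(hashtags)
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            merged = "".join(tokens[i:i + n])  # "black lives matter" -> "blacklivesmatter"
            if merged in known_hashtags:
                keywords.add(merged)
    return keywords
```

For instance, the description "Black lives matter! #BackTheBlue" yields both "blacklivesmatter" (via the merged trigram) and "backtheblue" (via the hashtag).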
Each element summarized in Table 1 corresponds to a unique node in our graph. For example, if two tweets mention lexically equal entities, the entities still have two different node representations in the graph, because the perspectives towards lexically equal entities may differ across the two tweets. For example, "law enforcement" may be portrayed as a "positive actor" in one tweet and a "negative actor" in another. Hence, the number of nodes and edges in our graph corresponds to the statistics in Table 1.
3 https://twarc-project.readthedocs.io/en/latest/api/client/

D.2 Task Adaptive Pretraining of RoBERTa
Following previous work (Gururangan et al., 2020), we perform task-adaptive pretraining of RoBERTa for our task. We continue pretraining RoBERTa with the whole-word masking technique: we randomly select words, mask each selected word in its entirety, and predict the original vocabulary ID of the masked word based on the context it appears in. We use the unused portion of the #BLM corpus from (Giorgi et al., 2022) in this pretraining step. We find that task-adaptive pretraining improves classification results significantly over plain RoBERTa, as shown in Tables 4 and 16.
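At the word level, the masking step can be sketched as below. This is a simplification of our own: real whole-word masking operates on subword tokens grouped back into words, and the names here are illustrative.

```python
import random

def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Minimal whole-word-masking sketch: pick whole words at random,
    replace each with the mask token, and return the masked sequence plus
    the (position, original word) targets the model must predict."""
    rng = random.Random(seed)
    masked, targets = [], []
    for i, w in enumerate(words):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append((i, w))
        else:
            masked.append(w)
    return masked, targets
```

The pretraining objective is then cross-entropy over the vocabulary at the masked positions only.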

D.3 Our Model Initialization and Hyperparameters
We initialize the representation of each node type in our graph using this RoBERTa-tapt. Author nodes are initialized with the RoBERTa-tapt embeddings of the author profile descriptions, or with the average embedding of 5 randomly sampled tweets by the author if no profile description is found. We pretrain the external classifiers (C_sent, C_role) on the out-of-domain data proposed by Roy et al. (2021). In this dataset, entities mentioned in tweets by US politicians are annotated for their moral roles; the moral roles are associated with positive and negative sentiments and with actor and target types. As a result, we obtain an annotated dataset for pretraining our external classifiers, C_sent and C_role, for sentiment and role classification, respectively. To train these classifiers, contextualized embeddings of the entities are obtained using RoBERTa-tapt, and a fully connected layer is used to identify the role/sentiment. We train these classifiers until the accuracy on a held-out validation set (20% of the OOD data) does not improve for three consecutive epochs. The accuracies of both classifiers on the out-of-domain validation sets were > 92%. We stop backpropagating into RoBERTa-tapt when C_sent and C_role are combined with our framework; however, the fully connected layers are still updated.
We use a 2-layer R-GCN to encode our graph nodes; the 768d input node features are mapped to 100d and 50d spaces in the two layers of the R-GCN. We use a learning rate of 0.0005 to train the model. We set the tweet threshold for the author consistency check, t, to 10, 5, and 3 at epochs 1, 20, and 50, respectively. We find that the majority of tweets (> 75%) become consistent and are added to the self-learned training set by epoch 300. The hyperparameters are determined empirically.

D.4 Baselines
Naive: We implement two naive baselines. The first is random selection, where the labels for tweet stance, entity sentiment, entity role, and entity mapping are each chosen at random. We perform the random selection using five different random seeds and report the average results. The second naive baseline applies only to tweet stance classification and follows a keyword-matching approach similar to Giorgi et al., 2021a. For keyword-matching stance classification, we use the keywords available with each data point in (Giorgi et al., 2021a). In this dataset, each tweet is marked with one or more of the three keywords [blacklivesmatter, bluelivesmatter, alllivesmatter] based on their presence in the tweet. We classify a tweet as pro-BlackLM if it is marked with the keyword blacklivesmatter and as pro-BlueLM if it contains bluelivesmatter. In case of a tie or the unavailability of either keyword, we break the tie randomly. We do not consider the keyword alllivesmatter in this classification approach because it is more ambiguous and is used widely by both movements. We determine author stances by majority voting over the identified stances of the tweets they wrote.
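The keyword-matching baseline amounts to the following sketch (function and label names are ours):

```python
import random

def keyword_stance(keywords, rng=None):
    """Keyword-matching stance baseline: pro-BlackLM if the tweet is marked
    with 'blacklivesmatter', pro-BlueLM if marked with 'bluelivesmatter';
    'alllivesmatter' is ignored as ambiguous; ties and misses broken randomly."""
    rng = rng or random.Random(0)
    black = "blacklivesmatter" in keywords
    blue = "bluelivesmatter" in keywords
    if black and not blue:
        return "pro-BlackLM"
    if blue and not black:
        return "pro-BlueLM"
    return rng.choice(["pro-BlackLM", "pro-BlueLM"])
```

Author-level stance is then the majority vote of `keyword_stance` over that author's tweets.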

Discrete Text Classifiers:
We implement a second type of baseline: discrete text classifiers based on pre-trained RoBERTa. We finetune separate RoBERTa-based classifiers for each of the tweet stance, entity sentiment, entity role, and entity mapping classification tasks.
For the classification of tweet stances, we encode the tweet text using RoBERTa-tapt. The representation of the [CLS] token of the last hidden layer is used as the tweet embedding and is passed through a fully connected layer to predict the stance of the tweet. We also update the RoBERTa-tapt parameters during learning. We stop learning when the validation accuracy does not improve for three consecutive epochs. Author stances are determined by majority voting over the identified stances of the tweets they wrote.
For the classification of entity roles, sentiments, and mapping, we encode the entity-mentioning segments of the tweet texts using RoBERTa-tapt. We take the last-layer representation and select the tokens corresponding to the entity spans in the tweet text. We then average these selected tokens' embeddings to get the representation of the entity, and a fully connected layer predicts the sentiment, role, or abstract entity mapping of the target entity. We also update the RoBERTa-tapt parameters during learning. We stop learning when the validation accuracy does not improve for three consecutive epochs. To make the entity role and sentiment classification baselines comparable to our proposed model, during training we combine the OOD data from (Roy et al., 2021) with the LLM-generated training data or the human-annotated training data in the weak and direct supervision settings, respectively.

Multitask: Our model jointly models perspectives with respect to entities (sentiment towards them, assigned role, abstract entity mapping) and the stances of the tweets. Context-rich representations of entities and tweets are learned, and the single unified representation is used to infer the various labels: stance for tweets and sentiment, role, and mapping for entities. As a result, the closest match to our model is the multitask approach.
To implement the multitask baseline, we define a single pre-trained RoBERTa-tapt text encoder that is shared across all classification tasks such as tweet stance, entity sentiment, entity role, and entity mapping.
For the classification of tweet stances, we encode the tweet text using the shared RoBERTa-tapt encoder. The representation of the [CLS] token of the last hidden layer is used as the tweet embedding and is passed through a task-specific two-hidden-layer feed-forward neural network to predict the stance of the tweet.
For the classification of entity roles, sentiments, and mapping, we encode the entity-mentioning segments of the tweet texts using the shared RoBERTa-tapt encoder. We take the last-layer representation and select the tokens corresponding to the entity spans in the tweet text. We then average these selected tokens' embeddings to get the representation of the entity. Three different task-specific two-hidden-layer feed-forward neural networks are then used to predict the sentiment, role, and abstract entity mapping of the target entity, respectively.
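The span-averaging step used here (and in the discrete baselines above) is mean pooling over the entity's token embeddings; a minimal sketch, with hypothetical inputs in place of real transformer outputs:

```python
def entity_embedding(token_embeddings, span):
    """Average last-layer token embeddings over an entity's token span
    (start inclusive, end exclusive) to get a single entity vector."""
    start, end = span
    vecs = token_embeddings[start:end]
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
```

In the real model the rows would be RoBERTa-tapt hidden states and the result would feed the task-specific feed-forward heads.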
We define the multitask loss function, L_M, as the sum of the per-task losses:

L_M = L_stance + L_sent + L_role + L_map

Here, L_stance is the tweet stance classification loss, and L_sent, L_role, and L_map are the entity sentiment, role, and mapping classification losses, respectively. All of the losses are Cross Entropy losses. We pretrain the shared RoBERTa-tapt and the sentiment- and role-classification-specific neural networks with the OOD data from (Roy et al., 2021) in the same way we pretrain the C_sent and C_role classifiers used in our model. All of the task-specific feed-forward neural networks use two hidden layers consisting of 300 and 100 neurons with ReLU activations. We continue training the shared RoBERTa-tapt and the task-specific classifiers using either the artificial training data or the human-annotated real training data. We stop training when the combined tweet stance, entity sentiment, role, and mapping classification accuracy does not improve for five consecutive epochs.
Multitask with Author Information: The multitask approach described above jointly models only textual features. However, our proposed model (as described in Section 3) is able to incorporate author information and the social interactions among authors. Hence, for a fair comparison of the multitask baseline with our proposed model, we enhance the multitask baseline with author information.
First, we learn rich author embeddings using a relational graph convolutional network (R-GCN). We create a graph consisting of only author nodes and keyword nodes (as described in Section 3). Two authors are connected to each other via retweet relationships, and an author is connected to a keyword node if they use the keyword in their profile description (as described in Section 3). We then learn the author embeddings in this graph using a two-layer R-GCN with a link prediction objective: the similarity between two author nodes is maximized if they are connected in the graph and minimized if they are not. The link prediction loss, L_link, is defined over connected (positive) and unconnected (negative) author pairs.
Here, sim(a_t, a_p) denotes the similarity between author embeddings a_t and a_p, where author t is connected to author p in the graph, and sim(a_t, a_n) denotes the similarity between a_t and a_n, where author t is not connected to author n. We define similarity as the dot product between author embeddings. We randomly sample 5 negative examples (a pair of unconnected authors) for each positive example (a pair of connected authors) in each layer of the R-GCN.
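A standard negative-sampling formulation consistent with this description, which we assume here since the paper does not print the exact loss, is L_link = -log σ(sim(a_t, a_p)) - Σ_n log σ(-sim(a_t, a_n)); a pure-Python sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def link_loss(a_t, a_p, negatives):
    """Negative-sampling link-prediction loss: pull a connected author pair
    (a_t, a_p) together and push a_t away from sampled non-neighbors.
    Similarity is the dot product; the exact loss form is our assumption."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    loss = -math.log(sigmoid(dot(a_t, a_p)))          # positive pair term
    for a_n in negatives:
        loss -= math.log(sigmoid(-dot(a_t, a_n)))     # negative pair terms
    return loss
```

The loss is small when connected authors have high dot-product similarity and sampled non-neighbors have low similarity.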
We train the R-GCN layers by optimizing the loss L_link for 10 epochs. Author nodes are initialized using the RoBERTa-tapt encodings of the author profile descriptions, or the average embedding of 5 randomly sampled tweets by the author if no profile description is found. The keyword nodes are initialized with their RoBERTa-tapt encodings. All author embeddings are learned in a 50-dimensional space using the 2-layer R-GCN. In this manner, we obtain rich 50-dimensional author embeddings that encode the authors' retweet relationships and their profile descriptions.
We enhance the multitask baseline described above with these social-network-enhanced author embeddings. Note that in the weak supervision setup, the tweets are generated by LLMs, so the tweets have no actual authors. In this case, we simply average the embeddings of all augmented tweets related to a perspective to obtain an imaginary author representation for that perspective.
We concatenate the shared RoBERTa-tapt encodings of the tweets and entities with the corresponding learned author embeddings and use these concatenated representations as inputs to the task-specific feed-forward neural networks in the multitask model. The rest of the multitask learning objectives and hyperparameters remain the same as described above. In this manner, the multitask approach with author embeddings becomes comparable to our proposed model, as they incorporate the same information (textual and author network) in the learning process.

D.5 Variations of Our Model
We study our model in the following four variations as shown in Tables 4 and 16.
(1) Text-Discrete: In this version, only the textual elements, the tweets and the entities, are considered; the interactions among them are not modeled. For example, the nodes corresponding to tweets and entities are not connected by edges, and only the initialized node representations are used for classification. This version is comparable to frozen RoBERTa.
(2) Text as graph: In this variation, the text is converted to a graph consisting of tweet, entity, and hashtag nodes, and the relations among these elements are modeled using edges. Note that no author information is added. Tweet stances are inferred by conditioning only on the learned tweet node embeddings. No self-inference is done in this variation. We train our model for at least 15 epochs, after which we stop training if the total loss does not decrease for three consecutive epochs.
(3) Text as graph + Author Network: In this variation, the author network is added to the text-only graph; however, no self-learning is done. We train our model for at least 15 epochs, after which we stop training if the total loss does not decrease for three consecutive epochs.
(4) Text as graph + Author Network + Self-Learning: This is the final version of our model, where self-learning is added on top of the combined text and social graph. In this version, we follow the stopping criteria described in Section D.3.
We run all models (including baselines) 5 times using 5 random seeds. The average weighted F1 scores for the classification tasks, with corresponding standard deviations, can be found in Table 16; the average macro F1 scores are reported in Table 4.

D.6 Model Implementation Libraries
We use the DGL graph library (https://www.dgl.ai/) and PyTorch to implement all of the models. We use the AdamW optimizer to optimize the parameters of all models.

D.7 Infrastructure
We run all of the experiments on a 4-core Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz machine with 64GB RAM and two NVIDIA GeForce GTX 1080 Ti 11GB GDDR5X GPUs. GPT-J-6B was mounted across the two GPUs. Running our graph-based model for 300 epochs takes around 8 hours on this infrastructure.
We submit our model implementation scripts with this manuscript. All of our pretrained models and implementation scripts will be made publicly available upon acceptance of this paper. A randomly selected subset of the LLM-generated training set and the human-annotated test set are submitted with this manuscript for review. We will also publish these datasets, maintaining proper Twitter privacy protocols, upon acceptance.

D.8 Ablation
To determine how many training examples are needed for our proposed self-learning approach, we ran an ablation study. We randomly sample 33.33%, 66.67%, and 100% of the training data from the LLM-generated training set (weak supervision) and the human-annotated training set (direct supervision) (as summarized in Table 3) and train our proposed model and the multitask baseline (with author embeddings) on the sampled sets. We repeat this sampling and training using five random seeds (1000, 2000, 3000, 4000, 5000) and create learning curves by averaging over all runs. The learning curves for combined perspective and tweet stance detection are shown in Figure 7. In both the weak and direct supervision setups, our model is less sensitive to the amount of training data than the multitask baseline on both tasks. This demonstrates that our proposed model can achieve good performance with a small amount of annotated or artificially generated data.

E Qualitative Evaluation Details
We infer perspectives using our model and the multitask baseline on the whole #BLM corpus (Table 1). In the weak supervision setup, we use all the tweets generated by GPT-3 as training data, and in the direct supervision setup, we use all the human-annotated data for training (combining the human-annotated test and train sets in Table 3). We use a random seed of 1000 for initializing all models and the same hyperparameters as described in Sections D.3 and D.4. The following qualitative evaluations are done using the inferred labels.

E.1 Correlation of author behavior with stance
To analyze the correlation of authors' following behavior with their stance on #BLM, we first compile a list of Twitter accounts of US politicians from the publicly available congressional tweets corpus at https://github.com/alexlitel/congresstweets. We then collect the follow relationships between these US politicians and the authors in our dataset. We measure the Point-biserial correlation coefficient (Tate, 1954) between the percentage of time a user follows a politician from a specific political party (Republican/Democratic) and the author's stance as identified by the models. All correlations have p < 0.0001. 1,528 authors in our corpus follow at least one politician.
To analyze the correlation of authors' sharing behavior with their stance, we collect the lists of left- and right-leaning news media domains from https://mediabiasfactcheck.com/. We then measure the Point-biserial correlation coefficient between the percentage of time a user shares an article from a media outlet with a specific bias (left/right) and the author's stance as identified by the models. All correlations have p < 0.0001. 2,217 authors share at least one news article from the news sources we gathered. Note that during preprocessing all URLs are removed from the tweet texts; hence, the media outlet domain information is not encoded in the input embeddings of our model.
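The point-biserial coefficient is simply Pearson's r computed with one binary variable (here, the model-identified stance encoded as 0/1); a self-contained sketch with names of our choosing:

```python
import math

def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable (e.g., stance)
    and a continuous one (e.g., fraction of follows going to one party).
    Equivalent to Pearson's r with a dichotomous variable."""
    n = len(binary)
    mb = sum(binary) / n
    mv = sum(values) / n
    cov = sum((b - mb) * (v - mv) for b, v in zip(binary, values))
    sb = math.sqrt(sum((b - mb) ** 2 for b in binary))
    sv = math.sqrt(sum((v - mv) ** 2 for v in values))
    return cov / (sb * sv)
```

In practice, `scipy.stats.pointbiserialr` computes the same coefficient along with a p-value.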
The numeric values in the correlations bar plot in Figure 3 can be found in Table 17.

E.2 Entity mapping analysis
First, we manually detect groups of literal entities that are the same; for example, "lives" and "their lives" are merged into the literal entity "lives". We then detect the high-PMI perspectives associated with each of these literal entities in each camp using the equation described in Appendix A. The high-PMI perspectives are reported in Table 6. Example tweets corresponding to the perspectives in Table 6 and the literal entity groups are shown in Tables 20 and 21, respectively. As shown in Table 18, our model captures, better than the multitask baseline, both the pattern where the same phrases are used by different movements to address different entities and the mapping and sentiments towards common named entities (e.g., "Derek Chauvin").

E.3 Discourse of the movements
Table 19 shows the discourse of the pro-BlackLM and pro-BlueLM camps in terms of perspectives and moral foundations. To analyze the discourse of the movements, we identify the high-PMI perspectives associated with each camp using the same formula described in Appendix A.
For the identification of moral foundations in the tweets, we train a RoBERTa-based classifier on out-of-domain data: the Twitter moral foundation dataset proposed by Johnson and Goldwasser, 2018, in which 2k tweets are annotated with one or more moral foundations. We encode the tweets using our pre-trained RoBERTa-tapt (the [CLS] token of the last hidden layer) and use a fully connected layer to identify the moral foundation labels. Since this is a multi-label classification task, we use the BCE loss with the AdamW optimizer. We stop training the model when the validation accuracy does not improve for three consecutive epochs. The moral foundation classification F1 score on a held-out validation set was 75.19%. We infer the moral foundation labels for the #BLM corpus using this classifier.

E.4 Temporal analysis
We calculate the percentage of tweets in which a specific perspective is used on a given day. For each type of perspective, such as negative actors, we calculate the percentage over only the tweets mentioning that perspective type within a particular stance. The temporal trends are shown in Figure 8.

F Literature Review
Understanding the discourse of the #BLM movement: The discourse of the #BLM movement is understudied in NLP. One of the early works (Keith et al., 2017) proposed a distantly supervised EM-based approach for identifying the names of civilians killed by police in news articles. Later, Nguyen and Nguyen, 2018 applied deep learning methods to this task. Both works study only named entities and do not examine how entities are addressed using other words (e.g., "Thugs", "Heroes") or simply with pronouns. Recently, Ziems and Yang, 2021 introduced the Police Violence Frame Corpus of 82k news articles covering 7k police killings and studied entity-centric framing, where the frame towards the victim of police violence is defined by the victim's age, gender, race, criminality, mental illness, and attacking/fleeing/unarmed status. All of these works target more formal texts such as news articles and do not address how different entities are referred to or what sentiments and perspectives are expressed towards them.
Recently, a shared task of identifying BLM-
Figure 8: Temporal trends in all camps for the different types of entity roles identified using our model in the direct supervision setup. Note that BLM is mostly portrayed as a positive actor in pro-BlackLM and the portrayal of other entities as positive actors is significantly lower in this camp; hence, the trend for positive actors in pro-BlackLM is not shown in this analysis.

Table 21: Description of the literal entities studied in the qualitative evaluations.

Derek Chauvin (lexicon: derek chauvin, chauvin): The police officer who killed George Floyd.
Thugs (lexicon: thugs, these thugs, the thugs): A negative term used to refer to a person or a group of persons.
David Dorn (lexicon: #daviddorn, david dorn): A Black police officer who was killed during the George Floyd protests.
Lives (lexicon: their lives, lives, the lives, our lives, life, my life, his life, her life, their life, the life, a life): Self-explanatory.
They (lexicon: they): The pronoun "they".

Peng et al. (2019) discovered that people who join the movement in response to a real-life event are more committed to it than other types of joiners.
Another line of research has studied the power of social media in mobilizing the movement (Mundt et al., 2018; Freelon et al., 2018; Grill, 2021). One major drawback of these studies is that many of them involve expensive human studies.
There have been several attempts in social science to understand and analyze the narratives in online text responding to the murders of Black individuals. For example, Eichstaedt et al. (2021) and Field et al. (2022) studied the emotions expressed in online text in response to murders of Black persons. Other lines of research studied framing (Stewart et al., 2017; Ince et al., 2017), rhetorical functions (Wilkins et al., 2019), participation and attention to topics (Twyman et al., 2017), and narrative agency (Yang, 2016) related to the movement. Further works performed hashtag-based analyses of the #BlackLivesMatter movement (Blevins et al., 2019) and studied the divergence of counter-movements (e.g., the #AllLivesMatter movement) (Gallagher et al., 2018).
Existing studies in computational social science on narrative understanding mostly rely on hashtag- or lexicon-based analyses of the movements. Hence, they fail to capture the nuances in messaging choices and often cannot differentiate between movements and counter-movements when a representative hashtag of one movement is "hijacked" (Gallagher et al., 2018) by supporters of the counter-movement.
In contrast to these existing studies in CSS and NLP on #BLM, in this paper, we propose a holistic technical framework for characterizing such social movements on online media by explicitly modeling the perspectives of different camps.
Perspective Analysis and Stance Detection: Revealing perspectives in complex and deceptive discussions is an important part of discourse analysis and has been studied in different settings and variations. For example, Thomas et al. (2006) and Burfoot et al. (2011) attempted to identify stances in congressional floor-debate transcripts (against or in support of proposed legislation). Another line of research studied stances in online debate forums, where the stance (pro vs. con) of a speaker on a specific issue is predicted (Somasundaran and Wiebe, 2010; Walker et al., 2012; Hasan and Ng, 2013, 2014; Sridhar et al., 2015; Sun et al., 2018; Xu et al., 2018; Li et al., 2018). Mohammad et al. (2016) introduced a shared task of predicting stances in microblogs such as tweets, where the goal is to identify the stance of a tweet with respect to a given target (e.g., an entity or issue). More recent works have studied political stances on politically controversial issues (Johnson and Goldwasser, 2016b,a; Ebrahimi et al., 2016; Augenstein et al., 2016). Another shared task (Pomerleau and Rao, 2017) approaches fake-news detection from a stance detection perspective: given a headline and a body text, either from the same news article or from two different articles, the task is to determine whether the stance of the body text relative to the claim made in the headline is agreement, disagreement, discussion, or irrelevance.
Stance identification has been studied using various approaches (Cignarella et al., 2020). For example, Burfoot et al. (2011) and Sridhar et al. (2015) studied stance detection using a collective classification approach. Sridhar et al. (2015) and Hasan and Ng (2014) studied not only stances but also the reasoning behind them and the disagreements among them. In related work along this line, Somasundaran and Wiebe (2010) used lexicon-based features for detecting "arguing" opinions and developed supervised systems using sentiment and arguing opinions for stance classification. Sun et al. (2018) proposed a neural model that learns mutual attention between the document and other linguistic factors. Thomas et al. (2006) leveraged inter-document relationships, and Walker et al. (2012) leveraged the dialogic structure of debates, in terms of agreement relations between speakers, for stance detection. Another line of research builds on relational learning methods for stance detection, where various contextualizing and relational properties are explicitly modeled and incorporated. For example, Johnson and Goldwasser (2016b) proposed a relational learning framework incorporating framing and temporal activity patterns, and Ebrahimi et al. (2016) proposed a relational model incorporating the friendship networks of the authors. The main drawback of such relational learning approaches is that all possible solutions to the problem are explored and the one with the best gain is returned; as a result, the inference tree becomes very large and computationally expensive for large amounts of data.
In this paper, we study perspectives and stances related to the #BLM movement, which is understudied in the literature.
Data Augmentation Approaches in NLP: Data augmentation refers to techniques that generate additional, ideally diverse, data without explicitly collecting new data. Such techniques are useful in low-resource settings where obtaining training data is difficult. Annotated data for nuanced perspectives is hard to obtain, as annotation requires specialized knowledge. Hence, automatic data augmentation is desirable.
With recent advances in deep learning, different data augmentation techniques have been proposed. For example, Kobayashi (2018) proposed an RNN-based approach, Kumar et al. (2019) trained sequence-to-sequence models, and Yang et al. (2020) and Ng et al. (2020) proposed pretrained transformer-based approaches. Most of these approaches modify existing text data to augment new data. In contrast, Anaby-Tavor et al. (2020) and Quteineh et al. (2020) directly estimated the text generation process using GPT-2 (Radford et al., 2019) rather than modifying existing examples.
With recent advances in transformer-based pretrained auto-regressive Large Language Models (LLMs) such as GPT-3 (Brown et al., 2020), a new direction for data augmentation has opened up, in which controlled text generation (Iyyer et al., 2018; Kumar et al., 2020; Liu et al., 2020) is achieved by prompting these pretrained LLMs. These LLMs are trained on huge web-crawled corpora and can therefore generate human-like text. One recent work by Liu et al. (2022) combined the generative power of LLMs with the evaluative power of humans in a human-LLM interaction loop to generate datasets for Natural Language Inference (NLI) tasks. Inspired by these advances, in this paper we augment few-shot training data by prompting LLMs in a structured way. The augmented dataset is later used to bootstrap our proposed model.
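As a rough sketch of what structured prompting for augmentation can look like, the snippet below assembles a few-shot prompt from a target stance, a target perspective, and seed tweets. The format is hypothetical and for illustration only (the paper's actual template appears in its Figure 5); `build_augmentation_prompt` and its wording are our own:

```python
def build_augmentation_prompt(stance, perspective, seed_tweets):
    """Assemble a few-shot prompt asking an LLM for tweets with a
    target stance and perspective (illustrative format only)."""
    header = (f"Write tweets that are {stance} the #BlackLivesMatter "
              f"movement and express the perspective: {perspective}.\n")
    examples = "\n".join(f"Tweet: {t}" for t in seed_tweets)
    # Trailing "Tweet:" cues the model to continue with a new example.
    return header + examples + "\nTweet:"
```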

Figure 1: Author stances help disambiguate entity references and aid in understanding perspectives.

Figure 3: Correlation between stances on the #BLM movement and authors' following and sharing behaviors.

Figure 4: Temporal trends identified using our model and direct supervision. (Panel: trend in portrayal of entities as negative actors in pro-BlueLM.)

Figure 5: Prompt structure for GPT-3 to generate tweets containing specific stances and perspectives. The black-colored text is the input prompt, and the blue-colored texts are tweets generated by GPT-3.

Figure 7: Learning curves for perspective and tweet stance detection.

(Panel titles: portrayal of entities as positive targets in pro-BlackLM over time; portrayal of entities as negative actors in pro-BlackLM over time.)
arXiv:2310.07155v1 [cs.CL] 11 Oct 2023

GPT-J generated mostly repetitive examples; hence, we use the GPT-3 generations. Detailed prompting methods, generated data statistics and examples, LLM hyperparameters, and pre/post-processing steps are in Appx. B.

Human Annotation of Real Data: To evaluate the quality of the artificially crafted data and to evaluate our model's performance on real data, we

Table 3: Training and test data statistics.

(Results-table row: + Author Embed. 81.81 ± 1.4 | 90.48 ± 2.0 | 76.03 ± 2.2 | 85.08 ± 1.0 | 31.58 ± 0.7 | 71.76 ± 4.8 | 85.17 ± 1.1 | 86.79 ± 0.4 | 83.03 ± 0.8 | 87.17 ± 0.2 | 49.93 ± 2.2 | 46.42 ± 3.5)

Detailed human annotation process, inter-annotator agreement scores, examples of ambiguous tweets, and per-class data statistics are in Appx. C.

Train-Test Split: We randomly sample a small subset of the human-annotated authors and use these authors, their tweets, and mentioned entities as the training set, with the rest of the data as the test set for our proposed model and all baselines. We perform our experiments in two setups: (1) Weak supervision: LLM-generated examples

Table 6: Examples of the literal-entity-to-abstract-entity map. Sometimes pro-BlackLM and pro-BlueLM use the same phrase to address different entities and/or perspectives.

DISCOURSE IN PRO-#BLACKLIVESMATTER (inferred using our model and direct sup.) | DISCOURSE IN PRO-#BLUELIVESMATTER (inferred using our model and direct sup.)
police pos. actor, auth./subv., loyal./betray. | police pos. target, antifa neg. actor | "Protect the officers! They are only following orders and keeping America safe."

Table 7: Discourse of the movements explained with messaging choices and Moral Foundations (MFs). The Moral Foundation care/harm was used in all cases by both sides; hence, it is removed from the table.
Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, and Eduard Hovy. 2021. A survey of data augmentation approaches for NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 968-988.

Pranav Anand, Rob Abbott, and Ricky Grant. 2012. Stance classification using dialogic properties of persuasion. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 592-596, Montréal, Canada. Association for Computational Linguistics.

Discovering Black Lives Matter events in the United States: Shared task 3, CASE 2021. In Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021), pages 218-227, Online. Association for Computational Linguistics.

Kazi Saidul Hasan and Vincent Ng. 2013. Stance classification of ideological debates: Data, models, features, and constraints. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1348-1356, Nagoya, Japan. Asian Federation of Natural Language Processing.

Kazi Saidul Hasan and Vincent Ng. 2014. Why are you taking this stance? Identifying and classifying reasons in ideological debates. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 751-762.

Kristen Johnson and Dan Goldwasser. 2016a. Identifying stance by analyzing political discourse on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 66-75.

Table 10: Generated training data statistics.

Table 14: Annotated data statistics for entities.

Table 15: Annotated data statistics for authors and tweets.

Table 16: Average weighted F1 scores with standard deviations for classification tasks over 5 runs using 5 random seeds: 1000, 2000, 3000, 4000, 5000.

We run inference after every 10 steps; before the first inference step, we train the model until the total training loss does not decrease for 3 consecutive epochs. We stop training if the number of new training examples is less than 0.3% for 10 consecutive inference steps or a maximum of 300 epochs is reached. For the consistency check, we use a label confidence threshold of 0.9 until epoch 200 and reduce it to 0.8 afterwards, as many training examples have already been added by then and the model is intuitively more stable.
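The confidence schedule for the consistency check can be sketched as follows (illustrative code with our own function names, not the released implementation):

```python
def confidence_threshold(epoch):
    """Label-confidence threshold: 0.9 up to epoch 200, 0.8 afterwards."""
    return 0.9 if epoch <= 200 else 0.8

def select_pseudo_labels(scores, epoch):
    """Keep indices of examples whose predicted confidence clears the
    epoch-dependent threshold, so they can be added as training data."""
    t = confidence_threshold(epoch)
    return [i for i, s in enumerate(scores) if s >= t]
```

Lowering the threshold late in training admits more borderline examples once the model has stabilized on the higher-confidence ones.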

Table 17: Point-biserial correlation coefficients between author stance and their following and sharing behaviors on Twitter. All correlations had a p-value < 0.0001.

Table 18: Examples of the literal-entity-to-abstract-entity mapping. Sometimes pro-BlackLM and pro-BlueLM use the same phrase to address different entities and/or perspectives. The lexicon of the literal entities in this table can be found in Table 21.

Table 19: Discourse of the movements explained with messaging choices and Moral Foundations (MFs). The Moral Foundation care/harm was used in all cases by both sides; hence, it is removed from the table. This table is created using the predictions of our model in the direct supervision setup.

Table 20: Examples of tweets where pro-BlackLM and pro-BlueLM use the same phrases to address different entities and/or perspectives. The literal entities in the example tweets are bolded and underlined. The lexicon of the literal entities in this table can be found in Table 21. "N/A" means the entity is not mentioned frequently in a particular campaign.
Some works have studied which properties accelerate mobilization. For example, Casas and Williams (2019) studied the power of images in mobilizing the movement, and Keib et al. (2018) studied which types of movement-related tweets are retweeted more. De Choudhury et al. (2016) performed a temporal analysis of the #BLM movement based on social media, the engagement of people from different demographics, and their correlation with textual features.