A Case Study of Analysis of Construals in Language on Social Media Surrounding a Crisis Event

The events that took place at the Unite the Right rally held in Charlottesville, Virginia on August 11-12, 2017 provoked intense reactions on social media from users across the political spectrum. We present a novel application of psycholinguistics, specifically construal level theory, that uses topic models to analyze social media language around this event of social import. We find that including psycholinguistic measures of concreteness as covariates in topic models can inform analysis of the language surrounding an event of political import.


Introduction
Construal Level Theory (CLT) (Trope and Liberman, 2010) postulates that people create differing mental representations of the same information depending upon whether the information is psychologically proximal or psychologically distant. For instance, people experience geographically distant, and hence psychologically distal, events by forming mental construals of them at higher levels of abstraction than geographically proximal events (Fujita et al., 2006). These construals manifest themselves in the language people use, specifically in the concreteness of the words they choose. Additionally, empirical research has demonstrated that the tendency to create abstract versus concrete construals systematically affects human judgments, attitudes, and behaviors (McCrea et al., 2012).
To illustrate, consider the example of climate change. Research has shown that when people are primed to think about climate change using more concrete terms (such as beetle and forest) rather than more abstract terms (such as sea levels), they are more likely to engage with the topic (Scannell and Gifford, 2013). The concreteness of a word is the degree to which the concept it denotes refers to a perceptible entity.
In other words, it is easier to generate a mental image of a beetle than of sea level, and talking about climate change in more concrete terms makes people more likely to engage with the topic. Furthermore, analyzing words and their associated sentiments can reveal the tone of a discussion and how discussion of climate change varies between countries (Dahal et al., 2019).

Table 1: Example tweets from our corpus.
High Abstraction / Low Concreteness: A Confederate who was opposed to secession, but refused to fight against Virginia https://t.co/UTJvNsEYd7 #waxmuseum #USHistory
Low Abstraction / High Concreteness: "Confederate general/soldiers statues / memorials are literally just participation trophies " -the best sentence I ever heard #Charlottesville
Construals can differ based on geographical, social and temporal distance. An event which is distant in the future would be described in language that has higher levels of abstractness (and therefore low concreteness) than an event which is more proximal. Given that (a) language use reflects differing levels of construals and (b) construals can differ for events that are temporally distant vs. temporally proximal, we seek to investigate whether individuals on social media would discuss an event using different levels of construals and whether we can determine the effects of these construals from their language use.
We thus use Construal Level Theory as a theoretical foundation to understand the reaction of individuals on Twitter related to the Unite the Right rally that took place in Charlottesville, Virginia on August 11-12, 2017. We apply topic models to analyze language use and study how users view the events that took place during the protests. To demonstrate, consider the tweets shown in Table 1 as examples of high concreteness/low abstraction vs. low concreteness/high abstraction language surrounding the Charlottesville Rally from our corpus. While one tweet discusses the topic using highly concrete words (statues and trophies), the other does so using abstract concepts like secession and confederate.
Our work, situated at the intersection of psycholinguistics and computational social science, makes the following salient contributions:
• We extend the application of Construal Level Theory beyond laboratory settings to make it more ecologically valid;
• To analyze language produced spontaneously on social media, we use topic modeling and include concreteness values as covariates in the topic models.

Related Work
Construal Level Theory to Study Human Behavior: Construal level theory, first introduced by Liberman et al. (2007), describes the relation between psychological distance and whether the mind perceives objects and events as abstract or concrete. Psychological distance has temporal, spatial (geographical), and social dimensions. McCrea et al. (2008) showed that representing tasks that must be completed in a concrete way decreases the likelihood of procrastination. Stephan et al. (2011) applied the theory to show that temporal proximity and concrete construals produce a corresponding increase in perceived social closeness (described as familiarity with a specific topic). Williams et al. (2014) studied how the psychological distance of thought affects the positivity of reactions, showing that distance from a scenario (having it happen to oneself versus to someone else) affects one's reaction to it. Snefjella and Kuperman (2015) show that language becomes more abstract as spatial distance increases and more concrete as it decreases. Rufai and Bunce (2020) analyze tweets in which top world leaders responded to the COVID-19 pandemic; although their results are unrelated to construal level theory, they categorize each leader's tweets in ways that help explain the response path each country's leader took. However, most of the work cited above is based on laboratory studies. Social media language, by contrast, has the benefit of being more ecologically valid, in that communication between speakers is more interactive and messages are generally spontaneous rather than prompted or composed before delivery.
Topic Models to Study Language Data: Topic modeling techniques based on probabilistic latent semantic analysis (Hofmann, 2001) and latent Dirichlet allocation (LDA) (Blei and Lafferty, 2006) have been widely used to support quantitative and qualitative analysis of text data. While topics are uncorrelated in the base LDA model, correlated topic models allow topic proportions to be correlated, capturing the fact that certain topics tend to co-occur and are thus closer to one another (Blei et al., 2007). Topic models can be built using a variety of methods, and salient topics can be derived from tweets using both traditional LDA and non-traditional methods (Demszky et al., 2019). Topic models have also been used outside of construal analysis, for example to study how human emotion is attached to text (Kleinberg et al., 2020). Structured topic models (STM) (Wallach, 2008) treat documents as sequences of segments that can share the same prior distribution over topics, which allows the model to leverage the existing structure of documents given their segmentation. Another advantage of STM is that it allows covariates to be included in the prior distributions, so that variation in topic prevalence across values of a variable of interest can be investigated (Roberts et al., 2014). While covariates such as political ideology have been widely studied in prior literature (Bauer et al., 2017), the inclusion of psycholinguistic measures of words has not heretofore been systematically studied. We thus investigate whether including psycholinguistic measures of concreteness in topic models yields meaningful comparisons of the underlying construals about the events.
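The way covariates enter the prior can be stated compactly. The block below is a simplified statement of the topic-prevalence part of the model described by Roberts et al. (2014), with content covariates omitted. Here x_d is the covariate vector of document d (in our setting, for example, an indicator of high vs. low concreteness), Gamma holds the covariate coefficients, Sigma is a covariance matrix, K is the number of topics, and beta_k is the word distribution of topic k.

```latex
\begin{aligned}
\boldsymbol{\eta}_d &\sim \mathcal{N}_{K-1}\!\left(\Gamma^{\top}\mathbf{x}_d,\ \Sigma\right),
  \qquad \eta_{d,K} = 0, \\
\theta_{d,k} &= \frac{\exp(\eta_{d,k})}{\sum_{j=1}^{K} \exp(\eta_{d,j})}, \\
z_{d,n} &\sim \operatorname{Multinomial}(\boldsymbol{\theta}_d),
  \qquad w_{d,n} \sim \operatorname{Multinomial}\!\left(\boldsymbol{\beta}_{z_{d,n}}\right).
\end{aligned}
```

Because the covariates shift the mean of the logistic-normal prior, the expected proportion of each topic can differ between, for example, high- and low-concreteness documents. In the R stm package, such covariate effects are estimated after fitting with the estimateEffect function.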

Data
A major challenge in studying social media data is representativeness and sample selection bias (Tufekci, 2014). To address this challenge, we designed an observational study using Twitter's public APIs to obtain a longitudinal dataset of tweets from Feb 7, 2017 through Oct 11, 2017 around the Charlottesville protests of August 2017 in Virginia, USA. As an event of far-reaching social and political import, characterized not only by discussion surrounding the planning of the protests but also by the ensuing discussion after August due to the death of Heather Heyer, this event serves as an exemplary case for analyzing how individuals formed construals before, during and after the event. We used a carefully curated set of keywords and defined the search criteria iteratively: first, we conducted an advanced search on Twitter for tweets containing keywords from trending tweets, including hashtags regarding the Charlottesville event. Next, we examined the tweets resulting from this search to identify additional keywords we had missed, and then we conducted additional data pulls to include tweets with these additional keywords. All research was conducted in accordance with university ethics board approval. Data collection was ruled exempt because we collected tweets from public accounts. We acquired the data through the GNIP Historical PowerTrack Twitter API for the Charlottesville event using the search string in Figure 1, resulting in 526,102 tweets.

Figure 1: List of hashtags and keywords used to collect our data corpus for the Charlottesville protest event. The hashtags were split into two conditions. In Condition 1, there are two sets of keywords and hashtags, and the search criterion is that a tweet must match at least one item from each set. Condition 2 is a set of hashtags, and the search criterion is that a tweet must match at least one item from the set.
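The two search conditions of Figure 1 can be illustrated with a small local filter. This is a minimal sketch, not the PowerTrack rule syntax actually used for collection: the keyword sets below are hypothetical placeholders (the real lists are those in Figure 1), matching is done on lowercased word and hashtag tokens, and we assume a tweet is kept if it satisfies either condition.

```python
import re

# Hypothetical placeholder keyword sets; the actual lists are those in Figure 1.
CONDITION_1_SET_A = {"charlottesville", "cville"}
CONDITION_1_SET_B = {"rally", "protest", "unitetheright"}
CONDITION_2_HASHTAGS = {"#charlottesville", "#unitetheright"}

# A token is an optional '#' followed by word characters.
TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text: str) -> set[str]:
    """Return the set of lowercased words and hashtags in a tweet."""
    return set(TOKEN_RE.findall(text.lower()))


def matches_condition_1(tokens: set[str]) -> bool:
    """At least one item from each of the two Condition 1 sets.
    Hashtags are compared without their '#', so '#rally' also counts as 'rally'."""
    bare = {t.lstrip("#") for t in tokens}
    return bool(bare & CONDITION_1_SET_A) and bool(bare & CONDITION_1_SET_B)


def matches_condition_2(tokens: set[str]) -> bool:
    """At least one hashtag from the Condition 2 set."""
    return bool(tokens & CONDITION_2_HASHTAGS)


def is_relevant(text: str) -> bool:
    """Keep a tweet if it satisfies either condition (assumed to be OR-ed)."""
    tokens = tokenize(text)
    return matches_condition_1(tokens) or matches_condition_2(tokens)


if __name__ == "__main__":
    print(is_relevant("Huge rally planned in Charlottesville"))  # True
    print(is_relevant("Beautiful day in Charlottesville"))       # False
```

Splitting the rule this way makes the asymmetry visible: a bare place name alone is not enough under Condition 1, whereas any single hashtag from the Condition 2 set is.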

Method
We use R and the STM (Roberts et al., 2019) package to build our topic models. We preprocess the data by converting all tokens to lowercase, removing symbols from the text, and removing stopwords using the spaCy library (Honnibal et al., 2020) in Python. We also add custom stopwords such as "like" and "try" to make the topics more meaningful. We used semantic coherence as one of the measures to determine the final number of topics.
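The preprocessing steps and the coherence criterion can be sketched as follows. This is a hedged illustration, not our pipeline: a small hard-coded stopword list stands in for spaCy's English stopwords so the example has no dependencies, and semantic_coherence implements the document co-occurrence measure (introduced by Mimno, Wallach and colleagues) on which the STM package's semanticCoherence function is based; the package's exact smoothing may differ. Function names are hypothetical.

```python
import math
import re
from itertools import combinations

# Small stand-in stopword list. The paper uses spaCy's English stopwords
# (spacy.lang.en.stop_words.STOP_WORDS); they are not imported here so the
# sketch has no dependencies.
BASE_STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on",
                  "that", "the", "this", "to", "was"}
CUSTOM_STOPWORDS = {"like", "try"}  # custom additions mentioned in the text
STOPWORDS = BASE_STOPWORDS | CUSTOM_STOPWORDS

SYMBOL_RE = re.compile(r"[^\w\s]")  # anything that is not a word character or whitespace


def preprocess(text: str) -> list[str]:
    """Lowercase, replace symbols with spaces, split on whitespace, drop stopwords."""
    cleaned = SYMBOL_RE.sub(" ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def semantic_coherence(top_words: list[str], docs: list[list[str]]) -> float:
    """Document co-occurrence coherence of one topic.

    top_words: the topic's top words, ordered from most to least probable.
    docs: tokenized documents.
    Returns the sum over pairs (l < m) of log((D(w_m, w_l) + 1) / D(w_l)),
    where D counts documents containing all the given words. Assumes every
    top word occurs in at least one document.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words: str) -> int:
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for l, m in combinations(range(len(top_words)), 2):
        w_l, w_m = top_words[l], top_words[m]  # w_l is the more probable word
        score += math.log((doc_freq(w_m, w_l) + 1) / doc_freq(w_l))
    return score
```

Coherence is usually averaged over topics and compared across candidate numbers of topics. Because a few broad topics built from common words can score well on coherence alone, it is typically weighed together with other measures rather than used in isolation.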
We then used an existing concreteness lexicon (Brysbaert et al., 2014a) to compute the average concreteness value of words that occur in tweets. The concreteness lexicon by Brysbaert et al. (2014a) contains concreteness ratings for over 40,000 English words in their lemma form and has been used in prior natural language research to investigate argument strategies (Tan et al., 2016) and to predict text comprehension (Crossley et al., 2017), among others. Prior approaches that investigate psychological distance in natural language (Bhatia and Walasek, 2016; Snefjella and Kuperman, 2015) compute an average concreteness score for each tweet by consulting the concreteness lexicon for all words that occur in the tweet. By contrast, we only consider words with extreme concreteness scores (>=4, on a scale of 1-5) and extreme abstractness scores (<=2). We focus on the extreme ends of the concreteness/abstractness spectrum to be consistent with prior literature, which suggests that extreme valence is highly correlated with emotion, memory and recognition of words (Ponari et al., 2018). More experimentation is needed to determine how this choice of threshold affects the resulting topic model, since a different threshold might surface different patterns in the data. Answering this would require manual inspection of the words contained in each topic and qualitative evaluation of the semantic content within and between topics.
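The scoring rule just described, together with the per-post high/low labelling used in the Results, can be sketched as follows. This is an illustration under stated assumptions: the lexicon values are invented toy numbers, not ratings from Brysbaert et al. (2014a); tokens are looked up as-is without lemmatization; posts with no extreme words receive no score; and a post exactly at the mean is labelled low. The function names are hypothetical.

```python
from statistics import mean
from typing import Optional

# Toy lexicon with invented values on the 1-5 scale (NOT the Brysbaert et al. ratings).
TOY_LEXICON = {"statue": 4.8, "torch": 4.9, "car": 4.9,
               "idea": 1.6, "justice": 1.5, "vote": 3.2}

HIGH = 4.0  # extreme concreteness: rating >= 4
LOW = 2.0   # extreme abstractness: rating <= 2


def all_words_concreteness(tokens, lexicon=TOY_LEXICON) -> Optional[float]:
    """Prior-work style score: mean rating of every token found in the lexicon."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return mean(scores) if scores else None


def extreme_concreteness(tokens, lexicon=TOY_LEXICON) -> Optional[float]:
    """Mean rating over tokens whose rating is extreme (>= HIGH or <= LOW)."""
    scores = [lexicon[t] for t in tokens
              if t in lexicon and (lexicon[t] >= HIGH or lexicon[t] <= LOW)]
    return mean(scores) if scores else None


def label_by_mean(scores):
    """Label each post 'high' or 'low' relative to the mean of the available scores.
    Posts without a score stay None; ties with the mean are labelled 'low'."""
    corpus_mean = mean(s for s in scores if s is not None)
    return [None if s is None else ("high" if s > corpus_mean else "low")
            for s in scores]
```

For a post containing "statue", "vote" and "justice", the all-words score averages all three ratings, while the extreme-only score ignores the mid-scale "vote". The resulting labels would then serve as the categorical prevalence covariate; in the R stm package, such covariates are given through the prevalence formula argument (for example, prevalence = ~ concreteness).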

Results
After constructing a topic model, the patterns among the topics and among the most common words in each topic can be used to explain the construal levels of the users. It is important to note that some of the topics produced, specifically Topics 2, 7, and 10, contained foul language, reflecting the harsh and opinionated nature of the tweets about this event. We summarize our two main findings in this paper; a more in-depth analysis and contextualization within a larger research project is the focus of an upcoming, larger publication.
Concreteness level differentiates between topics: Figure 2 shows the level of concreteness in each topic, arranged from low to high concreteness. For each individual post, a concreteness value above the mean was labelled "high concreteness", and a value below the mean was labelled "low concreteness". At the topic level, the concreteness value for each topic is determined internally by the STM library using prevalence, which, according to the package documentation, refers to how much of a document is associated with a topic, taking into account the metadata provided. Figure 2 thus shows how the prevalence of topics differs across the values of the categorical "concreteness" covariate. As discussed above, concrete terms refer to specific tangible objects, while abstract terms can refer to general ideas or emotions. Topics 3 and 9 stand out as the least and most concrete, respectively. Other topics with high-concreteness terms are Topics 1, 6 and 10. Many topics are characterized by low concreteness values (Topics 3, 7, 8, 5 and 4). This is plausible because most of our data relating to the event was collected months before the rally took place (our data collection starts in February, while the main Charlottesville protests took place in August 2017). In other words, Topic 3 on average discusses the Charlottesville rally in terms of general ideas, while Topic 9 does so through specific people or more concrete objects. Terms that served as labels for Topic 1 include "stand", "vote", and "quit", while labels for Topic 3 include "outrage", "lead", and "nationalist". Frequent terms in Topic 1 are more easily visualized than the terms in Topic 3, which exhibit low concreteness and are considered more abstract. Terms in Topics 6 and 10 include "america", "resisttrump", "assault", and "historic".
These terms are imaginable and can evoke an image in the reader's or tweeter's mind, reflecting the high concreteness of the tweets in the topics they belong to. Terms with low concreteness, including "wrong", "praise", "civil", "approve", and "game", can be found in Topics 4, 5, 7, and 8. In contrast to those in Topics 1, 6, and 10, these terms are less imaginable and do not clearly evoke a picture in the reader's mind, illustrating how the topics they belong to discuss more abstract ideas.
Topic proportions over time reflect construals: Topics 3 and 9 merit closer examination because they are so dissimilar. To investigate further, we plot the difference between the two topics over time in terms of expected topic proportion in Figure 3. This figure shows that tweets in Topic 9 began to increase steadily immediately after the Charlottesville protests began in August and peaked in the period after the events, while Topic 3 (characterized by low-concreteness language, with terms such as "outrage", "attention", "nationalist", and "return") declined during the month of the protests and was less prevalent during the peak of Topic 9. At the time of the protests (August 11-12), Topic 9 had begun to increase, while Topic 3 had been declining and reached its lowest point yet. Topic 9 also contains terms that may be related to the aftermath of the protests because they illustrate reactions to the events. This suggests that topics associated with more concrete terms regarding the Charlottesville event, specifically Topic 9, were more prevalent after the event. Put differently, individuals were more likely to talk about the protests in concrete terms after the main protest event had passed (Aug 11-12). While the expected topic proportion of Topic 3 dips after the August time window, it does not differ dramatically from its earlier expected topic proportion. This suggests that abstract construals are likely to appear both before and after the event but not during it.
This finding is consistent with prior research applying Construal Level Theory in lab settings.
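The over-time comparison can be approximated with a simple aggregation. This is a hedged stand-in rather than the procedure behind Figure 3: the figure plots expected topic proportions as estimated by STM, whereas this sketch just averages each document's fitted topic proportions by calendar month and subtracts one topic from another. The input format (one date and one row of topic proportions per document) and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_topic_difference(dates, theta, topic_a, topic_b):
    """Per calendar month, mean proportion of topic_a minus mean proportion of topic_b.

    dates: one datetime.date per document.
    theta: one list of topic proportions per document (each row sums to 1).
    topic_a, topic_b: topic numbers counted from 1, as in the paper (e.g. 9 and 3).
    Returns {(year, month): difference}, ordered by month.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_a, sum_b, n_docs]
    for d, row in zip(dates, theta):
        acc = totals[(d.year, d.month)]
        acc[0] += row[topic_a - 1]
        acc[1] += row[topic_b - 1]
        acc[2] += 1
    return {month: (sum_a - sum_b) / n
            for month, (sum_a, sum_b, n) in sorted(totals.items())}


if __name__ == "__main__":
    dates = [date(2017, 8, 5), date(2017, 8, 20), date(2017, 9, 1)]
    theta = [[0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]  # toy proportions for 2 topics
    print(monthly_topic_difference(dates, theta, topic_a=2, topic_b=1))
```

A positive value for a month means the more concrete topic (passed as topic_a) outweighed the more abstract one that month; unlike STM's estimates, this raw average carries no uncertainty intervals.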

Conclusion
The protests that took place in Charlottesville in August 2017 caused an outsize reaction on social media. We investigate how individuals perceive an event during its occurrence and after it ends, through the lens of Construal Level Theory. Our main finding is that adding concreteness values as covariates during topic modeling can help distinguish which topics were prevalent before, during and after the event. We find that across the discussion surrounding the protests (February through October 2017 in our corpus), abstract terms referring to ideas and emotions were more likely to be used.
Notably, we found that language with more concrete terms was used to describe the events after they occurred. This finding is not surprising: it is easier to discuss an event in concrete terms after it occurs, because individuals have specific objects (like car and torch) to refer to, in addition to proper nouns such as specific names or places. However, a marked dip in the expected topic proportion after the event (cf. Figure 3, Topic 9 trajectory) suggests that this effect attenuates over time. Our research can inform how to measure construals of events over time and can show which elements of an event people focus on as they react to it. Our methodology thus illustrates how quantitative methods can be used to study how Construal Level Theory is reflected during crisis events. In future work, we also aim to study how our approach could be applied to other crisis events.
Limitations: We acknowledge several limitations of our work:
• Single Event: Our analysis focuses on a single event, the Charlottesville protest rally. As such, we cannot yet claim that our findings generalize. We offer our research as a first foray into a series of analyses of construals across varying events and contexts. For example, one direction for future work is analyzing construals about the COVID-19 pandemic at different stages of an ongoing, global event.
• Deeper Analysis of Concrete Terms: In this work, we do not present an in-depth study of the concrete vs. abstract words associated with each topic. Interesting questions include whether the FREX terms (highest-ranking frequent and exclusive words) or the highest-probability words in each topic are correlated in any way with the concreteness values. We address this limitation as part of our future work.
• Language Limitations: Our study focuses on an event that occurred in the United States. As such, all of our data are in English. To address the question of generalizability, we further aim to replicate our findings in multiple languages given appropriate data. Concreteness lexicons now exist in multiple languages, including Dutch (Brysbaert et al., 2014b) and French (Bonin et al., 2020), which makes this future analysis a viable option.
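For the FREX terms mentioned in the limitations, a minimal sketch of the score follows. It uses the FREX definition adopted by STM, a weighted harmonic mean of a word's within-topic empirical-CDF rank by frequency and by exclusivity, but simplifies it: ties receive the highest rank, no smoothing of exclusivity is applied, and the weight w (applied here to exclusivity) defaults to 0.5. The function names are hypothetical. Given such scores, the top FREX words of each topic could be looked up in a concreteness lexicon to begin addressing the question raised above.

```python
def _ecdf(values):
    """Empirical CDF of each value within the list: share of values <= it (in (0, 1])."""
    n = len(values)
    return [sum(x <= v for x in values) / n for v in values]


def frex_scores(beta, w=0.5):
    """FREX score for every (topic, word) pair.

    beta: K x V nested list of topic-word probabilities (each row sums to 1).
    Exclusivity of word v in topic k is beta[k][v] / sum_j beta[j][v].
    FREX = 1 / (w / ECDF(exclusivity) + (1 - w) / ECDF(frequency)),
    with both ECDFs computed across the words of topic k.
    """
    n_topics, n_words = len(beta), len(beta[0])
    word_totals = [sum(beta[k][v] for k in range(n_topics)) for v in range(n_words)]
    scores = []
    for k in range(n_topics):
        exclusivity = [beta[k][v] / word_totals[v] for v in range(n_words)]
        excl_ecdf = _ecdf(exclusivity)
        freq_ecdf = _ecdf(beta[k])
        scores.append([1.0 / (w / excl_ecdf[v] + (1 - w) / freq_ecdf[v])
                       for v in range(n_words)])
    return scores


def top_frex_words(beta, vocab, topic, n=3, w=0.5):
    """The n words with the highest FREX score in one topic (0-indexed)."""
    row = frex_scores(beta, w)[topic]
    ranked = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]
```

Using ranks rather than raw probabilities keeps frequency and exclusivity on the same (0, 1] scale, so neither dominates the harmonic mean simply because of its units.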