What Sounds “Right” to Me? Experiential Factors in the Perception of Political Ideology

In this paper, we challenge the assumption that political ideology is inherently built into text by presenting an investigation into the impact of experiential factors on annotator perceptions of political ideology. We construct an annotated corpus of U.S. political discussion, where in addition to ideology labels for texts, annotators provide information about their political affiliation, exposure to political news, and familiarity with the source domain of discussion, Reddit. We investigate the variability in ideology judgments across annotators, finding evidence that these experiential factors may influence the consistency of how political ideologies are perceived. Finally, we present evidence that understanding how humans perceive and interpret ideology from texts remains a challenging task for state-of-the-art language models, pointing towards potential issues when modeling user experiences that may require more contextual knowledge.


Introduction
Social media platforms such as Twitter, Facebook, and Reddit play an important role in political discourse by providing a space for users to interact with different viewpoints. Understanding political discussion on these platforms often requires identifying the ideologies behind texts, as the viewpoints reflected in a text can provide insight into the partisanship of beliefs (Monroe et al., 2008) or the persuasive strategies used by different ideological groups (Tsur et al., 2015).
Prior research on political discussion often relies on a "ground-truth" to aid in obtaining ideology labels for social media data. For example, due to the scale of political content on social media, a common paradigm is to obtain some ground-truth labels that are propagated to a larger set of texts using semi-supervised learning (Lin and Cohen, 2010; Zhou et al., 2011). The relationship between a social media artifact and various forms of established political knowledge can also be used to ground or validate ideology labels. Some examples of this include using author interactions with politicians with known party affiliations (Djemili et al., 2014; Barberá, 2015), ideological communities (Chandrasekharan et al., 2017; Shen and Rosé, 2019), and central users (Pennacchiotti and Popescu, 2011) as a starting heuristic, or evaluating a labeling approach by comparing geolocation tags attached to posts with historical voting patterns (Demszky et al., 2019).
A limitation of these approaches, however, is that behavior on social media does not evenly or uniformly reflect the political beliefs held by participants. While there is evidence that people tend to engage with others who share similar beliefs (Halberstam and Knight, 2016), people also commonly interact with or even seek out communities and users they do not agree with (Kelly et al., 2005; Tan et al., 2016). Additionally, the practice of displaying one's political beliefs, which many grounding techniques rely on, varies in prevalence across online communities (Lampe et al., 2007; Zhong et al., 2017; Pathak, 2020). The concept of linguistic agency (Goffman et al., 1978) also challenges the idea that individual factors, such as ideology, are predictably presented in text: depending on an author's social goals for participating in political discussion, it may not be contextually relevant to project a strong impression of their political ideology. People engaged in interactive political discussion, however, still form perceptions about the alignments of others based on how they sound, often relying on their own conceptions of ideology in the process.
The issue of perceiving ideology also plays a role when ideology labels are obtained using crowdsourced annotators. While making judgments, the annotator plays a similar role to a user participating in the discussion when perceiving the ideology of the speaker behind a text. However, annotators are expected to assign an explicit ideology label to a text with less contextual knowledge about how the text was produced. Thus, annotators may rely heavily on their own experiential factors, such as one's own beliefs or level of political engagement, when considering ideology. As a result, this process may introduce inconsistencies and biases in ideological labels used for political analysis.
In this paper, we present an exploration of how experiential factors shape how annotators perceive ideology in text. Building upon prior work investigating annotation bias (Zaidan and Callison-Burch, 2011; Waseem, 2016; Joseph et al., 2017; Ross et al., 2017; Schwartz et al., 2017; Geva et al., 2019), we construct an annotated corpus of posts from political subcommunities on Reddit but incorporate additional contextual information about the annotators making ideology judgments. While previous work (Joseph et al., 2017) has shown that source-side contextual features, such as user profiles and previous tweets, can influence label quality in stance annotation, we focus our analyses on contextual factors on the side of annotators. Most similar to our work, Carpenter et al. (2017) and Carpenter et al. (2018) examine the impact of an annotator's identity and openness on their ability to accurately assess author attributes, including political orientation. In our work, however, we examine the impact of an annotator's political beliefs, knowledge, and Reddit familiarity on their judgments, using factors more specific to political participation on Reddit. We additionally treat annotator bias in ideology labeling not as an issue of accuracy but rather as an issue of social variability. Under this view, we evaluate the performance of a state-of-the-art language model on its capacity to mirror different human perceptions of ideology, examining whether extralinguistic factors introduced through annotation may degrade model performance compared to other labels.

Dataset Construction
Our dataset is drawn from the popular content aggregation and discussion platform Reddit. Political discussion on Reddit is organized around subreddits: subcommunities centered on support for specific political candidates, organizations, and issues. For our analyses, we aim to label political distinctions on Reddit along the left-right political spectrum in U.S. politics. Using the monthly dumps from May to September 2019 from the Reddit Pushshift API (Baumgartner et al., 2020), we collect all submissions and comments from the top political subreddits by subscriber count. The collected subreddits were manually labeled as left or right based on the subreddit description and top posts. We then select the top 12 left and top 12 right subreddits from the monthly dumps where discussion is primarily focused on U.S. politics. The selected subreddits are shown in Table 3 (Supplementary Material).

Paired Ideology Ranking Task
Prior work on annotating viewpoints (Iyyer et al., 2014; Bamman and Smith, 2015) generally presents annotators with texts in isolation to label with an ideology of interest. One drawback of this approach is the high degree of political expertise annotators need in order to recognize that a text matches an ideology. To reduce the overhead in recruiting and training political annotators, we instead present annotators with a paired ideology ranking task: rather than examining texts in isolation, annotators are shown two texts and asked to select the text that is more likely to be authored by someone with the ideology of interest.
For our setup, our goal is to pair a text authored by a left-leaning user with one by a right-leaning user. We use a heuristic-based semi-supervised approach to label texts based on the subreddit participation patterns of their authors. To expand the set of subreddits with ideological labels, we label all subreddits in the monthly dump data as left, neutral, or right based on user overlap with the 24 political subreddits with a known ideological slant (Section 2). For each subreddit, we calculate the z-score of the log odds ratio of a user participating in that subreddit and a known left-leaning subreddit vs. a right-leaning subreddit. A subreddit is labeled as either "left" or "right" if the calculated z-score satisfies a one-tailed Z test at p = 0.05 in the corresponding direction or "neutral" otherwise. Authors are then labeled based on their distribution of participation on the left vs. right subreddits. While users on Reddit have been shown to primarily engage with pro-social home communities (Datta and Adar, 2019) and similar heuristics have been used in prior work as an indicator of user interests and/or ideology (Olson and Neal, 2015; Chandrasekharan et al., 2017; Shen and Rosé, 2019), we emphasize that we use this heuristic to create a basis of comparison, rather than assuming that it provides "correct" ideology labels.
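The labeling step above can be sketched as follows. This is an illustrative implementation, not the paper's exact code: the add-one smoothing, the standard-error formula for the log odds ratio, and the input counts are our assumptions.

```python
import math

def label_subreddit(n_left, n_right, total_left, total_right):
    """Label a subreddit left/right/neutral from user-overlap counts.

    n_left / n_right: users this subreddit shares with known left/right subreddits.
    total_left / total_right: total users in the known left/right subreddits.
    All names are illustrative; the paper's exact statistic may differ.
    """
    # Log odds ratio of reaching this subreddit from a left vs. a right
    # community, with add-one smoothing to avoid division by zero.
    a, b = n_left + 1, total_left - n_left + 1
    c, d = n_right + 1, total_right - n_right + 1
    log_odds = math.log((a / b) / (c / d))
    # Standard asymptotic standard error of a log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_odds / se
    z_crit = 1.645  # one-tailed critical value at p = 0.05
    if z > z_crit:
        return "left"
    if z < -z_crit:
        return "right"
    return "neutral"
```

A subreddit whose visitors overwhelmingly also post in known left communities then gets a "left" label, and authors are labeled from the resulting subreddit labels.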
In order to ensure that the text comparison helps annotators to perceive ideological differences, rather than presenting two unrelated texts that are essentially considered in isolation, we want to present paired texts that are similar in content. As a first step for generating comparisons with similar content, we require paired texts to discuss the same entity, since political discussions are primarily centered on the politicians, organizations, and geopolitical entities influencing policy decisions. To identify entities of interest, we use Stanford Core NLP (Manning et al., 2014) to extract occurrences of people, locations, organizations, and ideologies over our corpus of 24 subreddits. We limit entities under consideration to those that have occurred at least 300 times in our corpus and are easy to disambiguate. The considered entities are shown in Table 4 (Supplementary Material).
To limit the impact of confounds, such as topic or entity salience, when comparing texts with the same entity, we use propensity score matching (Rosenbaum and Rubin, 1983) to match each left-aligned text with a right-aligned text that discusses the same entity in a similar context. A subset of 65 pairs was manually curated to use as screening questions to ensure that workers had a baseline knowledge of U.S. politics. These screening pairs were selected to be easier than the main task pairs: they are more limited in which entities are discussed and express more explicit and/or extreme attitudes.
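To give the flavor of the matching step, a minimal greedy nearest-neighbor matcher over precomputed propensity scores is sketched below. The caliper value, the greedy strategy, and the input format are our illustrative assumptions; the paper does not specify its matcher, and production work typically uses a dedicated matching library.

```python
def greedy_propensity_match(left_scores, right_scores, caliper=0.05):
    """Match each left-aligned text to the right-aligned text with the
    closest propensity score, without replacement (illustrative sketch).

    left_scores / right_scores: {text_id: propensity} for texts that
    mention the same entity. Returns a list of (left_id, right_id) pairs.
    """
    pairs = []
    available = dict(right_scores)
    # Process left texts in score order; each right text is used at most once.
    for l_id, l_score in sorted(left_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        r_id = min(available, key=lambda r: abs(available[r] - l_score))
        # Only accept matches whose scores fall within the caliper.
        if abs(available[r_id] - l_score) <= caliper:
            pairs.append((l_id, r_id))
            del available[r_id]
    return pairs
```

Pairs whose best candidate falls outside the caliper are simply dropped, trading corpus size for comparability.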

Annotation Task Details
We recruit workers on Amazon Mechanical Turk to complete our paired ideology ranking task. Given a pair of texts discussing the same highlighted political entity, we ask annotators to determine which of the two posts is more likely to have been written by someone who is either left-leaning or right-leaning. Annotators were instructed to use as many contextual cues as possible to form an impression of the political views held by the authors of the texts. To provide some guidance to annotators for what cues to consider, we train workers to consider the following features in the instructions:
• Attitude: evaluation in favor of or against an entity. Ex: I trust Bernie from someone who favors Bernie Sanders (left).
• Positioning: situating one's viewpoint with respect to the entity's. Ex: Listen to the Dems refers to Democrats as an out-group (right).
The annotation task is shown in Figure 1 (Supplementary Material). Each worker was asked to annotate 18 pairs from our main task set and 8 screening questions, which were scattered throughout the assignment as an attention check. For each main task pair, we assign up to 5 workers for annotation. We restrict the worker pool to the U.S. and filter out workers who scored less than a 75% on the screening questions. Overall, we collect annotations for 630 non-screening pairs.

Annotator Background Post-Survey
After the annotation task, workers were asked to complete a survey (questions listed in Supplementary Material A) to assess their political affiliation, exposure to U.S. political news, and familiarity with political discussion on Reddit. Answers to the survey were inspected manually to assign annotators labels along three identifier categories:
• Political ideology: This category indicates the annotator's political ideology. Annotators are labeled as left, center, or right based on their self-identified ideology and affiliation with U.S. political parties.
• News access: This category indicates the annotator's exposure to political news. Annotators are labeled as news or non-news based on how frequently they access news on the 2020 U.S. presidential election.
• Reddit familiarity: This category indicates the annotator's familiarity with political discussion on Reddit. Annotators are labeled as redditor or non-redditor based on their reported familiarity with the platform.

Annotator Demographics
Of the 180 workers initially recruited for the task, 22 were discarded for answering fewer than 75% of the screening questions correctly, giving us a final pool of 158 annotators. Table 1 shows the distribution of the remaining workers across labels within the three categories. Labels across categories do not appear to be correlated (mean variance inflation factor = 1.043).

Agreement/Difference Results
We use Krippendorff's α (Krippendorff, 2004) to evaluate annotator agreement on our task, as it accounts for a different pool of annotators for each question. Despite a high degree of agreement across the pool of screening questions (α = 0.7311), the overall agreement across annotators in our general, non-screening set is relatively low (α = 0.3878), suggesting that the task of perceiving the ideology of a text is nuanced and open to interpretation. We also calculate agreement for workers within each of our annotator groups (Table 1) in order to examine whether annotators with similar backgrounds are more likely to perceive ideology similarly. Overall, in-group agreement remains around the same level as the general task. An interesting pattern across annotator labels, however, is that workers who are less likely to be familiar with the expression of political ideology on Reddit have lower agreement: non-redditors (α = 0.359), people who do not frequently read political news (α = 0.336), and people who do not identify with the left or right (α = 0.325). This suggests that familiarity with the norms of political discussion on Reddit may contribute to a more consistent perception of ideology for Reddit texts.
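What makes Krippendorff's α suitable here is that it is computed from the set of labels each item received, regardless of which annotators supplied them. A compact sketch for nominal data (the input format is our own; this follows the standard coincidence-matrix formulation):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists, each inner list holding the labels one item
    received (inner lists may have different lengths, matching a setting
    where each question has its own annotator pool).
    """
    o = Counter()    # coincidence matrix over ordered label pairs
    n_c = Counter()  # marginal label counts
    n = 0
    for values in units:
        m = len(values)
        if m < 2:
            continue  # items with a single rating carry no agreement information
        n += m
        for v in values:
            n_c[v] += 1
        # Each ordered pair of labels within an item contributes 1/(m-1).
        for a, b in permutations(range(m), 2):
            o[(values[a], values[b])] += 1 / (m - 1)
    # Observed vs. expected disagreement for the nominal difference metric.
    d_o = sum(v for (c, k), v in o.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1 - d_o / d_e
```

Perfect agreement yields α = 1; systematic disagreement drives α below 0, with chance-level labeling near 0.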
We additionally use McNemar's chi-squared test over pairwise comparisons of annotator groups within the same category to examine whether annotators with different backgrounds differ in their judgments. To ground the comparison, we evaluate annotator groups based on whether the majority of workers in the group gave the same answer as our semi-supervised labels (Section 2.1). Because these semi-supervised labels provide only a noisy estimate of ideology, we use them as a basis of comparison rather than as a measure of how "accurately" each group estimates ideology; this allows us to quantify differences in judgments between groups. We find that for all comparison pairs, groups differ significantly in their answers over the same questions. In our pairwise comparisons, we also find that the ideology of the annotator contributes heavily to variability in judgments. The two groups with the highest percentage of questions with mismatched answers are left-leaning and right-leaning annotators, and 3 of the top 4 comparison pairs with the most mismatched answers are between ideology groups (Supplementary Material Table 6).
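McNemar's test depends only on the discordant cells of the paired contingency table, i.e., the questions on which exactly one of the two groups matched the heuristic label. A self-contained sketch, using the standard continuity correction (the correction choice is ours; the paper does not state it):

```python
import math

def mcnemar(b, c, correction=True):
    """McNemar's chi-squared test on paired binary judgments.

    b: questions where group 1 matched the heuristic label but group 2 did not;
    c: the reverse. Returns (chi-squared statistic, two-sided p-value), df = 1.
    """
    diff = abs(b - c) - (1 if correction else 0)  # Edwards continuity correction
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of chi-squared with 1 df: P(X > stat) = erfc(sqrt(stat/2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

Heavily asymmetric discordant counts (one group disagreeing with the heuristic far more often than the other) yield a small p-value, flagging a systematic difference between the groups.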

Sources of Variability
To examine possible explanations for the variability in annotator judgments across groups, we focus primarily on differences in judgments between left-leaning and right-leaning annotators. When examining differences at the entity level, we find that the entities with the most mismatches tend to be highly visible entities with a strong connection to a particular party during the 2020 election, such as prominent political figures (e.g., Joe Biden, Nancy Pelosi) or the ideologies most commonly associated with each side (e.g., the Republican Party, conservatism, liberalism), compared to less salient entities. This is unsurprising, as we expect people to develop different conceptions of salient entities in the run-up to major events like elections, even with relatively limited media exposure.
Finally, to investigate what aspects of the posts themselves contributed to variations in judgments between left-leaning and right-leaning workers, we ran a salience analysis (Monroe et al., 2008) for mismatched question pairs with highly visible entities. We found that annotators were less likely to select a post that expresses explicit abuse towards an opposing entity as being authored by someone with the same political views as themselves. For example, a right-leaning annotator was less likely than a left-leaning annotator to consider a post calling Biden a "pedophile" as right-leaning. This suggests that social desirability bias (Krumpal, 2013) may influence decision-making, even when the task is not directly related to collecting data about the annotators themselves.
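The salience measure of Monroe et al. (2008) is a weighted log-odds ratio with an informative Dirichlet prior. A sketch on toy word counts follows; the prior scale and input format are illustrative choices, not the paper's settings.

```python
import math
from collections import Counter

def fightin_words(counts_a, counts_b, prior_scale=0.01):
    """Weighted log-odds-ratio with an informative Dirichlet prior
    (Monroe et al., 2008), for two word-count dictionaries.

    Returns {word: z-score}; positive scores mark words salient in corpus A.
    """
    prior = Counter(counts_a) + Counter(counts_b)  # pooled counts as the prior
    a0 = sum(prior.values()) * prior_scale
    na, nb = sum(counts_a.values()), sum(counts_b.values())
    z = {}
    for w, p in prior.items():
        aw = p * prior_scale
        ya, yb = counts_a.get(w, 0), counts_b.get(w, 0)
        # Difference of prior-smoothed log odds between the two corpora.
        delta = (math.log((ya + aw) / (na + a0 - ya - aw))
                 - math.log((yb + aw) / (nb + a0 - yb - aw)))
        # Approximate variance of the difference.
        var = 1 / (ya + aw) + 1 / (yb + aw)
        z[w] = delta / math.sqrt(var)
    return z
```

Ranking words by z-score surfaces the terms most characteristic of, e.g., the posts each group attributed to the left versus the right.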

Perceptions vs. Heuristic Labels
Prior work (Castelle, 2018) suggests that deep text classification models perform poorly when labels are influenced by extralinguistic contextual factors. While the semi-supervised labels that we generated are based on a behavioral heuristic outside of the text, our analyses of human judgments suggest that the annotation process introduced additional interactional factors into ideological labeling. We investigate whether these factors influence model performance by evaluating a BERT-based (Devlin et al., 2019) model on its ability to match human judgments on the paired ideology ranking task. For our evaluation model, we fine-tune BERT with its masked language modeling objective on the 24-subreddit corpus. Next, for each text, we average its contextual embeddings in two ways: over (a) all tokens in the text and (b) all entity-related tokens in the text. We then concatenate the averaged embeddings and use the resulting vector as input to a pairwise logistic regression model. For each annotator group, we use the majority answer for each question as the group label. Table 2 shows the performance of the model on the full 630-pair non-screening set. For all annotator groups, we found that the model has a significant drop in performance when asked to match human judgments vs. labels generated through our semi-supervised heuristic on the same dataset. To examine whether this drop in performance was due to inconsistencies in human judgments on particularly difficult or contentious distinctions, we additionally present results on a higher-consensus subset (α = 0.6216) of 459 text pairs, where at least 75% of workers select the same answer. We found that while there was a small increase in performance on matching human judgments on the high-consensus subset for all groups, performance still dropped compared to the semi-supervised labels, suggesting that matching human understanding of ideology is challenging for these models.

Table 2: F1 scores for a BERT-based ranking model on semi-supervised (SS) and human annotator (H) labels for the full non-screening set (F) and a high-consensus subset (C). *p < 0.05 difference in performance between the semi-supervised and human annotator labels.
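Given token embeddings already extracted from the fine-tuned model, the feature-construction step amounts to two averages and a concatenation. A pure-Python sketch with hypothetical inputs (the function and argument names are ours):

```python
def pair_features(tok_embs, entity_idx):
    """Build the per-text feature vector for the pairwise ranking model.

    tok_embs: list of token embedding vectors (lists of floats) for one text.
    entity_idx: positions of the entity-related tokens within the text.
    Returns the mean over all tokens concatenated with the mean over
    entity tokens, mirroring the (a) + (b) averaging described above.
    """
    def mean(rows):
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    return mean(tok_embs) + mean([tok_embs[i] for i in entity_idx])
```

The feature vectors for the two texts in a pair are then fed to the logistic regression model, which predicts which text better matches the target ideology.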

Conclusion and Future Work
In this paper, we reconsider the idea of ground-truth labels of political ideology and investigate the impact of experiential factors on human perception of ideology in text. We construct and analyze an annotated corpus that incorporates experiential information about annotators, finding evidence that annotator backgrounds influence the consistency of political ideology judgments and that current classification models struggle to match human perceptions of ideology across different groups. Our analyses of the factors contributing to variations in judgments point to a greater need for targeted recruiting of annotators who are familiar with and contextualized to the domain being annotated. In future work, we aim to extend our investigation to examine how stylistic elements of text contribute to people's perception of political ideologies in interaction. These analyses may provide further insight into the effectiveness of political communication strategies or the differences in how political groups interact with in-group and out-group members.

Selected subreddits

Table 3: Selected subreddits.
Left: r/LateStageCapitalism, r/SandersForPresident, r/democrats, r/socialism, r/Liberal, r/VoteBlue, r/progressive, r/ChapoTrapHouse, r/neoliberal, r/esist, r/The_Mueller
Right: r/The_Donald, r/Libertarian, r/Republican, r/Conservative, r/JordanPeterson, r/TheNewRight, r/Anarcho_Capitalism, r/conservatives, r/ShitPoliticsSays, r/POLITIC, r/AskTrumpSupporters, r/AskThe_Donald

Table 4: Selected entities included in the construction of the dataset. Italicized entities are also included in the screening set.

Figure 1: Screenshot of a question in the paired ideology ranking task. Annotators are presented with two texts discussing the same highlighted entity in a similar context, one from a left-leaning user and another from a right-leaning user based on a semi-supervised labeling heuristic. Annotators are asked to select which of the two texts is more likely to be authored by someone with the highlighted ideology.