PAR: Political Actor Representation Learning with Social Context and Expert Knowledge

Modeling the ideological perspectives of political actors is an essential task in computational political science with applications in many downstream tasks. Existing approaches are generally limited to textual data and voting records, while they neglect the rich social context and valuable expert knowledge for holistic ideological analysis. In this paper, we propose PAR, a Political Actor Representation learning framework that jointly leverages social context and expert knowledge. Specifically, we retrieve and extract factual statements about legislators to leverage social context information. We then construct a heterogeneous information network to incorporate social context and use relational graph neural networks to learn legislator representations. Finally, we train PAR with three objectives to align representation learning with expert knowledge, model ideological stance consistency, and simulate the echo chamber phenomenon. Extensive experiments demonstrate that PAR is better at augmenting political text understanding and successfully advances the state-of-the-art in political perspective detection and roll call vote prediction. Further analysis proves that PAR learns representations that reflect the political reality and provide new insights into political behavior.


Introduction
Modeling the perspectives of political actors has applications in various downstream tasks such as roll call vote prediction (Mou et al., 2021) and political perspective detection (Feng et al., 2021). Existing approaches generally focus on voting records or the textual information of political actors to induce their stances. The ideal point model (Clinton et al., 2004) is one of the most widely used approaches for vote-based analysis, and later works enhance it (Kraft et al., 2016; Gu et al., 2014; Gerrish and Blei, 2011), yielding promising results on roll call vote prediction. For text-based methods, text analysis techniques are combined with textual information from social media posts (Li and Goldwasser, 2019), Wikipedia pages (Feng et al., 2021), legislative text (Mou et al., 2021), and news articles (Li and Goldwasser, 2021) to enrich the perspective analysis process.
However, existing methods fail to incorporate the rich social context and valuable expert knowledge of political actors. As illustrated in Figure 1, social context information such as home state and party affiliation serves as background knowledge and helps connect different political actors (Yang et al., 2021). These social context facts about political actors also differentiate them and indicate their ideological stances. In addition, expert knowledge from political think tanks provides valuable insights and helps to anchor the perspective analysis process. As a result, political actor representation learning should be guided by domain expertise to facilitate downstream tasks in computational political science. Therefore, social context and expert knowledge should both be incorporated when modeling legislators to ensure a holistic evaluation.
In light of these challenges, we propose PAR, a legislator representation learning framework that jointly leverages social context and expert knowledge. We first collect a dataset of political actors by retrieving and extracting social context information from their Wikipedia pages and adapting expert knowledge from two political think tanks, AFL-CIO and Heritage Action. After that, we construct a heterogeneous information network to model social context information and adopt relational graph neural networks for representation learning. Finally, we train the framework with three training objectives to leverage expert knowledge, model social and political phenomena, and learn representations of political actors in the process. We evaluate PAR on two computational political science tasks and examine the political behavior of the learned representations. Our main contributions are summarized as follows:
• We propose PAR, a graph-based approach to learning legislator representations with three training objectives, which align representation learning with expert knowledge, ensure stance consistency, and model the echo chamber phenomenon in socio-economic systems.
• Extensive experiments demonstrate that PAR advances the state-of-the-art on two computational political science tasks, namely political perspective detection and roll call vote prediction.
• Further analysis shows that PAR learns representations that reflect the ideological preferences and political behavior of political actors such as legislators, governors, and states. In addition, PAR provides interesting insights into the contemporary political reality.

Related Work
The ideological perspectives of political actors play an essential role in their individual behavior and add up to influence the overall legislative process (Freeden, 2006; Bamman et al., 2012; Wilkerson and Casas, 2017). Political scientists first explored quantitatively modeling political actors based on their voting behavior, with the ideal point model (Clinton et al., 2004) as a representative example. In addition to voting, textual data such as speeches and public statements are also leveraged to model political perspectives (Evans et al., 2007; Thomas et al., 2006; Hasan and Ng, 2013; Zhang et al., 2022; Sinno et al., 2022; Liu et al., 2022; Davoodi et al., 2022; Alkiek et al., 2022; Dayanik et al., 2022; Pujari and Goldwasser, 2021; Villegas et al., 2021; Li et al., 2021; Sen et al., 2020; Baly et al., 2020). Volkova et al. (2014) leverage message streams to infer users' political preferences. Johnson and Goldwasser (2016) propose to better understand political stances by analyzing politicians' tweets and their temporal activities. Li and Goldwasser (2019) propose to analyze Twitter posts to better understand news stances. Prakash and Tayyar Madabushi (2020) and Kawintiranon and Singh (2021) leverage pre-trained language models for stance prediction. Augenstein et al. (2016) use a conditional LSTM to encode tweets for stance detection. Furthermore, CNNs (Wei et al., 2016) and hierarchical attention networks (Sun et al., 2018a) are adopted for stance detection. Darwish et al. (2020) propose an unsupervised framework for user representation learning and stance detection. Yang et al. (2021) propose to jointly model legislators and legislation for vote prediction. Feng et al. (2021) introduce a Wikipedia corpus and construct knowledge graphs to facilitate perspective detection. Mou et al. (2021) propose to leverage tweets, hashtags, and legislative text to grasp the full picture of the political discourse. Li and Goldwasser (2021) design pretraining tasks with social and linguistic information to augment political analysis. However, these vote- and text-based methods fail to leverage the rich social context of political actors and the valuable expert knowledge of political think tanks. In this paper, we aim to incorporate social context and expert knowledge while focusing on learning representations of political actors to model their perspectives and facilitate downstream tasks.

Methodology
Figure 2 presents an overview of PAR (Political Actor Representation learning). We first collect a dataset of political actors from Wikipedia and political think tanks. We then construct a heterogeneous information network to jointly model political actors and social context information. Finally, we learn graph representations with gated relational graph convolutional networks (gated R-GCN) and train the framework with three different objectives to leverage expert knowledge and model various socio-political phenomena.

Data Collection
We first collect a dataset of political actors in the United States who were active in the past decade. For social context information, we retrieve the list of senators and congresspersons from the 114th congress to the 117th congress. We then retrieve their Wikipedia pages and extract the following named entities: presidents, senators, congresspersons, governors, states, political parties, Supreme Court justices, government institutions, and office terms (the 117th congress, etc.). In this way, we obtain 1,069 social and political entities. Based on these entities, we identify five types of relations: party affiliation, home state, political office, term in office, and appoint relationships. In this way, we obtain 9,248 heterogeneous edges. For expert knowledge, we use legislator scoreboards at AFL-CIO and Heritage Action, two political think tanks that lie at opposite ends of the ideological spectrum. Specifically, we retrieve and extract each legislator's score in each office term, obtaining 777 scores from AFL-CIO and 679 scores from Heritage Action. We consolidate the collected social context and expert knowledge to serve as the dataset in our experiments. Although we focus on political actors in the United States in this paper, our data collection process is also applicable to other countries and time ranges.

Graph Construction
To better model the interactions between political entities and their shared social context, we propose to construct a heterogeneous information network (HIN) from the dataset.

Heterogeneous Nodes
Based on the collected dataset, we select diversified entities that are essential factors in modeling the political process. Specifically, we use eight types of nodes to represent political actors and diversified social context entities. N1: Office Terms. We use four nodes to represent the 114th, 115th, 116th, and 117th congresses, spanning from 2015 to 2022. These nodes model different political scenarios and could similarly be extended to other time periods. N2: Legislators. We retrieve senators and congresspersons from the 114th to the 117th congress and use one node to represent each distinct legislator. N3: Presidents. The presidency is the highest elected office in the United States. We use three nodes to represent Presidents Biden, Trump, and Obama to match the time range in N1. N4: Governors. State and local politics are also essential in analyzing the political process. We use one node to represent each distinct governor of the 50 states within the time range of N1. N5: States. The home state of political actors is often an important indicator and helps connect different individuals. We use one node to represent each state in the United States. N6: Government Institutions. We use five nodes to represent the White House, the Senate, the House of Representatives, the Supreme Court, and governorship. These nodes enable our HIN to separate different political actors based on the office they hold. N7: Supreme Court Justices. Supreme Court justices are nominated by presidents and approved by senators, which helps connect different types of political actors. We use one node to represent each Supreme Court justice within the time range of N1. N8: Political Parties. We use two nodes to represent the Republican Party and the Democratic Party in the United States.
For node features, we use pre-trained RoBERTa (Liu et al., 2019) to encode the first paragraph of each entity's Wikipedia page and average over all token representations.
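As a concrete sketch of this featurization step, the example below mean-pools token vectors into one node feature; the dummy matrix stands in for RoBERTa's last hidden states (hidden size 4 here for brevity instead of the model's 768).

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Average the token embeddings of an entity's first Wikipedia
    paragraph into a single node feature vector."""
    # token_embeddings: (num_tokens, hidden_size)
    return token_embeddings.mean(axis=0)

# Stand-in for RoBERTa last hidden states on a 6-token paragraph
# (hidden_size = 4 here; RoBERTa-base uses 768).
hidden_states = np.arange(24, dtype=float).reshape(6, 4)
node_feature = mean_pool(hidden_states)
print(node_feature)  # [10. 11. 12. 13.]
```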

Heterogeneous Relations
Based on N1 to N8, we extract five types of informative interactions between entities to complete the HIN structure. Specifically, we use five types of heterogeneous relations to connect different nodes and construct our political actor HIN. R1: Party Affiliation. We connect political actors and their affiliated political party with R1. R2: Home State. We connect political actors with their home states with R2. R3: Hold Office. We connect political actors with the political office they hold with R3. R4: Time in Office. If a political actor holds office during one of the time stamps in N1, we connect them with R4. R5: Appoint. Besides being elected, certain political actors are appointed by others. We denote this relation with R5.
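To make the HIN construction concrete, here is a minimal sketch (with hypothetical node ids) that stores edges as typed triples and groups them by relation, which is the form a relational GNN consumes:

```python
# Hypothetical miniature HIN: string node ids, each edge typed with one of
# the five relations R1-R5.
RELATIONS = ["party_affiliation", "home_state", "hold_office",
             "time_in_office", "appoint"]

edges = [
    ("sen_a", "democratic_party", "party_affiliation"),  # R1
    ("sen_a", "california", "home_state"),               # R2
    ("sen_a", "senate", "hold_office"),                  # R3
    ("sen_a", "117th_congress", "time_in_office"),       # R4
    ("president_x", "justice_y", "appoint"),             # R5
]

# Group edges by relation so a relational GNN can propagate each
# edge type separately.
by_relation = {rel: [(s, t) for s, t, r in edges if r == rel]
               for rel in RELATIONS}
print(len(by_relation["appoint"]))  # 1
```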

Representation Learning
Since nodes represent political actors, we learn node-level representations with gated R-GCN to jointly leverage social context and expert knowledge. Let $E = \{e_1, \cdots, e_n\}$ be the $n$ entities in the HIN and $v_i$ the initial feature vector of entity $e_i$. Let $R$ be the heterogeneous relation set and $N_r(e_i)$ entity $e_i$'s neighborhood with regard to relation type $r$. We first transform $v_i$ to serve as the input of the graph neural network:

$$x_i^{(0)} = \phi(W_I v_i + b_I), \quad (6)$$

where $\phi$ is leaky-ReLU, and $W_I$ and $b_I$ are learnable parameters. We then propagate entity messages and aggregate them with gated R-GCN. For the $l$-th layer of gated R-GCN:

$$u_i^{(l)} = f_s\big(x_i^{(l-1)}\big) + \sum_{r \in R} \frac{1}{|N_r(e_i)|} \sum_{e_j \in N_r(e_i)} f_r\big(x_j^{(l-1)}\big), \quad (7)$$

where $f_s$ and $f_r$ are parameterized linear layers for self-loops and edges of relation $r$, and $u_i^{(l)}$ is the hidden representation of entity $e_i$ at layer $l$. We then calculate the gate levels:

$$g_i^{(l)} = \sigma\big(W_G \big[u_i^{(l)}, x_i^{(l-1)}\big] + b_G\big), \quad (8)$$

where $W_G$ and $b_G$ are learnable parameters, $\sigma(\cdot)$ denotes the sigmoid function, and $[\cdot, \cdot]$ denotes the concatenation operation. We then apply the gate mechanism to $u_i^{(l)}$ and $x_i^{(l-1)}$:

$$x_i^{(l)} = g_i^{(l)} \odot \phi\big(u_i^{(l)}\big) + \big(1 - g_i^{(l)}\big) \odot x_i^{(l-1)}, \quad (9)$$

where $\odot$ is the Hadamard product. After $L$ layer(s) of gated R-GCN, we obtain node features $\{x_1^{(L)}, \cdots, x_n^{(L)}\}$, and the nodes representing political actors are extracted as learned representations.
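The propagation, gating, and update steps above can be sketched in NumPy as follows; parameter names, the mean aggregation, and the tanh nonlinearity in the update are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_rgcn_layer(x, neighbors, Ws, Wr, Wg, bg):
    """One gated R-GCN layer: self-loop plus per-relation mean-aggregated
    messages, then a sigmoid gate that interpolates between the new hidden
    state and the previous one."""
    n, d = x.shape
    u = x @ Ws                                   # self-loop messages f_s
    for r, Wr_r in enumerate(Wr):                # per-relation messages f_r
        for i in range(n):
            nb = neighbors[r][i]
            if nb:
                u[i] += (x[nb] @ Wr_r).mean(axis=0)
    g = sigmoid(np.concatenate([u, x], axis=1) @ Wg + bg)  # gate levels
    return g * np.tanh(u) + (1.0 - g) * x        # gated update (Hadamard)

rng = np.random.default_rng(0)
n, d, num_rel = 4, 8, 5                          # 5 relations, as in R1-R5
x = rng.normal(size=(n, d))                      # transformed input features
neighbors = [[[j for j in range(n) if j != i and (i + j + r) % 3 == 0]
              for i in range(n)] for r in range(num_rel)]
Ws = rng.normal(size=(d, d))
Wr = [rng.normal(size=(d, d)) for _ in range(num_rel)]
Wg = rng.normal(size=(2 * d, d))
bg = np.zeros(d)
out = gated_rgcn_layer(x, neighbors, Ws, Wr, Wg, bg)
print(out.shape)  # (4, 8)
```

The gate lets each dimension of a node's state decide how much of the new aggregated message to accept, which helps stabilize deep stacking.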

Model Training
We propose to train PAR with a combination of three training objectives, which align representation learning with expert knowledge, ensure stance consistency, and simulate the echo chamber phenomenon. The overall loss function of PAR is:

$$\mathcal{L}(\theta) = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2 + \lambda_3 \mathcal{L}_3, \quad (10)$$

where $\lambda_i$ is the weight of loss $\mathcal{L}_i$ and $\theta$ denotes the learnable parameters of PAR. We then present the motivation and details of each loss function $\mathcal{L}_1$, $\mathcal{L}_2$, and $\mathcal{L}_3$.
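A minimal sketch of combining the three objectives, with illustrative (not tuned) loss weights:

```python
def total_loss(l1, l2, l3, lambdas=(1.0, 0.5, 0.05)):
    """Weighted sum of the three PAR objectives; the lambda values here
    are illustrative placeholders, not the paper's tuned settings."""
    return sum(w * l for w, l in zip(lambdas, (l1, l2, l3)))

loss = total_loss(0.2, 0.4, 1.0)  # 1.0*0.2 + 0.5*0.4 + 0.05*1.0
```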

Objective 1: Expert Knowledge
The expert knowledge objective aims to align the learned representations with expert knowledge from political think tanks. We use the learned representations of political actors to predict their liberal and conservative stances, which are adapted from AFL-CIO and Heritage Action. Specifically:

$$l_i = \mathrm{softmax}\big(W_L x_i^{(L)} + b_L\big), \quad c_i = \mathrm{softmax}\big(W_C x_i^{(L)} + b_C\big), \quad (11)$$

where $l_i$ and $c_i$ are predicted stances towards liberal and conservative values, and $W_L$, $b_L$, $W_C$, and $b_C$ are learnable parameters. Let $E_L$ and $E_C$ be the training sets of AFL-CIO and Heritage Action scores, and let $\hat{l}_i$ and $\hat{c}_i$ denote the ground truth among $D$ possible labels. We calculate the expert knowledge loss:

$$\mathcal{L}_1 = -\sum_{e_i \in E_L} \sum_{d=1}^{D} \hat{l}_{id} \log(l_{id}) - \sum_{e_i \in E_C} \sum_{d=1}^{D} \hat{c}_{id} \log(c_{id}). \quad (12)$$

$\mathcal{L}_1$ enables PAR to align learned representations with expert knowledge from political think tanks.
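The two stance heads and their cross-entropy loss can be sketched as follows; shapes, parameter names, and the random demo inputs are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def expert_knowledge_loss(h, y_lib, y_con, Wl, bl, Wc, bc):
    """Two linear heads predict each actor's liberal (AFL-CIO-derived) and
    conservative (Heritage-Action-derived) stance; the loss is the
    cross-entropy against the expert labels."""
    l = softmax(h @ Wl + bl)                 # predicted liberal stances
    c = softmax(h @ Wc + bc)                 # predicted conservative stances
    rows = np.arange(h.shape[0])
    return -np.log(l[rows, y_lib]).mean() - np.log(c[rows, y_con]).mean()

rng = np.random.default_rng(0)
n, d, D = 6, 16, 3                           # D possible stance labels
h = rng.normal(size=(n, d))                  # learned actor representations
Wl, Wc = rng.normal(size=(d, D)), rng.normal(size=(d, D))
bl = bc = np.zeros(D)
y_lib, y_con = rng.integers(0, D, n), rng.integers(0, D, n)
loss = expert_knowledge_loss(h, y_lib, y_con, Wl, bl, Wc, bc)
print(loss > 0.0)  # True
```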

Objective 2: Stance Consistency
The stance consistency objective is motivated by the fact that liberalism and conservatism are opposing ideologies, so individuals often take inversely correlated stances towards them. We first speculate each entity's stance towards the opposite ideology by taking the opposite of the predicted stance:

$$\tilde{l}_i = \psi\big(D - 1 - \mathrm{argmax}(c_i)\big), \quad \tilde{c}_i = \psi\big(D - 1 - \mathrm{argmax}(l_i)\big), \quad (13)$$

where $\psi$ is the one-hot encoder, $\mathrm{argmax}(\cdot)$ calculates the vector index with the largest value, $D$ is the number of stance labels, and $\tilde{l}_i$ and $\tilde{c}_i$ are labels derived with stance consistency. We calculate the loss function $\mathcal{L}_2$ measuring stance consistency:

$$\mathcal{L}_2 = -\sum_{e_i \in E_C} \sum_{d=1}^{D} \tilde{l}_{id} \log(l_{id}) - \sum_{e_i \in E_L} \sum_{d=1}^{D} \tilde{c}_{id} \log(c_{id}). \quad (14)$$

As a result, the loss function $\mathcal{L}_2$ enables PAR to ensure ideological stance consistency among political actors.
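One plausible reading of "taking the opposite of the predicted stance" is to reverse the stance index on the D-point scale before one-hot encoding; the sketch below illustrates that reading with a hypothetical prediction:

```python
import numpy as np

def opposite_stance_labels(pred, D):
    """Derive pseudo-labels for the opposite ideology by reversing the
    predicted stance index (one plausible reading of the consistency
    objective): a strongly conservative prediction yields a strongly
    anti-liberal label."""
    idx = np.argmax(pred, axis=-1)    # most likely of the D stance labels
    return np.eye(D)[(D - 1) - idx]   # one-hot encoding (psi) of flipped index

c_pred = np.array([[0.1, 0.2, 0.7]])  # strongly conservative prediction
l_tilde = opposite_stance_labels(c_pred, 3)
print(l_tilde)  # [[1. 0. 0.]] -> "strongly not liberal" pseudo-label
```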

Objective 3: Echo Chamber
The echo chamber objective is motivated by the echo chamber phenomenon (Jamieson and Cappella, 2008; Barberá et al., 2015), where social entities tend to reinforce their narratives by forming small and closely connected interaction circles. We simulate echo chambers by assuming that neighboring nodes in the HIN have similar representations while non-neighboring nodes have different representations. We first define the positive neighborhood $P(e_i)$ of entity $e_i$ as the nodes adjacent to $e_i$ in the HIN, and the negative neighborhood $N(e_i)$ as a set of randomly sampled non-adjacent nodes. We then calculate the echo chamber loss:

$$\mathcal{L}_3 = -\sum_{i=1}^{n} \Big( \sum_{e_j \in P(e_i)} \log \sigma\big(x_i^{(L)\top} x_j^{(L)}\big) + Q \sum_{e_k \in N(e_i)} \log \sigma\big(-x_i^{(L)\top} x_k^{(L)}\big) \Big), \quad (15)$$

where $Q$ denotes the weight for negative samples. $\mathcal{L}_3$ enables PAR to model the echo chamber phenomenon in real-world socio-economic systems.
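A hedged sketch of such a neighborhood-contrastive loss, in the spirit of unsupervised graph objectives with negative sampling (the paper's exact formulation may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def echo_chamber_loss(z, pos_pairs, neg_pairs, Q=2.0):
    """Pull neighboring nodes (pos_pairs) together and push sampled
    non-neighbors (neg_pairs) apart; Q weights the negative samples."""
    pos = -np.mean([np.log(sigmoid(z[i] @ z[j])) for i, j in pos_pairs])
    neg = -np.mean([np.log(sigmoid(-(z[i] @ z[k]))) for i, k in neg_pairs])
    return pos + Q * neg

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 8))                  # learned node representations
loss = echo_chamber_loss(z, [(0, 1), (1, 2)], [(0, 3), (2, 4)])
print(loss > 0.0)  # True
```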

Experiment
After learning political actor representations with PAR, we study whether they are effective in computational political science and reveal real-world political behavior. Specifically, we evaluate PAR on political perspective detection and roll call vote prediction, two political text understanding tasks that emphasize political actor modeling. We leverage PAR in these tasks to examine whether it contributes to a better understanding of political text. We then study the learned representations of PAR and whether they provide new insights into real-world politics.

Political Perspective Detection
Political perspective detection aims to detect stances in text such as public statements and news articles, which generally mention many political actors to provide context and present arguments. Existing methods model political actors with masked entity models (Li and Goldwasser, 2021) and knowledge graph embedding techniques (Feng et al., 2021). We examine whether PAR is more effective than these models and consequently improves political perspective detection.

Datasets
We follow previous works (Li and Goldwasser, 2021; Feng et al., 2021) and adopt two political perspective detection benchmarks: SemEval and Allsides. SemEval (Kiesel et al., 2019) asks models to identify whether a news article follows hyperpartisan argumentation. We follow the 10-fold cross validation setting and the exact same folds established in Li and Goldwasser (2021) so that the results are directly comparable. Allsides (Li and Goldwasser, 2019) provides left, center, or right labels for three-way classification. We follow the 3-fold cross validation setting and the exact same folds established in Li and Goldwasser (2019) so that the results are directly comparable.

Baselines
We compare PAR with competitive baselines: • BERT (Devlin et al., 2019) is fine-tuned on the task of political perspective detection.
• MAN (Li and Goldwasser, 2021) leverages social and linguistic information to pre-train a BERT-based model, which is then fine-tuned on political perspective detection.
• KGAP (Feng et al., 2021) leverages political knowledge graphs and graph embedding techniques (Sun et al., 2018b). It then constructs document graphs with text and legislators as nodes and uses GNNs for stance detection.
• KGAP_PAR: To examine whether PAR is effective in political perspective detection, we replace the knowledge graph embeddings in KGAP with political actor representations learned with PAR.

Results
We evaluate PAR and the baselines and present model performance in Table 1, which demonstrates that PAR achieves state-of-the-art performance on both datasets. The fact that PAR outperforms MAN and KCD, two methods that also take political actors into account, shows that PAR learns high-quality representations that result in performance gains.

Roll Call Vote Prediction
Roll call vote prediction aims to predict whether a legislator will vote in favor of or against a specific piece of legislation. Ideal points (Gerrish and Blei, 2011) and ideal vectors (Kraft et al., 2016) have been proposed to model ideological preferences. We examine whether PAR is more effective than these models in roll call vote prediction.

Datasets
We adopt the datasets and settings proposed in Mou et al. (2021) to evaluate PAR and competitive baselines. Specifically, for the random setting, voting records in the 114th and 115th congresses are randomly split 6:2:2 into training, validation, and test sets. For the time-based setting, the voting records of the 114th congress are split 8:2 for training and validation, while the voting records of the 115th congress are used for testing.
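The random 6:2:2 split can be sketched as follows (the seed and record format are illustrative):

```python
import random

def split_records(records, seed=0):
    """Random 6:2:2 train/validation/test split of voting records,
    as in the random setting of Mou et al. (2021)."""
    records = records[:]
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (records[:n_train],
            records[n_train:n_train + n_val],
            records[n_train + n_val:])

train, val, test = split_records(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```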

Baselines
We compare PAR with competitive baselines: • majority predicts all votes as yea.
• ideal-point-wf and ideal-point-tfidf (Gerrish and Blei, 2011) adopt word frequency and TFIDF of legislation text as features and leverage the ideal point model for vote prediction.
• ideal-vector (Kraft et al., 2016) learns distributed representations with legislator and bill text.
• CNN (Kornilova et al., 2018) encodes bill text with CNNs. On top of that, CNN+meta introduces bill sponsorship metadata into the process.
• Vote (Mou et al., 2021) proposes to align statements on social networks with voting records.
• RoBERTa and TransE use RoBERTa (Liu et al., 2019) encodings of legislator descriptions on Wikipedia or TransE (Bordes et al., 2013) embeddings for legislator representation learning. The resulting representations are then concatenated with RoBERTa-encoded bill text for vote prediction.
• PAR concatenates the political actor representations learned with PAR with RoBERTa-encoded bill text for roll call vote prediction.

Results
We evaluate PAR and competitive baselines on roll call vote prediction and present model performance in Table 2. PAR achieves state-of-the-art performance, outperforming existing baselines that model political actors in different ways. This indicates that PAR learns high-quality representations of political actors that provide political knowledge and augment the vote prediction process.

Political Findings of PAR
PAR learns political actor representations with social context and expert knowledge, which has been proven effective in political perspective detection and roll call vote prediction.We further examine whether PAR provides new insights into the political behavior of legislators, states, and governors.

Legislators and PAR
To examine whether legislator representations learned with PAR align well with different social and political groups, we visualize them with t-SNE and report the Davies-Bouldin Index in Figure 3.

States and PAR
Political commentary often uses "blue state", "red state", and "swing state" to describe the ideological preferences of states in the U.S. We examine the ideological scores of states learned by PAR ($l_i$ and $c_i$ in Equation (11)) and compare them with the results of the 2020 U.S. presidential election in Figure 4. PAR's stance predictions correlate highly with the 2020 presidential election results. In addition, PAR reveals that Pennsylvania and North Carolina, two traditional swing states, are actually more partisan than expected. PAR also suggests that Georgia is the most electorally competitive state in the United States, a conclusion we will continue to monitor in future elections.

Governors and PAR
Political experts typically study and evaluate the stances of presidents and federal legislators, while state-level officials such as governors are also essential in governance and policy making (Beyle, 1988). PAR complements the understanding of state-level politics by learning ideological scores for governors. Specifically, we use $l_i$ and $c_i$ in Equation (11) to evaluate the ideological positions of governors and present PAR's predicted stances for the governors of all 50 U.S. states in Figure 5. It is no surprise that governors from partisan strongholds such as California and Utah hold firm stances. Conventional wisdom often assumes that, in order to win "swing states" or electorally challenging races, one has to compromise one's ideological stances and lean to the center. However, PAR reveals exceptions to this rule, such as Andy Beshear (D-KY) and Ron DeSantis (R-FL), which is also suggested in political commentary (Jr., 2020; Graham, 2022).

Conclusion
In this paper, we present PAR, a framework to learn representations of political actors with social context and expert knowledge. Specifically, we retrieve social context information from Wikipedia and expert knowledge from political think tanks, construct a HIN to model legislators, and learn representations with gated R-GCNs and three training objectives. Extensive experiments demonstrate that PAR advances the state-of-the-art in political perspective detection and roll call vote prediction.
PAR further provides novel insights into political actors through its representation learning process.

Limitations
• PAR proposes to learn representations for political actors with the help of social context and expert knowledge. Though it achieves strong performance on several tasks and provides interesting political insights, it might reinforce political stereotypes, such as the assumption that people from "red states" are more conservative. We leave mitigating the potential bias in political actor representations for future work.
• In this paper, we focus on political actors in the United States rather than other countries. However, our proposed framework could be easily extended to other scenarios by leveraging the Wikipedia pages of political actors for social context and political think tanks in other countries for expert knowledge.

A Expert Knowledge Prediction
We propose to train PAR with three objectives, the first of which, $\mathcal{L}_1$, is expert knowledge prediction, where the model learns to predict how liberal or conservative a given legislator is. To examine whether the PAR architecture successfully conducts expert knowledge prediction, we compare PAR with various text (Pedregosa et al., 2011; Horne et al., 2018; Pennington et al., 2014; Liu et al., 2019; Beltagy et al., 2020) and graph (Kipf and Welling, 2016; Veličković et al., 2018; Hamilton et al., 2017; Shi et al., 2020; Bresson and Laurent, 2017) analysis baselines.

B Ablation Study
PAR aims to learn representations of political actors with social context and expert knowledge as well as three training objectives. We conduct an ablation study to examine their effect on the representation learning process. We report model performance on the expert knowledge prediction task and follow the same dataset splits as in Table 3. R1 to R5 follow the definitions in Section 3.2.2.

B.1 Social Context
We use five types of heterogeneous relations, R1 to R5, to connect different entities based on their social context. We randomly and gradually remove the five types of social context edges from the constructed HIN and report model performance in Figure 6.
It shows that all relations but R4 (Time in Office) contribute significantly to the overall performance. Besides, Figure 6 shows a large gap between retaining 90% and 100% of the edges, suggesting the importance of a complete HIN structure.
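The gradual edge-removal ablation can be sketched as follows (a simplified version that drops a uniform fraction of all edges rather than per relation):

```python
import random

def drop_edges(edges, keep_frac, seed=0):
    """Randomly keep a fraction of HIN edges for the social-context
    ablation; the paper removes edges gradually, per relation type."""
    edges = edges[:]
    random.Random(seed).shuffle(edges)
    return edges[:int(keep_frac * len(edges))]

edges = [(i, i + 1) for i in range(10)]
print(len(drop_edges(edges, 0.9)))  # 9
```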

B.2 Expert Knowledge
We learn legislator representations with the help of two political think tanks: AFL-CIO and Heritage Action. We retrieve their evaluations of political actors and construct the expert knowledge objective for training. To examine the effect of expert knowledge in our proposed approach, we gradually remove expert knowledge labels from $\mathcal{L}_1$ and report model performance in Figure 7. Performance drops with partial expert knowledge from either AFL-CIO or Heritage Action, which indicates that expert knowledge is essential in the representation learning process.

B.4 Graph Learning
We adopt five relations, R1 to R5, to connect eight types of entities, N1 to N8, so the graph is heterogeneous. To examine whether graph heterogeneity contributes to model performance, we substitute gated R-GCN with homogeneous GNNs such as GCN, GAT, and GraphSAGE. Table 5 shows mixed results: PAR performs best with gated R-GCN, while plain R-GCN does not outperform GraphSAGE.

C Error Analysis
We manually examined part of the results of the roll call vote prediction task. Among the (legislator, bill) pairs where PAR made a wrong prediction, it is often the case that the legislator voted across party lines. This suggests that more information about these legislators is required to achieve more fine-grained analysis of these borderline cases.

D Reproducibility Details
In this section, we provide additional details to facilitate reproducing our results and findings. We submit data and code as supplementary material and commit to making them publicly available upon acceptance to facilitate reproduction.

D.1 Hyperparameters
We present the hyperparameter settings of our proposed approach in Table 6.These hyperparameters are manually tuned.We follow these settings throughout the paper unless stated otherwise.

D.3 Computation
Our proposed approach has a total of 2.0M learnable parameters with hyperparameters in Table 6.We train it on a Titan X GPU with 12GB memory.
It takes approximately 0.6 GPU hours for training with hyperparameters in Table 6.

Figure 1: Social context information and political expert knowledge that help to model political actors.

Figure 2: Overview of PAR, political actor representation learning with social context and expert knowledge.

Figure 3: Visualizing the learned representations of political actors with t-SNE. DBI denotes the Davies-Bouldin Index.

Figure 4: State stances predicted by PAR compared to vote shares in the 2020 U.S. presidential election.

Figure 6: Ablation study of social context information. R1 to R5 follow the definitions in Section 3.2.2.

Figure 7: Ablation study of expert knowledge. We report accuracy on the expert knowledge prediction task.

$0.01 \leq \lambda_3 \leq 0.1$ would generally lead to an effective balance of the three different training objectives.

Figure 8: Model accuracy with different loss weights.

Table 1: Political perspective detection performance on two benchmark datasets. Acc and MaF denote accuracy and macro-averaged F1-score. N/A indicates that the result is not reported in previous works.

Table 3: Our model's performance on the expert knowledge prediction task compared to various text and graph analysis baselines. Acc, MaF, and MiF denote accuracy, macro-averaged, and micro-averaged F1-score.

Table 4: Ablation study of the three training objectives.

Table 5: Model performance with different GNNs. We adopt gated R-GCN, which achieves the best performance. Het. denotes whether the model supports heterogeneous graphs.