Align Voting Behavior with Public Statements for Legislator Representation Learning

The ideology of legislators is typically estimated by ideal point models from historical voting records. These models represent legislators and legislation as points in a latent space and show promising results for modeling voting behavior. However, they fail to capture more specific attitudes of legislators toward emerging issues and are unable to model newly-elected legislators without voting histories. To mitigate these two problems, we explore incorporating both voting behavior and public statements on Twitter to jointly model legislators. In addition, we propose a novel task, namely hashtag usage prediction, to model the ideology of legislators on Twitter. In practice, we construct a heterogeneous graph for the legislative context and use relational graph neural networks to learn the representation of legislators with the guidance of historical records of their voting and hashtag usage. Experimental results indicate that our model yields significant improvements for the task of roll call vote prediction. Further analysis demonstrates that the legislator representations we learn capture nuances in statements.


Introduction
Modeling the behavior of legislators is one of the most important topics in quantitative political science. Existing research largely relies on roll call data, i.e., historical voting records, to estimate the political preferences of legislators. The most widely used approach for roll call data analysis is the ideal point model (Clinton et al., 2004), which represents legislators and legislation as points in a one-dimensional latent space. Researchers have enhanced the ideal point model by incorporating textual information of legislation (Gerrish and Blei, 2011; Gu et al., 2014; Kraft et al., 2016) and report positive results for roll call vote prediction.
Although roll call data is the major resource for modeling legislator behavior, it has two limitations. First, it fails to uncover the detailed opinions of legislators on legislative issues, so we have no clue about the motivation behind their votes. Second, it is unable to model the behavior of newly-elected legislators because their historical voting records are not available (i.e., the cold-start problem). Meanwhile, researchers have explored using public statements to characterize the ideology of legislators, guided by framing theory (Entman, 1993; Chong and Druckman, 2007; Baumer et al., 2015; Vafa et al., 2020). Vafa et al. (2020) propose a text-based ideal point model to analyze legislators' tweets independently of roll call data. Their experimental results show some correlation between the distributions of ideal points learned from legislative data and from public statements. However, they treat the two resources separately and fail to uncover deeper relationships between behavior in these two landscapes. Figure 1 shows a legislative issue related to prohibiting partial-birth abortion. It includes the title and description of the legislation, roll call vote records, and legislators' public statements on Twitter. From the voting records, we know the stances of legislators. From the discussion on Twitter, we can further understand their opinions on the topic: supporters concentrate on protecting life, while opponents emphasize the right to choose. This suggests that bridging public statements on Twitter with roll call data can provide a fuller picture of legislators' behavior patterns.
A closer look at the example (Figure 1) reveals that most tweets use hashtags to express ideas concisely. Moreover, people with opposing stances choose different groups of hashtags: supporters use #life and #TheyFeelPain, while opponents use #Choice and #WhatWomenWant. Further analysis on a large tweet dataset, where the sentiment of each tweet is scored with the Python library TextBlob (https://github.com/sloria/TextBlob), shows that most hashtags are polarized toward a single sentiment (Figure 2a). Based on this observation and previous studies revealing the polarization of hashtags (Conover et al., 2011a; Garimella and Weber, 2017), we explore using hashtags as labels describing the preferences of legislators in public discussion, and propose a novel task, hashtag usage prediction, to characterize their ideology.
In this paper, we collect public statements of legislators on Twitter as an extension of roll call data for legislator representation learning. Our intuition is to combine roll call votes as hard labels and hashtags as soft labels to jointly model legislators. In practice, we build a heterogeneous graph to bridge the voting behavior and public statements of legislators. It consists of three kinds of nodes: legislators, legislation, and hashtags in tweets. Subsequently, we employ a heterogeneous Relational Graph Convolutional Network (RGCN) (Schlichtkrull et al., 2018) to simultaneously update the representations of different nodes. Two tasks are used for training, roll call vote prediction and hashtag usage prediction, which model the behavior of legislators in voting and in public statements, respectively. The major contributions of this paper are threefold:
-To the best of our knowledge, this is the first study incorporating both voting behavior and public statements to jointly depict legislators. The proposed framework enables us to understand the preferences of legislators by combining their behavior in the legislative process and on public platforms.
-We propose to learn the representations of legislation and legislators using a heterogeneous graph, which densifies relations among legislators and thus mitigates the cold-start problem.
-We propose a novel task of hashtag usage prediction to characterize the preferences of legislators in public discussion, and construct a dataset as the benchmark. Our dataset and code are available on GitHub.

Dataset and Tasks
The Voteview website (Lewis et al., 2021) provides a benchmark for roll call vote prediction. It contains the history of roll call votes and is continuously updated. Meanwhile, a dataset constructed by Yang et al. (2020) provides detailed descriptions and sponsor information of legislation from 1993 to 2018. We extend these corpora with tweets published by legislators.

Twitter Dataset
Since Twitter has become popular among legislators only in the last decade, we retain the 1,198,758 roll call records after 2009, involving 906 legislators and 3,210 pieces of legislation. To construct the dataset, we first extract the Twitter accounts of legislators from their homepages on the website of the U.S. Congress (www.congress.gov). For those who have not provided a Twitter account, we manually search their names on Twitter and identify their accounts by checking the verification information and biography. In this way, 735 legislator accounts are included in our extended dataset. We crawl all tweets (before July 20th, 2020) of each of these legislators via twitterscraper (https://github.com/bisguzar/twitter-scraper). In addition, we collect their following lists. Figure 2 shows some statistics of the dataset. Figure 2b presents the distribution of the number of tweets posted per year, showing that legislators paid increasing attention to Twitter from 2009 to 2017. Legislators post 3,071 tweets on average, and 57.82% of legislators have posted more than 2,000 tweets. As for hashtags, a third of tweets contain at least one hashtag, with 82,381 unique hashtags in total. Figure 2c indicates that most hashtags fade away within three months. Figure 2d shows the distribution of hashtag lengths, illustrating that a hashtag usually consists of a few words. To reduce noise, we keep hashtags with length greater than 2 and frequency higher than 50, which leaves 2,057 hashtags for graph construction.
To explore hashtag usage behavior, we construct 0-1 labels indicating whether a legislator has posted a specific hashtag. Since some hashtags are not popular, for hashtag usage prediction we further remove those posted by fewer than 100 legislators. In this way, 194,040 labels are created.
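The filtering and labeling steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes hashtag occurrences have already been extracted as (legislator, hashtag) pairs, it reads hashtag "length" as a character count (the text does not say whether characters or words are meant), and `build_hashtag_labels` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_hashtag_labels(mentions, legislators, len_gt=2, freq_gt=50, min_users=100):
    """Hypothetical helper: filter hashtags and build 0-1 usage labels.

    mentions: iterable of (legislator_id, hashtag) pairs, one per hashtag occurrence.
    legislators: all legislator ids that should receive labels.
    """
    mentions = list(mentions)
    freq = Counter(tag for _, tag in mentions)
    # Graph vocabulary: hashtags longer than `len_gt` characters used more than `freq_gt` times.
    graph_tags = {tag for tag, c in freq.items() if len(tag) > len_gt and c > freq_gt}
    # Record which legislators have posted each retained hashtag.
    users = defaultdict(set)
    for m, tag in mentions:
        if tag in graph_tags:
            users[tag].add(m)
    # Prediction targets: retained hashtags posted by at least `min_users` legislators.
    target_tags = sorted(tag for tag in graph_tags if len(users[tag]) >= min_users)
    # One binary label per (legislator, target hashtag) pair.
    labels = {(m, tag): int(m in users[tag]) for m in legislators for tag in target_tags}
    return graph_tags, target_tags, labels
```

In this sketch every legislator receives one label per target hashtag, so the number of labels is the number of legislators times the number of target hashtags.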

Task Formulation
We first introduce the notation used in this paper.
-$M = \{m_1, m_2, \dots\}$ is the list of legislators, where each $m_i$ ($i = 1, 2, \dots$) contains basic background information of the legislator (member ID, state and party), along with the legislator's following list on Twitter.
-$L = \{l_1, l_2, \dots\}$ is the list of legislation, where each $l_i$ ($i = 1, 2, \dots$) contains its title and description, as well as sponsor information and voting results.
-$T = \{t_1, t_2, \dots\}$ is the list of hashtags that have been mentioned by legislators on Twitter. Each hashtag is associated with its related tweets and their authors.
Note that each element (legislator, legislation or hashtag) is associated with the time at which it appears in the context. We use these time markers to build our experimental environment and avoid future information leakage.
We use two tasks, roll call vote prediction and hashtag usage prediction, to characterize the behavior of legislators in two different landscapes, Congress and Twitter. (1) Roll call vote prediction. This task aims to predict whether a legislator votes yea or nay on a given piece of legislation.
(2) Hashtag usage prediction. This task aims to predict whether a legislator will post a given hashtag or not.

Proposed Framework
Our overall framework is shown in Figure 3. We construct a heterogeneous graph with three kinds of nodes (legislation, legislator and hashtag) to cover the two landscapes of Congress and Twitter. On top of this graph, an RGCN is applied to learn node representations, trained jointly on the two tasks of roll call vote prediction and hashtag usage prediction. In addition, we use an unsupervised following proximity loss to further refine the representations.

Heterogeneous Graph Construction
The heterogeneous graph consists of three kinds of nodes and six types of relations in two categories: relations between homogeneous nodes and relations between heterogeneous nodes. We introduce the structure of the graph in this subsection.

Initialization of Nodes
Legislator Nodes
We follow Yang et al. (2020) to map each legislator to a continuous low-dimensional vector using their member ID, state and party. The legislator representation is
$$X_m = e_{ID} \oplus e_{Party} \oplus e_{State}$$
where $\oplus$ denotes concatenation.
Legislation Nodes
For legislation, we focus on the title and description and represent each piece of legislation by a sentence embedding generated by BERT (Devlin et al., 2019). Thus, the legislation representation is
$$X_l = \mathrm{BERT}(title + description)$$
Hashtag Nodes
To represent a hashtag, we randomly choose $K$ tweets containing the tag and use BERT to obtain a sentence embedding of each tweet's text. We then average these vectors:
$$X_t = \frac{1}{K}\sum_{i=1}^{K} \mathrm{BERT}(tweet_i)$$
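The three initializers can be sketched as below. This is a minimal illustration, assuming embeddings are plain Python lists (so `+` concatenates) and that `encode` is any sentence encoder standing in for BERT; the function names, the space used to join title and description, and the fixed random seed are illustrative assumptions. The default `k=50` matches the value given in the implementation details.

```python
import random


def init_legislator(member_id, party, state, id_emb, party_emb, state_emb):
    # X_m = e_ID ⊕ e_Party ⊕ e_State: concatenate the three lookup embeddings.
    return id_emb[member_id] + party_emb[party] + state_emb[state]


def init_legislation(title, description, encode):
    # X_l = encoder(title + description); `encode` stands in for BERT.
    return encode(title + " " + description)


def init_hashtag(tweets, encode, k=50, seed=0):
    # X_t = mean of encoder embeddings over K randomly sampled tweets containing the hashtag.
    rng = random.Random(seed)
    sample = rng.sample(tweets, min(k, len(tweets)))
    vectors = [encode(t) for t in sample]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Swapping the toy encoder for a real BERT sentence encoder changes only the `encode` argument; the averaging and concatenation logic stays the same.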

Relations between Homogeneous Nodes
R1: Co-sponsorship of Legislators
Each piece of legislation is introduced by a sponsor and several cosponsors. A previous study (Yang et al., 2020) has shown the effectiveness of modeling co-sponsorship in legislator representation learning. Intuitively, the more legislation two legislators have collaborated on, the more alike they are likely to be ideologically. We follow this setup and use the number of pieces of legislation two legislators have co-sponsored as the weight of this relation, measuring the strength of the relationship between them. In this way, a legislator network is constructed with adjacency matrix A, where each element $a_{ij}$ is the number of pieces of legislation $m_i$ and $m_j$ have co-sponsored.
R2: Similarity of Legislation
Both topic models and embedding paradigms have been used to model legislation in previous studies. However, the semantic relations among legislation have not been explicitly considered. We aim to learn better legislation representations by incorporating these semantic relationships. To this end, we construct a network of legislation, using semantic similarity to link two pieces of legislation. Specifically, we compute an adjacency matrix B, with each element $b_{ij}$ denoting the number of common words in the texts of legislation $l_i$ and $l_j$.
R3: Co-occurrence of Hashtags
If two hashtags are frequently mentioned together, they are likely to convey similar ideas, such as #dreamact and #protectdreamers. Therefore, we build a hashtag network to help hashtag nodes learn from those with similar ideology. An adjacency matrix C is constructed, with each element $c_{ij}$ indicating the number of co-occurrences of hashtags $t_i$ and $t_j$.
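The three homogeneous adjacency matrices can be sketched with plain Python lists as below. This is an illustration under stated assumptions, not the authors' implementation: "common words" is read as the number of distinct shared lowercase tokens after whitespace splitting (the text specifies neither tokenization nor stop-word handling), diagonal entries are left at zero, and all function names are hypothetical.

```python
from itertools import combinations


def cosponsorship_matrix(bills, n_legislators):
    # bills: one set of legislator indices (sponsor + cosponsors) per piece of legislation.
    A = [[0] * n_legislators for _ in range(n_legislators)]
    for members in bills:
        for i, j in combinations(sorted(members), 2):
            A[i][j] += 1  # a_ij counts bills both legislators worked on
            A[j][i] += 1
    return A


def common_word_matrix(texts):
    # b_ij = number of distinct lowercase tokens shared by texts i and j (assumed tokenization).
    vocab = [set(t.lower().split()) for t in texts]
    n = len(texts)
    B = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        B[i][j] = B[j][i] = len(vocab[i] & vocab[j])
    return B


def cooccurrence_matrix(tweet_tags, tag_index):
    # c_ij = number of tweets in which hashtags i and j appear together.
    n = len(tag_index)
    C = [[0] * n for _ in range(n)]
    for tags in tweet_tags:
        idx = sorted({tag_index[t] for t in tags if t in tag_index})
        for i, j in combinations(idx, 2):
            C[i][j] += 1
            C[j][i] += 1
    return C
```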

R4: Relation between Legislator and Legislation
In the legislative process, each piece of legislation is introduced by multiple legislators. Karimi et al. (2019) have shown that features of the bipartite network of legislators and bills are informative. Therefore, we use this sponsorship relation to connect legislator and legislation nodes. An adjacency matrix D is constructed, with each element $d_{ij}$ indicating whether legislator $m_i$ has sponsored legislation $l_j$.

R5: Relation between Legislator and Hashtag
Legislators choose which hashtags to use when they publish tweets. Therefore, we define an adjacency matrix F to measure the preferences of legislators for hashtags. Each element $f_{ij}$ is the number of times legislator $m_i$ has mentioned hashtag $t_j$.

R6: Relation between Legislation and Hashtag
Legislation may discuss topics similar to those of the hashtags used in tweets. We therefore align legislation with hashtags by computing semantic similarity based on their text. To this end, an adjacency matrix G is constructed, with each element $g_{ij}$ representing the number of common words in the text of legislation $l_i$ and the tweets containing hashtag $t_j$.
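The three heterogeneous relations R4 to R6 can be sketched in the same style. This is a minimal illustration with hypothetical function names; it assumes that sponsors and cosponsors both count as "sponsoring" for D, that the tweets of a hashtag are pooled into one bag of words for G, and that "common words" means distinct shared lowercase tokens.

```python
def sponsorship_matrix(bill_members, n_legislators):
    # d_ij = 1 if legislator i (co)sponsored legislation j (cosponsors included by assumption).
    D = [[0] * len(bill_members) for _ in range(n_legislators)]
    for j, members in enumerate(bill_members):
        for i in members:
            D[i][j] = 1
    return D


def mention_matrix(mentions, leg_index, tag_index):
    # f_ij = number of times legislator i used hashtag j; unknown ids are skipped.
    F = [[0] * len(tag_index) for _ in range(len(leg_index))]
    for m, h in mentions:
        if m in leg_index and h in tag_index:
            F[leg_index[m]][tag_index[h]] += 1
    return F


def bill_hashtag_matrix(bill_texts, tag_tweets):
    # g_ij = distinct words shared by legislation i and the pooled tweets containing hashtag j.
    bills = [set(t.lower().split()) for t in bill_texts]
    tags = [set(" ".join(tweets).lower().split()) for tweets in tag_tweets]
    return [[len(b & t) for t in tags] for b in bills]
```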

Relational Graph Convolutional Network
After initializing the representations of legislators, legislation and hashtags, we feed them into a Relational Graph Convolutional Network (RGCN) (Schlichtkrull et al., 2018) to update their representations based on the context. Graph convolutional networks (GCNs) (Kipf and Welling, 2017) provide an efficient way to perform message propagation and aggregation. In the propagation phase, nodes send messages to their neighbors; in the aggregation phase, each node sums up the messages from its neighbors and updates its representation.
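When there is only one relation type, the layer-wise propagation rule of GCNs is:
$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right) \quad (1)$$
where $H^{(l)}$ is the hidden representation of the $l$-th layer, $\hat{A}$ is the adjusted (normalized) adjacency matrix, $W^{(l)}$ is the weight matrix shared by all edges in layer $l$, and $\sigma(\cdot)$ is the activation function. For each node $i$ with neighbors $\mathcal{N}_i$, the update rule can be written as:
$$h_i^{(l+1)} = \sigma\left(\sum_{j \in \mathcal{N}_i} \frac{1}{c_i} W^{(l)} h_j^{(l)}\right) \quad (2)$$
where $c_i$ is a normalization constant, often set to $|\mathcal{N}_i|$ when each neighbor has equal importance.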
RGCNs generalize GCNs to handle relations of different types, using a separate weight matrix and normalization factor for each relation type. The hidden representation of each node $i$ in layer $l+1$ is computed as:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right) \quad (3)$$
where $\mathcal{R}$ is the set of relation types, $\mathcal{N}_i^r$ is the set of neighbors of node $i$ under relation type $r$, $W_r^{(l)}$ is the weight matrix for relation $r$, and $W_0^{(l)}$ is the weight matrix of the self-connection. Since neighbors differ in importance in our graph, we compute the normalization factor $c_{i,r}$ from the relation weights obtained above instead of using $c_{i,r} = |\mathcal{N}_i^r|$. Empirically, we use a 2-layer RGCN to capture second-order relations between nodes. After convolution, we obtain the representations of legislators, legislation and hashtags, denoted $R_m$, $R_l$ and $R_t$.
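The text does not spell out how $c_{i,r}$ is derived from the relation weights. The sketch below assumes one plausible reading: each neighbor's message is scaled by $w_{ij} / \sum_k w_{ik}$, so heavier edges (for example, more co-sponsored bills) contribute more, and with unit weights this reduces to $c_{i,r} = |\mathcal{N}_i^r|$. It also includes the self-connection $W_0 h_i$ of Schlichtkrull et al. (2018). Pure Python lists keep it self-contained; `rgcn_layer` and its argument layout are hypothetical, and a real implementation would use a tensor library.

```python
def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def rgcn_layer(H, relations, W_rel, W_self, act=lambda x: max(0.0, x)):
    """One R-GCN layer with weight-based normalization (assumed form).

    H: list of node vectors h_j^(l).
    relations: {r: {i: [(j, w_ij), ...]}}, weighted neighbors of node i under relation r.
    W_rel: {r: W_r^(l)}; W_self: W_0^(l) for the self-connection.
    act: element-wise activation (ReLU by default).
    """
    out = []
    for i, h_i in enumerate(H):
        total = matvec(W_self, h_i)  # self-connection term W_0 h_i
        for r, adj in relations.items():
            neighbors = adj.get(i, [])
            c_ir = sum(w for _, w in neighbors)  # normalizer built from relation weights
            for j, w in neighbors:
                message = matvec(W_rel[r], H[j])
                total = [t + (w / c_ir) * m for t, m in zip(total, message)]
        out.append([act(x) for x in total])
    return out
```

Stacking two such layers, with output sizes 128 and 64, would mirror the 2-layer configuration described in the implementation details.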

Model Training
We train our model on two tasks, roll call vote prediction and hashtag usage prediction. In addition, we introduce a following proximity loss to further capture relationships between legislators based on their Twitter following network.

Roll Call Vote Prediction
Given the representations of legislators and legislation, roll call vote prediction becomes a classification task. We compute the element-wise product and the element-wise difference of the embeddings of the target legislator and legislation, and concatenate them to encode their relation. We then feed this relation representation into a feed-forward neural network (FFNN) with softmax to predict the result. The cross-entropy loss is used:
$$\mathcal{L}_{vote} = -\sum_{(m,l)} \sum_{k} y_{m,l,k} \log f_k \quad (4)$$
where $y_{m,l,k}$ is the $k$-th component of the one-hot label of legislator $m$'s vote on legislation $l$, and $f_k$ is the $k$-th component of the output of the activation layer $\sigma(\cdot)$.
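The edge encoding and loss for a single vote record can be sketched as follows. For brevity, the FFNN is reduced to a single linear layer followed by softmax; the parameters `W` and `b`, the function names, and the class-index encoding (for example, 0 for nay and 1 for yea) are hypothetical. The same head could be reused for hashtag usage prediction, assuming the legislator-hashtag edge is encoded the same way.

```python
import math


def edge_features(r_m, r_l):
    # [r_m ⊙ r_l ; r_m − r_l]: element-wise product and difference, concatenated.
    return [a * b for a, b in zip(r_m, r_l)] + [a - b for a, b in zip(r_m, r_l)]


def softmax(z):
    # Subtract the max for numerical stability.
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def predict(r_m, r_l, W, b):
    # Single linear layer + softmax standing in for the FFNN (hidden layers omitted).
    x = edge_features(r_m, r_l)
    logits = [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def cross_entropy(probs, label):
    # -sum_k y_k log f_k for a one-hot y reduces to -log f_label.
    return -math.log(probs[label])
```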

Hashtag Usage Prediction
Similar to roll call vote prediction, hashtag usage prediction is modeled as a relation prediction task. The representation of an edge is produced from the embeddings of the target legislator and hashtag. We then feed this representation into another FFNN with softmax. The cross-entropy loss is used:
$$\mathcal{L}_{hashtag} = -\sum_{(m,t)} \sum_{k} y_{m,t,k} \log g_k \quad (5)$$
where $y_{m,t,k}$ is the $k$-th component of the one-hot label indicating whether legislator $m$ posts hashtag $t$, and $g_k$ is the $k$-th component of the output of the activation layer $\sigma(\cdot)$.

Following Proximity Loss
Previous studies (Barberá, 2015; Peng et al., 2016) have shown the effectiveness of using following relationships on Twitter for political preference estimation, finding that users prefer to follow those with similar political positions. To take this factor into account, we introduce a proximity loss (Hamilton et al., 2017; Nguyen et al., 2020) computed from the following network of legislators. It encourages neighboring nodes to have similar representations and pushes apart the representations of unassociated nodes. The proximity loss is formulated as follows:
$$\mathcal{L}_{prox} = -\sum_{m \in G} \sum_{m_p} \left( \log \sigma\left(e_m^\top e_{m_p}\right) + Q \cdot \mathbb{E}_{m_n \sim P_n(m)} \log \sigma\left(-e_m^\top e_{m_n}\right) \right) \quad (6)$$
where $G$ is the subgraph of legislators formed by following relationships, $e_m$ is the representation of legislator $m$, $m_p$ is a neighbor of $m$ obtained by a fixed-length random walk, and $m_n$ is a negative sample drawn through negative sampling, $m_n \sim P_n(m)$ (Hamilton et al., 2017). $Q$ controls the number of negative samples.
We form the final loss by linearly combining these three losses:
$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{vote} + \lambda_2 \mathcal{L}_{hashtag} + \lambda_3 \mathcal{L}_{prox}$$
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyperparameters controlling the weights of the different losses.
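The following proximity loss described above can be sketched as below, in the GraphSAGE style it cites. This is a minimal illustration: the expectation over $P_n(m)$ is estimated by averaging over pre-drawn negatives, positives come from one fixed-length random walk per node, and `proximity_loss` and `random_walk_pairs` are hypothetical names. How the negatives are drawn is left to the caller.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_walk_pairs(graph, walk_len, rng):
    # One fixed-length walk per start node on the following graph;
    # every visited node other than the start becomes a positive m_p.
    pairs = []
    for start in graph:
        node = start
        for _ in range(walk_len):
            if not graph[node]:
                break
            node = rng.choice(graph[node])
            if node != start:
                pairs.append((start, node))
    return pairs


def proximity_loss(emb, pairs, negatives, Q):
    # emb: legislator -> vector; negatives[m]: pre-drawn samples m_n ~ P_n(m).
    loss = 0.0
    for m, m_p in pairs:
        pos = log_sigmoid(dot(emb[m], emb[m_p]))
        negs = negatives[m]
        # Monte Carlo estimate of the expectation over negative samples.
        neg = sum(log_sigmoid(-dot(emb[m], emb[n])) for n in negs) / len(negs)
        loss -= pos + Q * neg
    return loss
```

Aligning a legislator with a followed neighbor and anti-aligning it with a negative sample lowers the loss, which is the behavior the proximity term is meant to encourage.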

Experiment Setup
Dataset Splits
Our experiments are based on data from the 112th to 115th Congresses, including both bills and resolutions from the House and Senate. We use two configurations to form the experimental datasets.
(1) random: Following Kornilova et al. (2018) and Davoodi et al. (2020), we set up an in-session environment in which the records of each two-year session form an independent experiment set, yielding 4 experiment sets. For each set, 20% of legislation is selected for testing, 20% for validation and the rest for training. (2) time-based: Following Yang et al. (2020), we form each experiment set from two consecutive sessions, using the former for training and validation and the latter for testing, yielding 3 experiment sets. In this setting, some legislators may appear only in the testing session. Therefore, we report results under two settings: for Mem Train, we test only on legislators who appear in the training set; for Mem All, we include all legislators in the test set.
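The two split configurations and the Mem Train filter can be sketched as follows. The 60/20/20 split by legislation and the pairing of consecutive sessions follow the text. The seeded shuffle, the record format (legislator, legislation, vote), and all function names are assumptions made for illustration.

```python
import random


def random_split(bill_ids, seed=0):
    # In-session split by legislation: 60% train, 20% validation, 20% test.
    ids = sorted(bill_ids)
    random.Random(seed).shuffle(ids)
    n_test = n_val = int(0.2 * len(ids))
    return ids[n_test + n_val:], ids[n_test:n_test + n_val], ids[:n_test]


def time_based_sets(sessions):
    # Pair consecutive sessions: train/validate on the earlier, test on the later.
    return [(sessions[k], sessions[k + 1]) for k in range(len(sessions) - 1)]


def mem_train_filter(test_records, train_legislators):
    # Mem Train: keep only test votes cast by legislators seen during training.
    return [(m, l, y) for m, l, y in test_records if m in train_legislators]
```

For the 112th to 115th Congresses, `time_based_sets` yields the three experiment sets mentioned above.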

Implementation Details
The dimensions of the initial representations are 64, 768 and 768 for legislators, legislation and hashtags, respectively. We randomly choose 50 tweets to encode each hashtag (i.e., K = 50). When modeling relations, we set the threshold for each relation type to the mean weight of that type and keep only edges with weights greater than the threshold, to eliminate noise. We use a 2-layer RGCN with hidden sizes of 128 and 64. A batch normalization layer is added after initialization. The batch size is 128 and the learning rate is $1 \times 10^{-4}$. Dropout and early stopping are used to prevent overfitting. For the weights of the three losses, we simply set $\lambda_1 = \lambda_2 = 10\lambda_3$ to keep the three losses within the same order of magnitude. For graph construction in a given year, the entity set covers all entities involved in or before that year, while the relation set covers only information from before that year, to avoid future information leakage.
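The mean-threshold edge filtering and the year cutoff can be sketched together as below. This is an illustration under an assumption the text leaves open: the mean is taken over the existing (non-zero) edges of one relation type, not over all node pairs. The event format and function names are hypothetical.

```python
from collections import Counter


def threshold_edges(weights):
    """Keep edges of one relation type whose weight exceeds that type's mean weight.

    weights: {(i, j): w} for existing (non-zero) edges of a single relation type.
    """
    if not weights:
        return {}
    mean = sum(weights.values()) / len(weights)
    return {e: w for e, w in weights.items() if w > mean}


def build_relation(events, year):
    # events: (i, j, event_year) tuples, e.g. one tuple per co-sponsored bill.
    # Only evidence from before `year` contributes, to avoid future information leakage.
    weights = Counter((i, j) for i, j, t in events if t < year)
    return threshold_edges(weights)
```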

Models for Comparison
We compare our model with some state-of-the-art approaches.
-majority: a baseline that assumes all legislators vote yea.
-ideal-point-wf (Gerrish and Blei, 2011): a regression model that takes the word frequencies of legislation text as features. Its training follows the traditional ideal point model, so it can only make predictions for legislators present in the training data.
-ideal-point-tfidf: similar to ideal-point-wf, but uses TF-IDF features of legislation text instead.
-ideal-vector (Kraft et al., 2016): learns multidimensional ideal vectors for legislators based on bill texts.
-CNN+meta (Kornilova et al., 2018): builds on a CNN and adds the percentage of sponsors from each party as authorship information of the bill.
-LSTM+GCN (Yang et al., 2020): uses an LSTM to encode legislation and applies a GCN to update legislator representations.
-Vote: our framework trained only on the roll call vote prediction task.

Overall Performance
We report the average accuracy over all experiment sets following Kornilova et al. (2018) and Yang et al. (2020). We also report the macro F1 score for a more complete picture.
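Macro F1 matters here because votes are dominated by yea: a majority-style predictor can reach high accuracy while ignoring the nay class entirely. The sketch below computes both metrics using the standard definitions, with macro F1 taken as the unweighted mean of per-class F1 scores over the classes that occur in either list. The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 over classes present in y_true or y_pred.
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, predicting yea (1) for the labels [1, 1, 1, 0] gives an accuracy of 0.75 but a macro F1 of only 3/7, because the nay class scores an F1 of 0.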

Roll Call Vote Prediction
We have several findings for results of roll call vote prediction.
-Our model yields the best results. By utilizing hashtag usage information, our framework further improves performance over the single-task Vote model.
-Neural-network-based approaches perform better than ideal-point-based models. CNN+meta and LSTM+GCN achieve better results than the other baselines, suggesting that introducing background information helps capture general preferences.
-All models perform worse in the time-based setting than in the random setting. The performance drop is largest for ideal-point-based models that incorporate textual information, indicating that these models have difficulty transferring from one session to another.
-Comparing the Mem Train and Mem All settings, we find that most methods have difficulty modeling newly-elected legislators. Models incorporating background knowledge are more stable, and our model is the most robust among them.
Hashtag Usage Prediction
For hashtag usage prediction, we evaluate our model in the time-based setting. For comparison, we use a simple FFNN that takes the initial embeddings of legislators and hashtags as input to predict labels. Our model outperforms the FFNN in both accuracy (80.44% vs. 80.03%) and macro F1 (61.34% vs. 53.93%). This indicates that it is difficult to predict legislators' hashtag preferences from textual information alone. By incorporating legislative information, our model achieves improvements, especially in macro F1. This also suggests that learning the voting behavior of legislators benefits predicting what they will say.

Influence of Noise in Hashtag Set
Although most hashtags are polarized, there are still general ones like #America and #Trump, whose usage does not indicate a stance. Therefore, the hashtag set in our dataset contains noise. We conduct an additional experiment to explore the influence of this noise on roll call vote prediction, filtering hashtags with a polarization threshold. Different thresholds correspond to different degrees of polarization: 0.5 means using all hashtag labels in our dataset (the setting of our model in Table 1), and 0.8 means that the ratio of the major sentiment in the tweets containing a hashtag must exceed 0.8. Figure 4a presents the results. Performance increases as the threshold rises from 0.5 to 0.7, indicating that hashtags without firm attitudes hurt performance. Beyond 0.7, performance drops because of the reduced amount of data. However, because of hashtag hijacking, where a hashtag is deliberately taken up and used by "the other side" (Hadgu et al., 2013), noise in hashtags cannot be completely eliminated in this way.
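The polarization filter described above can be sketched as follows. It assumes each tweet already has a sentiment score in [-1, 1] (for example, TextBlob's polarity). Treating neutral (zero) tweets as uninformative and using an inclusive boundary, so that a threshold of 0.5 keeps every hashtag, are our assumptions. The function names are hypothetical.

```python
def major_sentiment_ratio(polarities):
    """Share of a hashtag's non-neutral tweets that carry its majority sentiment.

    polarities: per-tweet sentiment scores in [-1, 1]; zeros are ignored (assumption).
    """
    pos = sum(p > 0 for p in polarities)
    neg = sum(p < 0 for p in polarities)
    if pos + neg == 0:
        return 0.5  # no polar tweets: treat as maximally unpolarized
    return max(pos, neg) / (pos + neg)


def filter_polarized(tag_polarities, threshold):
    # threshold = 0.5 keeps every hashtag; higher values keep only more polarized ones.
    return {t for t, ps in tag_polarities.items() if major_sentiment_ratio(ps) >= threshold}
```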

Further Analysis
We perform additional analysis to further evaluate the effectiveness of our model.

Cold Start Simulation
Since our model uses statements on Twitter to densify connections among legislators, we explore its ability to deal with the cold-start problem. Although the Mem Train and Mem All settings have already shown the advantage of our model in modeling newly-elected legislators, we set up a more general environment: we randomly mask a certain ratio of legislators, i.e., discard their historical legislative information when constructing the graph, to better investigate the model's ability to mitigate the cold-start problem. Figure 4b illustrates the performance of our model when masking different ratios of legislators in the time-based setting. As the ratio increases, performance remains stable and consistently exceeds that of the best baseline, LSTM+GCN (87.01% Acc. and 80.91% MaF.). Thus, by taking advantage of content generated by legislators, our proposed model shows good robustness.
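The masking procedure can be sketched as below. This is a minimal illustration: the relation labels in `LEGISLATIVE` are hypothetical names for the co-sponsorship (R1) and sponsorship (R4) edges, and we assume that only these legislative edges are dropped for masked legislators. Their Twitter-side edges (hashtag usage) are kept, so masked nodes remain connected through their tweets.

```python
import random

# Hypothetical relation labels for the legislative edges (co-sponsorship R1, sponsorship R4).
LEGISLATIVE = {"cosponsor", "sponsor"}


def mask_legislators(legislators, ratio, seed=0):
    # Randomly pick a fraction of legislators to treat as cold-start.
    rng = random.Random(seed)
    k = int(ratio * len(legislators))
    return set(rng.sample(sorted(legislators), k))


def drop_legislative_edges(edges, masked):
    # Remove legislative edges touching masked legislators; other edges (e.g. hashtag usage) stay.
    return [(u, v, rel) for u, v, rel in edges
            if not (rel in LEGISLATIVE and (u in masked or v in masked))]
```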

Legislator Representation
We project the learned legislator representations into a 2D space using PCA. Figure 4c shows the legislator representations of the 115th Congress, based on data from 2018, learned by the vote-based model, i.e., our framework trained without hashtag information. Figure 4d shows those learned by the full framework, where Democrats clearly fall into two clusters. A closer look at the relations between legislators and hashtags offers an explanation. While the lower-left group is active on Twitter, repeatedly posting hashtags like #trumpcare, #goptaxscam and #protectourcare, the other group rarely expresses its position using these hashtags. Although the two groups vote similarly, this divergence cannot be captured from votes alone. Thus, our method indeed learns nuances between legislators.

Consistency of Statement and Behavior
Following Hemphill et al. (2013), we investigate legislators' overall tweeting and voting behavior by comparing hashtag usage with the first dimension of DW-NOMINATE (Lewis and Poole, 2004). We compute the hashtag valence proposed by Conover et al. (2011a) and aggregate the hashtags a legislator has posted to obtain a valence for that legislator. Since DW-NOMINATE scores are not comparable across chambers, Figure 5a and Figure 5b show the results for legislators in the 115th session of the House and Senate, respectively. The figures and correlations (r(529) = 0.80, p < 0.001 for the House; r(135) = 0.74, p < 0.001 for the Senate) indicate that most legislators are polarized similarly in tweeting and voting. They also illustrate again that some legislators who vote similarly on average can differ greatly in their language. Such complex similarities and differences between legislators cannot be expressed by representations learned from votes or tweets separately. Beyond overall leaning, inconsistency at the level of individual bills also deserves attention. When predicting votes on 113S2223, a bill for "an increase in the Federal minimum wage", the vote-based model predicts that Senator Harry Reid will vote nay, which matches the ground truth, whereas our model wrongly predicts yea. Looking into his tweets, we find that he frequently used #raisethewage to call for a raise in the minimum wage, as supporters of the bill did. On the one hand, hashtags may have difficulty capturing fine-grained decisions, which can be influenced by various factors; on the other hand, legislators may behave differently from what they say, since they may make certain statements to gain public support (Spell et al., 2020). When legislators' words do not match their deeds, our model may be misled by their statements.
As it is difficult to find hashtags directly and accurately related to a specific bill in an automatic and comprehensive way, we leave a study of how frequently such inconsistency occurs to future work.
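The valence comparison can be sketched as follows. The valence formula is the one we understand Conover et al. (2011a) to define. It compares how often a hashtag appears among right-leaning versus left-leaning users, normalized by each side's total hashtag usage, and gives +1 for hashtags used only by the right and −1 for hashtags used only by the left. Using parties as the two sides and aggregating by a usage-weighted mean are our assumptions, and the function names are hypothetical. In the reported r(529) and r(135), the number in parentheses conventionally gives the degrees of freedom, n − 2.

```python
import math


def hashtag_valence(n_right, n_left, total_right, total_left):
    """Valence in [-1, 1]; the hashtag must be used at least once by some side.

    n_right / n_left: occurrences of the hashtag among right- / left-leaning users.
    total_right / total_left: all hashtag occurrences among that side's users.
    """
    r = n_right / total_right
    l = n_left / total_left
    return 2 * r / (r + l) - 1


def legislator_valence(usage, valence):
    # Usage-weighted mean valence of the hashtags a legislator posted (assumed aggregation).
    total = sum(usage.values())
    return sum(c * valence[h] for h, c in usage.items()) / total


def pearson_r(x, y):
    # Pearson correlation between legislator valence and DW-NOMINATE scores.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```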

Related Work
Ideal point estimation has become a mainstream approach to modeling the ideology of legislators. The classical ideal point model (Clinton et al., 2004) represents both legislators and legislation in the same space and characterizes voting behavior by the distance between them. However, this simple spatial model cannot predict votes on new legislation. Text-based models have emerged to address this issue. Gerrish and Blei (2011, 2012), Gu et al. (2014) and Nguyen et al. (2015) extended the ideal point model with latent topics and issue-adjusted methods. Some embedding methods (Kraft et al., 2016) also improve the learning of legislator representations. More recently, external context information including party, sponsors and donors (Kornilova et al., 2018; Yang et al., 2020; Davoodi et al., 2020) has been introduced to better describe the legislative process.
Since votes are not the only way to express political preferences, other sources of data, including speech and knowledge graphs (Budhwar et al., 2018; Gentzkow et al., 2019; Patil et al., 2019; Vafa et al., 2020), have been used to estimate ideology. Although previous studies (Bruns and Highfield, 2013; Golbeck and Hansen, 2014; Barberá, 2015; Peng et al., 2016; Wong et al., 2016; Boutyline and Willer, 2017; Johnson et al., 2017) have incorporated following or retweeting networks on Twitter to learn about legislators, the fine-grained attitudes of legislators remain unknown because the texts themselves have not been mined. More recently, Preoţiuc-Pietro et al. (2017) analyzed linguistic differences between ideologically different groups using a broad range of handcrafted language features, and other studies (Vafa et al., 2020; Spell et al., 2020) incorporated Twitter texts to capture nuances in legislators' preferences via statistical methods. Nevertheless, there has been little research combining votes with public statements to portray legislators from both angles and predict their behavior.
Previous studies (Conover et al., 2011b; Small, 2011; Bruns and Stieglitz, 2012; Cohen and Ruths, 2013) have suggested that modeling hashtag metadata is an informative way to analyze tweets, e.g., for classifying political affiliation. Since hashtags are an important means for people to participate in political discussion and communication, hashtag usage patterns have also been modeled as feature vectors in many clustering tasks to identify different user groups (Conover et al., 2011a; Bode et al., 2013, 2015). Hemphill et al. (2013) and Yang et al. (2016) analyzed the hashtag usage patterns of different ideologies through feature selection and keyword statistics. However, hashtag usage can be exploited further beyond such analyses, e.g., in prediction tasks. Thus, we focus on hashtags to depict the statements of legislators on Twitter and jointly estimate their political preferences.

Conclusions and Future Work
In this paper, we take the first step toward aligning voting behavior with statements on Twitter to jointly learn representations of legislators. We construct a heterogeneous graph to model the legislative context and propose a hashtag usage prediction task for joint training. Experiments demonstrate that our framework learns effective legislative representations and yields improvements on roll call vote prediction. Due to limited background information, we have not yet detected more fine-grained stances of legislators toward specific events. In the future, we aim to conduct more research on modeling the stances of legislators.