Modeling Ideological Salience and Framing in Polarized Online Groups with Graph Neural Networks and Structured Sparsity

The increasing polarization of online political discourse calls for computational tools that automatically detect and monitor ideological divides in social media. We introduce a minimally supervised method that leverages the network structure of online discussion forums, specifically Reddit, to detect polarized concepts. We model polarization along the dimensions of salience and framing, drawing upon insights from moral psychology. Our architecture combines graph neural networks with structured sparsity learning and results in representations for concepts and subreddits that capture temporal ideological dynamics such as right-wing and left-wing radicalization.


Introduction
The polarization of online political discourse on platforms such as Twitter (Himelboim et al., 2013), Facebook (Bakshy et al., 2015), and Reddit has recently received increasing attention in the computational social sciences, particularly in the context of Covid-19 (Green et al., 2020). In NLP, a growing body of work has discovered mechanisms by which polarization manifests itself linguistically (e.g., Demszky et al., 2019). However, the methods proposed so far rely on knowing in advance the political orientation of text, a requirement seldom met in social media.
In this paper, we propose SLAP4SLIP (Sparse LAnguage Properties for Social LInk Prediction), a novel framework that fully dispenses with the need for labels and instead leverages the ubiquitous network structure of online discussion forums to detect polarized concepts, making it more scalable and lightweight than previous methods. For example, SLAP4SLIP finds that fascist and mainstream are among the most polarized concepts in Reddit in 2019 (Figure 1). We model the polarization of concepts along the dimensions of salience and framing. For framing, we take into account insights about the moral foundations of ideology (Haidt and Joseph, 2004) and use contextualized BERT embeddings to construct subspaces that capture nuanced biases in the way concepts are discussed.

Figure 1: Examples of concepts polarized along the dimensions of salience (a, fascist) and framing (b, mainstream) in Reddit in 2019. Each circle is a subreddit. The values for salience (a) are relative concept frequencies: references to fascism, reflected by higher relative frequencies of fascist, are typical of left-wing subreddits (blue region). The values for framing (b) are contextualized BERT embeddings projected into the moral sanctity/degradation subspace: the framing of mainstream as degenerate is pronounced in right-wing subreddits (magenta region). We can diagnose such patterns using SLAP4SLIP in a minimally supervised way.
Contributions. We introduce SLAP4SLIP, a framework to detect polarized concepts without information about the political orientation of text. The specific model we propose for SLAP4SLIP combines graph neural networks with structured sparsity learning and identifies in a minimally supervised way (i) which concepts are the most polarized ones, (ii) whether the polarization is due to differences in salience or framing, and (iii) which moral foundations are involved (when framing is relevant). Drawing on English Reddit data, we evaluate the model intrinsically by conducting various experiments and extrinsically by using the found polarized concepts to predict the ideological leaning of US states. The model also learns subreddit embeddings that capture temporal dynamics. 1

Table 1: Overview of our key technical terms. See main text for more details.

Related Work
Our study is closely related to previous NLP work on polarization (Demszky et al., 2019; Shen and Rosé, 2019; Roy and Goldwasser, 2020; Tyagi et al., 2020; Vorakitphan et al., 2020), but we try to avoid the need for explicit information about ideologies (e.g., manual labels) by leveraging the network structure of online discussion forums. Besides being more readily applicable in practice, this means our method is not restricted to a small number of opposing ideologies, making it theoretically more sound (Jackman, 2001). There is also work in the computational social sciences showing that the structure of various types of online social networks reflects polarization (Adamic and Glance, 2005; Garcia et al., 2015; Garimella et al., 2018), which has been explained as a result of homophily, i.e., nodes close to each other are likely to share similar views (McPherson et al., 2001). While these studies partition the network into a small number of ideological communities, our method does not require a discretization step. More broadly, our study is related to NLP work on ideology in general (Iyyer et al., 2014; Preotiuc-Pietro et al., 2017; Kulkarni et al., 2018).
Research in the political sciences has discovered salience and framing as two key dimensions along which the discussion of issues can vary ideologically. Salience refers to the amount of importance attached to an issue by individuals (Eulau, 1955;Miller et al., 2017). Mass media can impact salience, an effect called agenda setting (McCombs and Shaw, 1972). Framing refers to the mechanism by which certain aspects of an issue are highlighted (Entman, 1993;Druckman, 2001). Crucially, framing is different from sentiment: it reflects what considerations are perceived as important, not what stance is taken regarding these considerations (Nelson and Oxley, 1999). Both salience (with a focus on agenda setting) and framing have been the subject of previous work in NLP (Tsur et al., 2015;Card et al., 2016;Field et al., 2018;Mendelsohn et al., 2021). Here, we use them to characterize differences between online groups.
Psychological research has shown that the fundamental divisions between different ideologies are rooted in their views of morality (Lakoff, 2008). In moral foundations theory (Haidt and Joseph, 2004;Graham et al., 2011), this has been formalized as variation along the moral foundations of care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation. Several studies have shown that moral foundations theory is a suitable basis for analyzing ideological framing (Johnson and Goldwasser, 2018;Mokhberian et al., 2020;He et al., 2021). We follow this approach, but as opposed to prior work we operate with contextualized embeddings that we project into moral embedding subspaces.
Methodologically, we draw on advances in deep learning with graph neural networks, specifically graph auto-encoders (Kipf and Welling, 2016, 2017). In NLP, such graph-based architectures are increasingly used to include information from social networks for downstream tasks (e.g., Mishra et al., 2019; Hofmann et al., 2021). Our work differs in that we combine deep learning on graphs with structured sparsity, a form of regularization similar to ℓ1 regularization (Tibshirani, 1996) that sets entire groups of parameters to zero (Alvarez and Salzmann, 2016). Structured sparsity has been used in NLP before (Eisenstein et al., 2011; Murray and Chiang, 2015; Dodge et al., 2019), but not in connection with graph neural networks.
The precise definition of the key technical terms in this paper somewhat varies in the literature (e.g., Bramson et al., 2016). Table 1 therefore provides a short overview of how we use these terms.
The SLAP4SLIP Framework
The key idea of this paper is to directly leverage the social network structure for determining polarized concepts. 2 We introduce a novel framework called SLAP4SLIP (Sparse LAnguage Properties for Social LInk Prediction) whose goal is to model the structure of social networks in a data-driven way that obviates the need for extensive human annotation or for partitioning the network into communities. SLAP4SLIP is a general framework to detect the most salient types of linguistic variability in social networks and is in principle applicable in any scenario involving social networks with textual data attached to each node. In this paper, we show that for polarized online discussion forums, SLAP4SLIP can be used to find polarized concepts.
Let G = (V, E) be a network consisting of a set of nodes V representing social entities and a set of edges E representing connections between them. We denote by A ∈ R^{|V|×|V|} the adjacency matrix of G. Let C be a set of word n-grams denoting concepts (e.g., political issues like gun control). Here, we confine ourselves to subreddits for V and unigrams and bigrams for C, but SLAP4SLIP is applicable in other scenarios (e.g., to networks of people, or to concepts extracted from text in a more complex manner). We define a function ψ_l : V × C → R that assigns to each node v_i ∈ V and concept c_j ∈ C the value of a linguistic property l observed for c_j in v_i. ψ_l can be represented as a matrix in R^{|V|×|C|}, where each column is a graph signal (Dong et al., 2020) over G determined by c_j and ψ_l. For example, if we choose l to be the frequency count, ψ_l indicates how often each concept occurs in the text attached to each node of the network. The goal of SLAP4SLIP is to find the subset of concepts C* ⊆ C that best meets the following two desiderata: (i) given a linguistic property l, the signals imposed on G by ψ_l and the concepts in C* should allow for optimal predictions about the structure of G, specifically E; (ii) the number of concepts in C* should be minimal. 3 In practice, we treat this as a constrained optimization problem (Bertsekas, 1982), i.e., we use (i) as the objective and impose (ii) as a hard constraint on |C*|.

Figure 2: Example of the prediction of graph structure from a linguistic property. The figure shows ψ_l for concepts c_1 and c_2 on a toy graph, with l chosen to be the frequency count, represented by node color (identical colors mean identical frequencies). The edges can be fully predicted from ψ_l for c_1 but not for c_2.
As an example, consider the network in Figure 2. The network consists of two connected components of four edges each, with no edges between the components. C consists of the two concepts c_1 and c_2. Taking the frequency count as linguistic property l and displaying it with the color of nodes, ψ_l results in the two signals shown in Figure 2. We can see that the signal of concept c_1 alone allows for a perfect prediction of the network structure according to the decision rule that two nodes are connected if and only if they have identical values of ψ_l. Since c_2 cannot achieve a perfect prediction, C* = {c_1} is the optimal solution. Notice that the variance of ψ_l(v_i, c_j) is identical for both concepts and hence does not represent a good distinguishing factor. Notice also that the optimal solution is not necessarily unique: there might be another concept c_3 with a frequency count distribution similar to that of c_1 such that C* = {c_3} would also be an optimal solution.
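The toy example above can be sketched in a few lines. This is an illustrative implementation only: the decision rule follows the text, but the toy communities here are fully connected cliques rather than the paper's four-edge components, and all names are our own.

```python
# Toy illustration of the SLAP4SLIP objective on a Figure-2-style graph.
import itertools

def predict_edges(signal):
    """Decision rule: predict an edge between two nodes iff their
    signal values psi_l(v_i, c) are identical."""
    nodes = range(len(signal))
    return {(i, j) for i, j in itertools.combinations(nodes, 2)
            if signal[i] == signal[j]}

# Nodes 0-3 form one community, nodes 4-7 the other (fully connected here).
true_edges = predict_edges([0, 0, 0, 0, 1, 1, 1, 1])
psi_c1 = [0, 0, 0, 0, 1, 1, 1, 1]  # aligned with the communities
psi_c2 = [0, 1, 0, 1, 0, 1, 0, 1]  # same variance, but uninformative

recovers_c1 = predict_edges(psi_c1) == true_edges  # True
recovers_c2 = predict_edges(psi_c2) == true_edges  # False
```

As in the text, c_1 recovers the structure perfectly while c_2, despite having the same variance, does not.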

Model
We draw upon Reddit Politosphere (Hofmann et al., 2022), a pseudonymized dataset based on Reddit covering 605 political subreddits (e.g., politics) from 2008 to 2019. 4 For each year, Reddit Politosphere contains (i) all comments made to the subreddits and (ii) an unweighted graph with the subreddits as nodes and edges computed by applying statistical backboning to the counts of users shared between subreddits. Subreddits that have disproportionately many users in common are likely to be ideologically similar (Kumar et al., 2018). To ensure robust training, we only use years in which the graph has at least 100 nodes (2013 to 2019). See Appendix A.1 for summary statistics. The high modularity values indicate that the graphs are polarized (Kirkland, 2013). We propose a neural architecture that uses information about concept-level salience and framing to predict links between subreddits while reducing the number of considered concepts as far as possible. Since the links reflect ideological similarity, this should result in a compact set of concepts that is maximally informative about ideology. The performance on link prediction makes it straightforward to compare the quality of different models.
Determining concepts. To obtain the concepts C, we create for each year unigram and bigram vocabularies of political comments taken from Reddit Politosphere and non-political comments sampled in equal size from the default subreddits. 5 To eliminate unigrams and bigrams typical of discussions but not relevant to salience and framing (e.g., dont think), we only consider unigrams and bigrams that appear more often within than outside of noun phrases as detected by a noun phrase chunker (Honnibal et al., 2020). Based on their frequencies within the political and non-political comments, we compute mutual information scores for all unigrams and bigrams and take the top 1,000 unigrams and bigrams for C. This and all other steps are done separately for each year, i.e., we extract year-wise concepts and train year-wise models.
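The mutual-information scoring over the political and non-political vocabularies can be sketched as follows. This is a simplified version computing the MI between an n-gram and the corpus label from raw counts; the paper's exact MI variant and preprocessing are not specified here, and all function names are our own.

```python
import math
from collections import Counter

def mi_scores(political_tokens, background_tokens):
    """Mutual information between each n-gram and the corpus label
    (political vs. non-political), from raw token counts (sketch)."""
    pol, bg = Counter(political_tokens), Counter(background_tokens)
    n_pol, n_bg = sum(pol.values()), sum(bg.values())
    n = n_pol + n_bg
    scores = {}
    for w in set(pol) | set(bg):
        score = 0.0
        p_w = (pol[w] + bg[w]) / n
        for cnt, n_c in ((pol[w], n_pol), (bg[w], n_bg)):
            if cnt:  # contribution of the (w, corpus) cell to MI
                p_wc = cnt / n
                score += p_wc * math.log2(p_wc / (p_w * (n_c / n)))
        scores[w] = score
    return scores

def top_concepts(scores, k):
    """Keep the k highest-scoring n-grams as the concept set C."""
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]
```

N-grams that are equally frequent in both corpora receive a score of zero, while corpus-specific n-grams score highly, mirroring the selection of politically distinctive concepts.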
Modeling salience and framing. The first part of the architecture models ψ l , i.e., it extracts linguistic information related to salience and framing from the subreddits and maps them to scalar representations. In the resulting matrix Ψ l , each column is a signal on the entire graph defined by one concept, and each row is a vector for one subreddit defined by all concepts in C (Section 3).
To model ideological salience, we measure the relative frequency of concepts, s(v_i, c_j) = n(v_i, c_j) / Σ_{c_k ∈ C} n(v_i, c_k), where n(v_i, c_j) is the frequency count of concept c_j in subreddit v_i. Variations in the relative frequency of a concept that are strongly correlated with the social network structure indicate that the concept is used with systematically higher frequency in certain regions of the social network, potentially caused by its elevated place within the ideologies of the subreddits in question.

5. A set of topically diverse subreddits (e.g., Fitness) that users used to be subscribed to automatically. We remove news and worldnews since they also contain political content. We retrieve the default subreddits from the Pushshift Reddit Dataset (Baumgartner et al., 2020).
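The salience feature is a simple per-subreddit normalization, sketched below with illustrative data structures (the dict layout is our own, not the paper's).

```python
def salience(counts):
    """Relative concept frequencies: s(v_i, c_j) = n(v_i, c_j) / sum_k n(v_i, c_k).
    `counts` maps each subreddit to a dict of raw concept counts (sketch)."""
    result = {}
    for subreddit, concept_counts in counts.items():
        total = sum(concept_counts.values())
        result[subreddit] = {c: n / total for c, n in concept_counts.items()}
    return result
```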
To model ideologically-driven framing, we use BERT (base, uncased; Devlin et al., 2019) and obtain average contextualized embeddings e(v_i, c_j) for each subreddit v_i and concept c_j. Furthermore, we use the Moral Foundations Dictionary (Frimer et al., 2017) and obtain for each moral foundation m_k (e.g., authority/subversion) average contextualized embeddings for the 10 highest-ranked words of both poles. 6 Similar to Bolukbasi et al. (2016), we perform PCA on the 20 average contextualized embeddings for each m_k and use the first principal component as the subspace representation e(m_k). This allows us to project the subreddit-specific average contextualized concept embeddings e(v_i, c_j) into the five moral subspaces; the resulting projection p_k(v_i, c_j) reflects how relevant the moral foundation m_k is for the contexts in which concept c_j occurs in subreddit v_i (see Appendix A.2 for further details and a systematic evaluation). The moral foundations are expected to be relevant for the framing of concepts to differing degrees. We therefore compute concept-specific weighted sums, f(v_i, c_j) = Σ_k π_k^(c_j) p_k(v_i, c_j), where the weights π_k^(c_j) are optimized during training.
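The subspace construction and projection can be sketched with NumPy. This assumes the average contextualized embeddings have already been computed (toy vectors stand in for BERT output below); the PCA-via-SVD implementation and function names are our own.

```python
import numpy as np

def subspace_direction(embeddings):
    """First principal component of the average contextualized embeddings
    for one moral foundation's pole words, following the Bolukbasi-style
    construction described in the text (toy sketch, not BERT)."""
    X = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]  # unit-norm direction e(m_k)

def project(concept_emb, direction):
    """Scalar projection p_k(v_i, c_j) of a subreddit-specific concept
    embedding onto the moral subspace direction."""
    return float(concept_emb @ direction)
```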
Salience and framing can be of different importance for different concepts, i.e., there might be concepts with identical values of s(v_i, c_j) across all subreddits but maximally polarized values of f(v_i, c_j), and vice versa. We therefore combine the two via a learned concept-specific weight, o(v_i, c_j) = α^(c_j) s(v_i, c_j) + (1 − α^(c_j)) f(v_i, c_j), where o(v_i, c_j) indicates the overall activation of concept c_j in v_i (i.e., due to both salience and framing). Two important points must be stressed. First, π_k^(c_j) and α^(c_j) are specific to concepts but identical across subreddits: e.g., if a concept c_j has α^(c_j) = 1, this means that only information from s(v_i, c_j) is used for all subreddits. Second, values of o(v_i, c_j) are comparable across subreddits but not across concepts: since π_k^(c_j) and α^(c_j) differ between concepts, differences in o(v_i, c_j) are not meaningful across concepts (see Section 5 for examples). To get the final concept representation that is passed to subsequent parts of the model, we set ψ_l = o, i.e., each entry in Ψ_l contains the value of o(v_i, c_j) for subreddit v_i and concept c_j.
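The combination step above can be sketched as follows. The softmax over learned logits for the π weights is an assumption on our part (the paper does not specify the parameterization), and all names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def framing(projections, logits):
    """Concept-specific weighted sum over the moral projections:
    f(v_i, c_j) = sum_k pi_k |p_k(v_i, c_j)|, with pi obtained here
    (an assumption) via a softmax over learned logits."""
    pis = softmax(logits)
    return sum(pi * abs(p) for pi, p in zip(pis, projections))

def overall_activation(s, f, alpha):
    """Concept-level gate between salience and framing:
    o(v_i, c_j) = alpha * s(v_i, c_j) + (1 - alpha) * f(v_i, c_j)."""
    return alpha * s + (1 - alpha) * f
```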
Graph neural network. To predict the links in G, we use a graph neural network (Wu et al., 2021), specifically a graph auto-encoder (Kipf and Welling, 2016), which takes as input the matrix Ψ l as well as G's adjacency matrix A.
The encoder consists of a two-layer graph convolutional network (Kipf and Welling, 2017). In each layer, the subreddit representations H^(d) are updated according to the propagation rule H^(d+1) = σ(D̃^{−1/2} Ã D̃^{−1/2} H^(d) W^(d)), where Ã = A + I is G's adjacency matrix with added self-loops, D̃ is the degree matrix of Ã, and W^(d) is the weight matrix of layer d. σ is the activation function, for which we use a rectified linear unit (Nair and Hinton, 2010) after the first layer and a linear activation (no non-linearity) after the second. We set H^(0) = Ψ_l. In our architecture, Z = H^(2) is the output of the encoder. Graph convolutions are mathematically equivalent to Laplacian smoothing (Li et al., 2018), which is an important property for our architecture: if a concept does not occur in a subreddit, the smoothing ensures that the subreddit still receives a high-quality representation by drawing on the neighboring subreddits.
In the decoder, we compute the reconstructed adjacency matrix as Â = σ(ZZ^⊤), where we use the sigmoid for σ. Â is then used to compute a prediction loss, L^(pred).
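The encoder's propagation rule and the inner-product decoder can be sketched in NumPy (a minimal sketch of the standard GCN auto-encoder computation; function names are our own and no training is shown).

```python
import numpy as np

def gcn_layer(A, H, W, activation=lambda x: np.maximum(x, 0)):
    """One graph-convolutional layer:
    H' = sigma(D~^{-1/2} A~ D~^{-1/2} H W), with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return activation(A_hat @ H @ W)

def encode(A, Psi, W0, W1):
    """Two-layer encoder: ReLU after the first layer, linear after the second."""
    H1 = gcn_layer(A, Psi, W0)
    return gcn_layer(A, H1, W1, activation=lambda x: x)

def decode(Z):
    """Decoder: reconstructed adjacency A^ = sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
```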
Structured sparsity. Following the SLAP4SLIP framework, we want to reduce the number of concepts in C. In the described architecture, this amounts to reducing the number of columns in Ψ_l. We achieve this as part of training, using structured sparsity learning, specifically group lasso regularization (Yuan and Lin, 2006), to set entire rows of the weight matrix W^(0) to zero. Writing W^(0) = [w_1^(0); ...; w_{|C|}^(0)] as a series of row vectors, we define the regularization penalty as L^(reg) = Σ_{j=1}^{|C|} ||w_j^(0)||_2. This is a mixed ℓ1/ℓ2 regularization (the ℓ1 norm of the row ℓ2 norms) that leads to sparsity on the level of rows. When all entries in a row w_j^(0) are zero, this has the effect of removing concept c_j from C. We compute the final loss as L = L^(pred) + λ L^(reg), where λ > 0 is a hyperparameter controlling the intensity of the ℓ1/ℓ2 regularization.
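The group-lasso penalty and its standard proximal operator (block soft-thresholding) can be sketched as follows. Note this is the unweighted textbook operator; the paper additionally uses a Newton-Raphson approximation for a weighted variant, which is not reproduced here.

```python
import numpy as np

def group_lasso_penalty(W):
    """L^(reg): the l1 norm of the row l2 norms, sum_j ||w_j||_2."""
    return float(np.linalg.norm(W, axis=1).sum())

def prox_group_lasso(W, step):
    """Proximal operator of step * sum_j ||w_j||_2: shrink every row
    toward zero and set it exactly to zero when its l2 norm falls
    below the step size, which removes the corresponding concept."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - step / np.maximum(norms, 1e-12))
    return scale * W
```

Rows with small norms are zeroed out exactly, which is what removes a concept c_j from C during training.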

Experiments
Setup. For each year, we split E into 60% train, 20% dev, and 20% test edges. We always use the train edges for the adjacency matrix A that is passed to the model, i.e., only the to-be-predicted edges differ between train, dev, and test. For dev and test, we randomly sample non-edges (v_i, v_j) ∉ E as negative examples such that edges and non-edges are balanced in both sets (50% positive, 50% negative). For training, we sample non-edges in every epoch (i.e., the set of sampled non-edges changes in every epoch). At test time, we rank all edges according to their predicted scores. See Appendix A.3 for hyperparameter details.
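The edge split and negative sampling can be sketched as follows (an illustrative implementation of the setup described above; seeds and function names are our own).

```python
import random

def split_edges(edges, seed=0):
    """60/20/20 split of the observed edges into train/dev/test."""
    rng = random.Random(seed)
    e = list(edges)
    rng.shuffle(e)
    n = len(e)
    return e[: int(0.6 * n)], e[int(0.6 * n): int(0.8 * n)], e[int(0.8 * n):]

def sample_non_edges(nodes, edges, k, seed=0):
    """Sample k node pairs that are not edges, as balanced negatives."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    negatives = set()
    while len(negatives) < k:
        u, v = rng.sample(nodes, 2)
        if frozenset((u, v)) not in edge_set:
            negatives.add((min(u, v), max(u, v)))
    return list(negatives)
```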
In this paper, we use sparsity as a hard constraint on the number of concepts with non-zero row weights in W^(0), i.e., we only consider models for which |C*| ≤ θ_{|C|}, where θ_{|C|} is the sparsity threshold. We initially set θ_{|C|} = 150 but later analyze its impact in greater detail.
The model is trained with binary cross-entropy as L^(pred) and Adam (Kingma and Ba, 2015) as the optimizer. Since L^(reg) is non-differentiable, we use proximal gradient descent (Parikh and Boyd, 2013). We approximate the weighted proximal operator of the ℓ1/ℓ2 norm using the Newton-Raphson algorithm (Deleu and Bengio, 2021). We use area under the curve (AUC) for model evaluation. We refer to our model as SF-SGAE (Salience/Framing Sparse Graph Auto-Encoder).
Intrinsic evaluation. We compare SF-SGAE against three ablated models: one that uses only salience, i.e., ψ_l = s (S-SGAE), one that uses only framing, i.e., ψ_l = f (F-SGAE), and one that uses both types of information but replaces the graph convolutions with linear layers (SF-SLAE). Furthermore, we implement a model that is identical to SF-SGAE but does not use sparsity, i.e., |C| is not reduced (SF-GAE).
SF-SGAE clearly outperforms the ablated models, substantially so in some years (Table 2). This shows that jointly modeling salience and framing captures polarization better than modeling only one of the two. Between S-SGAE and F-SGAE, there is no clear winner, although F-SGAE performs slightly better overall. SF-SLAE performs substantially worse than all other models, which indicates that the Laplacian smoothing in the form of graph convolutions is a crucial component of the model. SF-SGAE also outperforms SF-GAE on test, suggesting that C* allows for a more robust generalization than the larger but noisier C.
How does the sparsity threshold θ_{|C|} impact model performance? The answer to this question indicates how many concepts are required to capture the central ideological divides in the data. We vary 0 ≤ θ_{|C|} ≤ 1000 and measure the performance (AUC) of the four sparsifying models on dev (Figure 3). First, we find that for the models using graph convolutions, reducing |C| to approximately 200 concepts does not hurt performance. For the model without graph convolutions, on the other hand, performance starts to drop already at around 400 concepts. This makes intuitive sense: given that the graph convolutions act as a form of smoothing, fewer concepts are needed for a reliable feature vector for each subreddit. Second, the advantage of SF-SGAE lies not only in its higher performance in the sparse regime but also in its ability to reduce |C| much further than any of the other models given a performance threshold. This again demonstrates that a joint model of salience and framing results in richer information, making it possible to reduce the number of concepts more aggressively.

Extrinsic evaluation. The fact that SLAP4SLIP is a minimally supervised framework makes it challenging to evaluate the correctness of our model. While the performance on link prediction indicates how well C* captures the polarized structure of the social network, it is not a direct measure of ideological polarization. There is also no ground-truth dataset against which C* could be compared. We therefore devise an alternative extrinsic evaluation method. Specifically, we use DW-NOMINATE (Poole and Rosenthal, 1985, 1997), a quantitative measure of the ideological polarization of members of the US Congress based on their roll-call voting behavior. Recently, a large dataset of DW-NOMINATE scores has been made publicly available (Lewis et al., 2021).
We first create a dataset with all comments from subreddits dedicated to US state-level politics (e.g., TexasPolitics) in 2018. 7 We discard subreddits with fewer than 250 comments, resulting in a set of 28 subreddits. For each state, we then compute the average DW-NOMINATE score of its representatives in the lower house of the 116th US Congress (elected in November 2018). The average DW-NOMINATE score is a continuous measure of the ideological leaning of a state and ranges between −0.399 for Massachusetts (very liberal) and 0.467 for Idaho (very conservative). Notice that this score reflects the state-level voting shares to a certain extent (since it is averaged over the representatives elected by a state) while at the same time being more fine-grained (since representatives of the same party can differ ideologically). Finally, for each state-level subreddit v_i, we extract s(v_i, c_j) for (i) the d concepts c_j from C* with the highest frequency across all state-level subreddits and (ii) d frequency-matched concepts c_j sampled from C \ C*. 8 We set d = 5. 9 If the concepts from C* are better predictors of the average DW-NOMINATE scores than the concepts from C \ C*, this indicates that the model has learned a correct split into more versus less polarized concepts.

Figure 4: Performance on ideology prediction. The figure shows the distribution of accuracies for 100 models trained with relative frequencies of the concepts from C* versus the concepts from C \ C*. The concepts from C* result in overall much higher accuracies, indicating that they better capture ideological polarization.
To test this empirically, we compute the absolute value of Pearson's r between s(v_i, c_j) and the DW-NOMINATE scores. We find a higher correlation for the concepts from C* (µ = 0.285, σ = 0.062) than for the concepts from C \ C* (µ = 0.126, σ = 0.121), a difference that a two-tailed t-test shows to be significant (p < 0.05). This indicates that the concepts in C* reflect the polarization of US politics better than the concepts in C \ C*.
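The correlation step can be sketched with a plain-Python Pearson's r (a minimal sketch; in practice a statistics package would also supply the significance test).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation, used here to relate a concept's relative
    frequency across state subreddits to the states' average
    DW-NOMINATE scores (sketch)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```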
Furthermore, we test whether it is possible to predict the DW-NOMINATE scores from the relative concept frequencies. Specifically, we binarize the DW-NOMINATE scores by dividing them into the upper and lower half, thus obtaining a balanced dataset of more conservative and more liberal subreddits. We then train ℓ2-regularized logistic regression classifiers using the relative frequencies of the concepts from C* and C \ C* as features. Since the dataset is small, we train 100 models on different random (label-stratified) splits of the subreddits into 50% training and 50% test. The models based on the concepts from C* have substantially higher accuracies (µ = 0.657, σ = 0.122) than the models based on the concepts from C \ C* (µ = 0.491, σ = 0.109), a difference that a two-tailed t-test again shows to be significant (p < 0.01) (Figure 4). We interpret this as further evidence that the concepts in C* (as opposed to the concepts in C \ C*) capture ideological polarization.

8. For C*, we only consider concepts for which α^(c_j) = 1, i.e., the polarization is captured by s(v_i, c_j) alone.
9. Results are robust with respect to the exact selection of d.

Table 3: Example concepts with α^(c_j) values of 1, 0, and in between. For α^(c_j) < 1, we also provide the moral foundation m_k with maximum π_k^(c_j). c/h: care/harm; f/c: fairness/cheating; l/b: loyalty/betrayal; a/s: authority/subversion; s/d: sanctity/degradation. aca stands for Affordable Care Act (also known as Obamacare). julian refers to Julian Assange.
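The binarization and the repeated label-stratified splits can be sketched as follows. The classifier itself is omitted (any ℓ2-regularized logistic regression would slot in); all names and the median-split convention are our own illustrative choices.

```python
import random

def binarize(scores):
    """Split continuous DW-NOMINATE scores at the median into a balanced
    conservative (1) / liberal (0) labeling (sketch)."""
    ranked = sorted(scores)
    median = ranked[len(ranked) // 2]
    return [1 if s >= median else 0 for s in scores]

def stratified_split(labels, frac=0.5, seed=0):
    """Label-stratified 50/50 split of example indices, as used for the
    100 repeated training runs (classifier omitted)."""
    rng = random.Random(seed)
    by_label = {}
    for i, y in enumerate(labels):
        by_label.setdefault(y, []).append(i)
    train, test = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        cut = int(frac * len(idxs))
        train += idxs[:cut]
        test += idxs[cut:]
    return sorted(train), sorted(test)
```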
Qualitative analysis. We analyze which concepts are selected by SF-SGAE (Table 3). Many concepts in C* are names of politicians (e.g., bush, donald) and designations of parties and political orientations (e.g., gop, lefties). Furthermore, C* contains concepts related to contested political issues. While many of these issues (e.g., gay marriage, gun control) have previously been shown to be characterized by polarized online discussions (Lai et al., 2015; Demszky et al., 2019), others (e.g., deregulation, mainstream) have received less attention, highlighting SLAP4SLIP's potential as an exploratory framework.
The design of our model also allows us to analyze in what way the concepts are polarized. To do so, we first examine the weight distribution of α^(c_j) for all c_j ∈ C*. We notice that for the majority of concepts (roughly 80%), α^(c_j) = 1, i.e., the model uses only information about salience. Concepts with α^(c_j) = 1 tend to be of immediate relevance for certain ideologies, leading to higher frequencies in relevant network regions. For example, in communist subreddits, discussion often revolves around fascism as the central opposing ideology, leading to higher frequencies of fascist than in other parts of the network (Figure 1a).
For concepts with α^(c_j) < 1, we can analyze which moral foundation has the largest π_k^(c_j). This moral foundation constitutes the basis for inter-subreddit differences in highlighting certain aspects of the concepts, which can be measured by |p_k(v_i, c_j)|, i.e., the absolute value of the projection of the concept embedding onto the m_k subspace. For example, within the sanctity/degradation subspace (the subspace with maximal π_k^(c_j)), many subreddits frame the concept mainstream in neutral terms. Right-wing subreddits, on the other hand, frame it as something degenerate, particularly in the context of media (Figure 1b, Table 4), reflecting appeals to discredit mainstream media reporting of political news (Lee and Hosam, 2020).

Table 4: Example comments framing the concept mainstream. Neutral framing: "She's good at making progressive ideas sound like reasonable mainstream policies, which is the best of both worlds." Degradation framing (TheNewRight): "I think mainstream media has infected your brain with such rot that it effects your emotions."
To get a more global picture of which moral subspaces are most important for polarized framing, we examine the learned values of π_k^(c_j) (Section 4) for all concepts with α^(c_j) < 1. The three moral foundations that most frequently have the highest π_k^(c_j) value are loyalty/betrayal (30%), sanctity/degradation (27%), and authority/subversion (21%), followed by care/harm (18%) and fairness/cheating (3%). Interestingly, loyalty/betrayal, sanctity/degradation, and authority/subversion are the three moral foundations with the greatest democrat-republican differences (Haidt and Graham, 2007; Graham et al., 2009), indicating that the US two-party system is a central axis for the polarized framing of concepts on Reddit.
Ideological dynamics. The embeddings Z learned by our model are subreddit representations that combine linguistic information with network information. Here, we analyze what types of temporal ideological dynamics are captured by Z.
We map the embeddings Z for all years into a common embedding space using orthogonal Procrustes (Schönemann, 1966; Hamilton et al., 2016) and measure for each subreddit the cosine similarities between its embedding in the first year and its embeddings in all subsequent years. If the resulting time series of cosine similarities is continuously decreasing, this indicates a change in ideology. To detect such shifts automatically, we compute for each subreddit Pearson's r between the time series of years and the time series of cosine similarities. Examining the subreddits with the most extreme negative values of r, we observe that most of them experienced a pronounced shift in their ideological orientation (Figure 5). Specifically, these subreddits move from a relatively moderate to a more extreme position in ideology space, either right-wing (e.g., FreeSpeech, POLITIC) or left-wing (e.g., Sino). This pattern suggests that the subreddits have ideologically radicalized over time (Grover and Mark, 2019; Youngblood, 2020).
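The Procrustes alignment can be sketched in NumPy (a minimal sketch of the standard orthogonal Procrustes solution via SVD; drift detection would then apply the Pearson correlation to the resulting cosine-similarity series).

```python
import numpy as np

def procrustes_align(Z_src, Z_ref):
    """Orthogonal Procrustes: rotate Z_src to best match Z_ref in the
    least-squares sense, as used to map yearly subreddit embeddings
    into a common space (sketch)."""
    U, _, Vt = np.linalg.svd(Z_src.T @ Z_ref)
    return Z_src @ (U @ Vt)

def cosine(a, b):
    """Cosine similarity between two subreddit embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```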

Limitations
The success of our method depends on how accurately polarization is reflected by the network, which means that care must be taken during network selection (explicit networks) and construction (implicit networks). For example, user overlap on Reddit can also be due to conflict between subreddits (Datta et al., 2017; Kumar et al., 2018; Datta and Adar, 2019). While we do not find this to affect our results, it might be a limitation if the degree of homophily in the network is too low. This paper only applies SLAP4SLIP to networks with communities as nodes and edges based on user overlap between the communities. However, the kind of clusteredness our method draws upon has been shown to be a property of various types of social networks, including social networks with individual users as nodes such as Twitter (Conover et al., 2011; Himelboim et al., 2013). We expect SLAP4SLIP to be a suitable framework for finding polarized concepts in these cases, too.

Figure 5: Example subreddits with a pronounced shift in their ideology over time. Orange: Sino, a subreddit originally devoted to geopolitics that moved to a more left-wing position; green and red: FreeSpeech and POLITIC, two originally moderate subreddits that moved to a more right-wing position.

Conclusion
We introduce SLAP4SLIP (Sparse LAnguage Properties for Social LInk Prediction), a novel framework for finding linguistic features maximally informative about the structure of a social network, and show that it can be used to detect polarized concepts. We model polarization along the dimensions of salience and framing. While we only address polarized concepts in this paper, the general nature of the framework makes it possible to apply it in diverse scenarios involving linguistic data attached to social networks (e.g., to find the most pronounced topical differences in citation networks). We see our study as an exciting first step towards bringing together computational social science research on online polarization, NLP work on political language, and graph-based deep learning.

Ethical Considerations
As part of our model, we use contextualized word embeddings to model the polarized framing of concepts. However, contextualized word embeddings are known to be biased (Basta et al., 2019;Zhao et al., 2019;Bender et al., 2021), which bears the risk of impacting our results. We see this as an important research question for future work.
The user base of Reddit has been shown to be disproportionately young and male compared to the general population of the US (Shatz, 2017). We acknowledge that this limits the generalizability of our results, and we try to be particularly careful when drawing conclusions in the paper.

A.2 Moral Subspaces

On the one hand, p_k(v_i, c_j) captures the association of concepts with moral foundations that is due to intrinsic lexical-semantic properties, which can be seen by examining the variation of p_k(v_i, c_j) across different concepts. Computing (1/|V|) Σ_{v_i ∈ V} p_k(v_i, c_j) for all concepts and moral foundations (i.e., the average value of p_k(v_i, c_j) across subreddits), we find that the lexical semantics of concepts with the highest values are directly related to the moral foundations (e.g., patriot and revolution for loyalty/betrayal).
On the other hand, p_k(v_i, c_j) also captures the association of concepts with moral foundations that is due to extrinsic cooccurrence patterns caused by ideological framing, which can be seen by examining the variation of p_k(v_i, c_j) across different contexts and subreddits (i.e., sets of contexts). To check this empirically, we use the 20 highest-ranked words per moral foundation from the Moral Foundations Dictionary (Frimer et al., 2017) and compute for each subreddit v_i, concept c_j, and moral foundation m_k the proportion of occurrences in which at least one m_k word is found in a context window of 10 words around c_j, which is similar to traditional ways of measuring ideological framing (e.g., Fulgoni et al., 2016). We then create for each concept c_j and moral foundation m_k (i) a set T_k(c_j) containing the d subreddits with the largest proportion of moral context words and (ii) a set B_k(c_j) containing the d subreddits with the smallest proportion of moral context words. We set d = 5, but results are robust with respect to the exact selection of d. Comparing the average value of p_k(v_i, c_j) for subreddits in T_k(c_j) and B_k(c_j) across all concepts, we find it to be consistently higher for T_k(c_j) than for B_k(c_j) (Table 6). The fact that this result holds for all years and moral foundations suggests that the extent to which the concepts cooccur with certain moral frames is indeed captured by the projections of contextualized embeddings into the moral subspaces. Crucially, while p_k(v_i, c_j) in principle captures both types of factors, only the extrinsically-driven variation due to ideological framing is expected to be valuable for predicting the social network structure.

Table 7: Dev performance (AUC). SF-SGAE outperforms S-SGAE, F-SGAE, and SF-SLAE. It performs similarly to or better than SF-GAE despite using only a fraction of concepts. Best score per column in gray.
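The context-window measure used in this sanity check can be sketched as follows (a sketch for unigram concepts on pre-tokenized comments; names are our own).

```python
def moral_context_proportion(token_lists, concept, moral_words, window=10):
    """Proportion of the concept's occurrences with at least one moral
    foundation word within a +/- `window`-token context, the traditional
    framing measure described above."""
    moral = set(moral_words)
    hits = total = 0
    for tokens in token_lists:
        for i, tok in enumerate(tokens):
            if tok == concept:
                total += 1
                ctx = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
                hits += any(t in moral for t in ctx)
    return hits / total if total else 0.0
```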
All experiments are performed on a GeForce GTX 1080 Ti GPU (11GB). The total number of trainable parameters is 107,110 for SF-SGAE, SF-SLAE, and SF-GAE, 101,110 for S-SGAE, and 106,110 for F-SGAE.

A.4 Dev Performance

Table 7 provides the dev performance for all models considered in Section 5 of the paper.
A.5 Sparsity Threshold

Figure 6 presents the results of the experiment varying the sparsity threshold described in Section 5 of the paper for all years.