Semantic shift in social networks

Just as the meaning of words is tied to the communities in which they are used, so too is semantic change. But how does lexical semantic change manifest differently across different communities? In this work, we investigate the relationship between community structure and semantic change in 45 communities from the social media website Reddit. We use distributional methods to quantify lexical semantic change and induce a social network on communities, based on interactions between members. We explore the relationship between semantic change and the clustering coefficient of a community’s social network graph, as well as community size and stability. While none of these factors are found to be significant on their own, we report a significant effect of their three-way interaction. We also report on significant word-level effects of frequency and change in frequency, which replicate previous findings.


Introduction
The mechanisms and patterns of semantic change have a long history of study in linguistics (e.g., Paul, 1886; Bloomfield, 1933; Blank, 1999). However, historical accounts of semantic change typically consider meaning at the language level and, as Clark (1996) points out, referring to Lewis's (1969) account of convention, the meaning of a word "does not hold for a word simpliciter, but for a word in a particular community". This gives rise to questions of how semantic change manifests differently in different communities. In this work, we explore the relationship between semantic change and several community characteristics, including social network structure.
Social network analysis has long been a tool of sociolinguists studying variation and change (e.g., Bloomfield, 1933; Milroy and Milroy, 1985; Eckert, 1988), but our work differs somewhat from that tradition in both methodology and focus. Sociolinguists typically work with the social networks of individuals (their ego networks): how many people each speaker is connected to, what kind of relationships they have and, sometimes, how people in their immediate network are connected to each other. The ego network is convenient for sociolinguists using ethnographic methods; it is usually infeasible to recreate the entire social network of a large community (Sharma and Dodsworth, 2020). By studying online communities, we are able to define and compute several community-level structural characteristics including size, stability, and social network clustering (Section 5).
Another way that our work differs from the variationist approach is that we consider change on the level of meaning. With a few exceptions (e.g., Hasan, 2009), sociolinguistic research studies variation in linguistic form (phonology, morphology, and syntax). Indeed, mainstream sociolinguists have expressed skepticism that semantics can be a proper subject of variational analysis at all (Lavandera, 1978; Weiner and Labov, 1983), since the received definition of linguistic variation concerns multiple forms expressing the same content, i.e., different ways of saying the same thing. With semantics at the top of the traditional linguistic hierarchy, there is no higher-order constant to which two meanings can refer. In this work, we instead consider semantic shift, which refers to changes in the meaning of a given lexical form (Newman, 2015).
For more traditional sociolinguistic variables, social indexicality (the association of a variant with social identities and ideology) is the main factor that mediates diffusion (Eckert, 2019). Since semantic variation can itself carry social and ideological meaning (Hasan, 2009), there is good reason to think that it may be sensitive to some of the same aspects of community structure.
The focus on semantic shift is also made possible by computational methodology: we model word meaning with distributional semantics (Section 4), which allows us to quantify short-term lexical semantic shifts at the community level.
In this study, we model the social networks of 45 English-language communities from the social media website Reddit, and use diachronic word vectors to measure semantic change between two time periods one year apart. Then, we use a multi-stage linear mixed effects statistical model to test the effect of various community features on word-level semantic change.

Related work
In this section, we review work that uses computational methods to study linguistic variation and change in social context.
Distributional semantics Distributional methods, which model the meaning of a word with the contexts in which it appears, are a popular way to detect and quantify semantic change. Several recent studies use distributional semantics to examine short-term semantic shift at the community level. Azarbonyad et al. (2017) use diachronic word vectors to study semantic change in political and media discourse, including in UK parliamentary debates, finding that word meaning changes differently depending on the political viewpoint of the speaker. Stewart et al. (2017) use diachronic word vectors to measure semantic change in the VKontakte social network during the Russia-Ukraine crisis and find that changes in word frequency are predictive of semantic shift. Del Tredici et al. (2019) study short-term semantic shift in the /r/LiverpoolFC community on Reddit, empirically validating the diachronic word vector model proposed by Kim et al. (2014) by correlating cosine distance between vectors from two different time periods with semantic change judgments collected from members of the community. In another study, Del Tredici and Fernández (2017) find variation in word meaning across different Reddit communities, including communities organized around the same topic.
Social network analysis In an early example of using social network analysis to study the language of online communities, Paolillo (1999) categorizes the relationships of users of an IRC channel as strong or weak ties, based on interaction frequency. They find that tie strength predicts the use of some online and community-specific forms but not others, and conjecture that this difference is related to the social meaning of those forms. Kooti et al. (2012) examine early Twitter conventions for attributing the source tweet to someone else (i.e., indicating that it is a retweet). They examine social network features, such as the size of a user's ego network, but do not find such features to be very predictive of convention adoption compared to global trends.
Communication games in a laboratory setting have also been used to examine the effect of social network structure on linguistic change. Raviv et al. (2019) quantify the communicative success, systematicity, and stability of languages developed by "communities" of participants, but do not find a significant effect across the three different network structures that were tested. Lev-Ari (2018) finds that individuals with larger real-world ego networks have less malleable semantic representations in the lab, and uses computer simulations to argue that individuals with smaller ego networks therefore play an important role in the community-level propagation of linguistic change.

Data
To investigate semantic change in different communities, we use comments collected from the social media website Reddit. On Reddit, users create posts, which consist of a link, image, or user-generated text, along with a comment section. Comments are threaded: users can comment on the post or reply to another user's comment.
Reddit is divided into forums called subreddits, which are typically organized around a topic of interest. While some forums-especially those organized around relatively niche topics-have a small tightly-knit community of users, others have a much looser community structure, with any given user posting and commenting infrequently.
Our dataset consists of comments from 45 randomly selected subreddits that were active in the years 2015-2017. In addition to the subreddit corpora, we created a generic Reddit corpus, consisting of comments sampled from every subreddit, including communities not in our sample. For both the generic corpus and the community-specific corpora, we constructed separate datasets for 2015 and 2017, leaving a one-year gap between them.
The generic corpus consists of 55M comments for 2015 and 54M for 2017. For each of the selected subreddits, we sampled comments from 2015 and 2017 to construct two datasets of 5.4M tokens each (averaging 158K comments). 4

Semantic change model
In this section, we describe how we quantify semantic change. We adopt a modeling procedure similar to that of Del Tredici et al. (2019), which is adapted from Kim et al. (2014)'s diachronic skip-gram with negative sampling (SGNS) model (Section 4.1). We define naïve cosine change for the community-specific and "generic" lexicons (Section 4.2). In Section 4.3, we use a control procedure adapted from Dubossarsky and Weinshall (2017) to account for noise in the naïve metric.

Diachronic SGNS
The strategy laid out by Kim et al. (2014) is to train a standard skip-gram language model on a corpus from some time period t_0, and then, for each subsequent time period t_{n+1}, to initialize a model with the same architecture with word vectors from time period t_n. 5 In the following, we write w_{c,t} for the word vector from M_{c,t} (the model trained on community c's corpus from time period t) corresponding to word w.

4 See Appendix A and B for details on community selection and data preprocessing.
Code for downloading the data and running the experiments can be found at https://github.com/GU-CLASP/semantic-shift-in-social-networks.
5 It is not clear in the original paper whether the t_{n+1} model is initialized with only the word vectors from the previous time period, or whether internal weights and context vectors are included as well. Most subsequent implementations seem to carry over only the word vector weights, however, which allows for more flexibility with the vocabulary. We follow this approach.
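The carry-over described in the footnote (only word vectors are transferred; everything else is freshly initialized) can be sketched as follows. This is a minimal numpy illustration of the initialization step, not the paper's actual code; the released implementation uses Gensim, and the function and variable names here are our own.

```python
import numpy as np

def init_vectors(prev_vectors, vocab, dim=200, rng=None):
    """Initialize period t_{n+1} word vectors from period t_n.

    Words seen in the previous period inherit a copy of their old
    vector (copied so training the new model doesn't mutate the old
    one); new words get a small random initialization, as in
    standard SGNS.
    """
    rng = rng or np.random.default_rng(0)
    return {
        w: prev_vectors[w].copy() if w in prev_vectors
        else (rng.random(dim) - 0.5) / dim
        for w in vocab
    }

# Toy usage: "new" appears only in the later period.
prev = {"ball": np.ones(200), "goal": np.zeros(200)}
curr = init_vectors(prev, vocab={"ball", "goal", "new"})
```

Because context vectors and internal weights are not carried over, the vocabulary of each period's model can differ freely, which is the flexibility the footnote refers to.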

Naïve cosine change
We define naïve cosine change as the angular distance between corresponding word vectors from the two different time periods. For a community c, naïve cosine change is defined for all words in the vocabulary as follows:

cos_c(w) = (1/π) · arccos( (w_{c,2015} · w_{c,2017}) / (‖w_{c,2015}‖ ‖w_{c,2017}‖) )

Generic naïve cosine change, cos_G, is defined analogously.
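Naïve cosine change can be computed directly from the two vectors. A small sketch, assuming angular distance is normalized by π so that scores fall in [0, 1] (consistent with the a priori range reported in Section 4.3):

```python
import numpy as np

def naive_cosine_change(v_2015, v_2017):
    """Angular distance between a word's vectors from the two periods.

    0 = identical direction, 0.5 = orthogonal, 1 = opposite.
    """
    sim = np.dot(v_2015, v_2017) / (
        np.linalg.norm(v_2015) * np.linalg.norm(v_2017))
    # Clip guards against floating-point values slightly outside [-1, 1].
    return np.arccos(np.clip(sim, -1.0, 1.0)) / np.pi
```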
Generally speaking, naïve cosine change has a strong track record as a semantic change metric, performing well in both human-annotated and synthetic evaluations (Hamilton et al., 2016b;Shoemark et al., 2019;Schlechtweg et al., 2020). Especially relevant to this work, Del Tredici et al. (2019) found cosine change to correlate with aggregated semantic change judgments collected from members of the /r/LiverpoolFC community on Reddit.
Model drift can distort cosine change, although this is mainly a problem with many serially-trained time periods (Shoemark et al., 2019). In a pilot study, we experimented with post-hoc aligned vector spaces and a neighborhood-based change metric (Hamilton et al., 2016a), but found minimal differences from the naïve metric.
A more serious concern for our purposes is the fact that naïve cosine change is inherently biased towards words that appear in more variable contexts. In the following section, we examine this issue more closely and define a rectified change metric that controls for noise. We discuss other limitations of the model in the final discussion section.

Rectified change score
Consider Figure 1 (left). Although naïve cosine change ranges a priori from 0 to 1, very few words score below 0.1. Even some of the most common function words have naïve cosine change above 0.2. Dubossarsky and Weinshall (2017) demonstrate that this bias is due to differences in the variance of different words' context distributions-if a word appears in highly variable contexts, the SGNS model is more likely to pick up on differences between time periods, even if those differences are mere happenstance and not reflective of actual change. This is especially a problem in our case where the amount of data is relatively small.
We adapt the shuffle control condition described by Dubossarsky and Weinshall (2017) to address this problem. For each subreddit, we shuffle the 2015 and 2017 corpora together and split them randomly to create pseudo-diachronic corpora with two "time periods". Then, we train diachronic SGNS models just as before, including initializing the "first" model with word vectors from M_{G,2015}. We do this n = 10 times for each community, giving us, for each sample i and each vocabulary item w, a pseudo-naïve cosine change, cos_{c,i}(w). Since no genuine change can possibly have taken place between the shuffled corpora, cos_{c,i}(w) is a sample from the noise distribution that contributes to w's naïve cosine change, based purely on the noisiness of its context distribution in c.
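The pseudo-diachronic split can be sketched as follows: pool the two corpora and split them back at random, preserving the original corpus sizes. This is a simplified stand-in for the actual preprocessing (which operates on comments before SGNS training); the function name is ours.

```python
import random

def pseudo_diachronic_split(corpus_2015, corpus_2017, seed=0):
    """Shuffle two corpora together and split them back into two
    pseudo "time periods" of the original sizes. Any change measured
    between the halves is pure noise, since both halves are drawn
    from the same pooled distribution."""
    pooled = list(corpus_2015) + list(corpus_2017)
    random.Random(seed).shuffle(pooled)
    n = len(corpus_2015)
    return pooled[:n], pooled[n:]
```

Running this with seeds 0 through 9 yields the n = 10 shuffle samples per community.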
Next, we take the mean, x̄_{c,w}, and sample standard deviation (using Bessel's correction with n − 1 degrees of freedom), s_{c,w}, of the samples, and compute rectified change, which we define as the t-statistic of the genuine naïve cosine change given the estimated noise distribution:

ρ*_c(w) = (cos_c(w) − x̄_{c,w}) / s_{c,w}

The resulting metric, although it is still more variable for less frequent words, is unbiased by the variance of the underlying context distribution (Figure 1, right).
We perform this same procedure with the generic change models (shuffling together the generic 2015 and 2017 corpora) and define generic rectified change, ρ*_G, analogously. Rectified change is a measure of how much higher (or lower) the measured naïve cosine change is than would be expected if the word's underlying context distribution hadn't changed at all. In other words, it quantifies the strength of the evidence that the word has changed. In our setup with 10 samples from the noise distribution, rectified change scores above 4.781 correspond to a 99.95% confidence that the change detected by the diachronic SGNS model was genuine. In addition to these analytical reasons for preferring rectified change, and to previous empirical work on historical change, we note that the highest-scoring words for each community in our data are intuitively more varied and community-specific under rectified change. Naïve cosine change frequently ranks words with some kind of rhetorical or discourse connective function as having changed the most (see Table 1 for examples).

Community features
In this section we characterize the structural features of the online communities in our dataset. Many of the features we define use the notion of active members. For a community c and time period t, the set of active members, U_{c,t}, consists of those members who made at least 10 posts in that period.
Size The size of a community may have an effect on semantic change. In communication game experiments, Raviv et al. (2019) found that larger communities of participants developed linguistic structure faster and more consistently than when they were grouped in smaller communities.
We define community size, S_{2015} = |U_{c,2015}|, as the number of active members in 2015.
Stability Community stability may also have an effect on semantic change. For example, communi- ties with stable membership have a better chance of building up community-specific common ground.
On the other hand, stable communities may experience less change if such change tends to come from new community members, as some studies have suggested (Danescu-Niculescu-Mizil et al., 2013). We define community stability as the Jaccard index between the sets of active members in 2015 and 2017. This metric, ranging from 0 to 1, captures how similar the community membership is between the two time periods.
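Both size and stability come straight from the active-member sets. A minimal sketch, with the 10-post activity threshold from the definition above (the function names are ours):

```python
def active_members(post_counts, min_posts=10):
    """Members with at least `min_posts` posts in the period.
    `post_counts` maps each user to their post count."""
    return {user for user, n in post_counts.items() if n >= min_posts}

def stability(members_2015, members_2017):
    """Jaccard index of the two active-member sets, ranging 0 to 1:
    |intersection| / |union|."""
    union = members_2015 | members_2017
    if not union:
        return 0.0
    return len(members_2015 & members_2017) / len(union)
```

Community size is then simply `len(members_2015)`.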
Mean posts P_{2015} is the average number of posts per active member over the course of 2015.

Social network model
In this section, we define our model of social network structure and a measure of network connectivity, which we consider along with the other community features. First, we give some background and motivation for including this feature. Social network connectivity can have seemingly contradictory influences on linguistic change. Bloomfield (1933) observed that densely connected networks and strong social ties have a conservative influence on an individual's speech.
It is not clear whether this pattern will hold for semantic change since, as discussed by Sharma and Dodsworth (2020), different variables respond differently to different social network structures. We must also consider the evidence that an encounter with a novel or subtly unfamiliar word usage gives a speaker about the community's lexical common ground (Stalnaker, 2002;Clark, 1996). In more densely connected communities, such an exposure is better evidence that other speakers have been exposed to similar uses of the same word, either by the same speaker or, especially in the case of communities on social media, to the very same occurrence. For this reason, it could be that semantic change occurs faster in communities with dense clusters of strong social ties.
Clustering coefficient For each community, we define a graph model of its social network. For a, b ∈ U_{c,t}, let I_t(a, b) be the number of interactions between a and b in that community during time period t. Interactions are considered undirected (regardless of who is replying to whom) and we don't consider self-replies, meaning that I_t(a, b) = I_t(b, a) and I_t(a, a) = 0. The two networks (one per time period) are thus defined:

G_{c,t} = (U_{c,t}, {{a, b} : I_t(a, b) ≥ 1}),  t ∈ {2015, 2017}

Note that we do not consider a top-level comment to be an interaction between the commenter and the creator of the post, for two reasons. First, posts frequently do not contain any text written by the author; they are often just a link or photo. Second, the author of the post is not always the addressee of top-level comments, whereas in replies to comments, the author of the parent comment is always salient (though replies may of course be made with a wider audience in mind).

The clustering coefficient (Watts and Strogatz, 1998) measures the graph's tendency to form dense, interconnected clusters of nodes. For an individual i, the clustering coefficient C_i is defined as the proportion of possible connections that exist between individuals connected to i in G:

C_i = |{{j, k} ⊆ N(i) : {j, k} ∈ G}| / ( |N(i)| (|N(i)| − 1) / 2 )

where N(i) = {j ∈ U | {i, j} ∈ G} is the neighborhood of i. The clustering coefficient for the community as a whole is the mean clustering coefficient of its members:

C = (1 / |U|) Σ_{i ∈ U} C_i

Note that C_i is precisely the measure of ego network density used in many sociolinguistic studies (Milroy, 1987), meaning that we would expect communities with higher clustering coefficients to exhibit less sociolinguistic change. We don't know whether the same effect holds for semantic change.
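The clustering coefficient can be computed without a graph library; a small sketch over an adjacency-set representation (in practice one would typically use networkx's `average_clustering`):

```python
from itertools import combinations

def local_clustering(graph, i):
    """Proportion of connected pairs among i's neighbors (C_i).

    `graph` maps each node to the set of its neighbors. Nodes with
    fewer than two neighbors get C_i = 0 by convention.
    """
    neighbors = graph[i]
    if len(neighbors) < 2:
        return 0.0
    possible = len(neighbors) * (len(neighbors) - 1) / 2
    closed = sum(1 for j, k in combinations(sorted(neighbors), 2)
                 if k in graph[j])
    return closed / possible

def mean_clustering(graph):
    """Community-level clustering coefficient: the mean of C_i
    over all members."""
    return sum(local_clustering(graph, i) for i in graph) / len(graph)
```

For example, in a triangle a-b-c with a pendant node d attached to c, the triangle closes all neighbor pairs of a and b (C = 1), one of c's three neighbor pairs is connected (C = 1/3), and d has a single neighbor (C = 0), so the mean is 7/12.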

Predictive model
We perform an exploratory analysis of the data using multi-stage regressions and model selection by backwards elimination, with semantic change, as measured by ρ*, as the dependent variable. 7 Since we fit the mixed effects model at the word level, in addition to the community-level independent variables described in Section 5, we consider two word-level features as fixed effects. See Table 2 for the full list of fixed effects.
Word frequency Since word frequency is known to interact with semantic change (Hamilton et al., 2016b), we include the frequency of the token in the 2015 community corpus (f_2015) as a feature.

7 The use of stepwise regression has been criticized as a fallacious method for one-shot hypothesis testing, but it is a legitimate way to investigate the explanatory capacity of predictors. See https://dynamicecology.wordpress.com/2013/10/16/in-praise-of-exploratory-statistics/ for a discussion of the issue.
Change in frequency Additionally, we include the change in frequency between 2015 and 2017 (Δf = f_2017 − f_2015) as a feature, since previous work suggests that increases in the frequency of a word often accompany semantic change (Wijaya and Yeniterzi, 2011; Kulkarni et al., 2015; Del Tredici et al., 2019).

Community intercepts In addition to fixed effects, we use community-level random intercepts, under the hypothesis that communities have idiosyncratic, topic-related reasons for differences in semantic change rates, which we do not model.
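The two word-level features can be read off token counts from the two corpora. A minimal sketch; the use of relative frequency here is our assumption (the paper does not specify the unit, and all predictors are scaled and centered before regression anyway):

```python
from collections import Counter

def frequency_features(tokens_2015, tokens_2017, vocab):
    """Per-word 2015 frequency (f_2015) and change in frequency
    (delta_f = f_2017 - f_2015), as relative frequencies."""
    c15, c17 = Counter(tokens_2015), Counter(tokens_2017)
    n15, n17 = len(tokens_2015), len(tokens_2017)
    feats = {}
    for w in vocab:
        f_2015 = c15[w] / n15
        f_2017 = c17[w] / n17
        feats[w] = {"f_2015": f_2015, "delta_f": f_2017 - f_2015}
    return feats
```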

Detecting multicollinearity
Before fitting the full model with interactions, we checked for multicollinearity via linear regressions with the standard lm function in R, as well as the variance inflation factor (VIF) calculation provided by the car package in R. All the predictors were scaled and centered (n = 201,240 word-community combinations). We found that the distribution of ρ* is fat-tailed (it is likely t-distributed). Nevertheless, it is bell-shaped, and the sample is large enough that this should not be a problem. We ran a regression under the hypothesis ρ* ∼ S_2015 + T + C + P_2015 + ρ*_G + f_2015 + Δf (see Table 2) and calculated the VIF on this model. We found that P_2015 had a VIF higher than 2, the cutoff from Zuur et al. (2010). Removing it produced VIFs below the cutoff for the other predictors.

We fit a linear mixed effects model (using the lmer command from the lme4 package in R; Bates et al., 2015) with the remaining predictors in order to take into account the individual semantic change characteristics of community and word. (Model code and output will be placed on the web upon publication.) We performed a regression on the model equation ρ* ∼ (1|community) + S_2015 * T * C + ρ*_G * f_2015 * Δf; that is, we included interactions among the community-level and word-level predictors.
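The VIF check mirrors what car::vif computes in R: each predictor is regressed on the others, and VIF_j = 1 / (1 − R²_j). The paper uses R; the following numpy reimplementation is purely illustrative:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (n x p).

    For each column j, regress it on the remaining columns (with an
    intercept) and report 1 / (1 - R^2_j). Values near 1 indicate no
    collinearity; Zuur et al. (2010) use 2 as a cutoff.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out
```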

Results
For the regression results (Table 3), we do not report statistical significance directly from lmer. Instead, using R's anova function, we performed backwards elimination model selection (by stepwise removal of interactions and factors), and we report statistical significance based on p-values derived from χ² log-likelihood ratio tests between models.
We found that all word-level fixed effects and their three-way interaction were significant at p < 0.05 in terms of a χ² likelihood ratio test. The three-way word-level interaction ρ*_G · f_2015 · Δf had a p-value too small to represent (χ²(4) = 6380.751), relative to a model with all predictors but without the interaction (i.e., with the terms ρ*_G + f_2015 + Δf), along with all the other predictors and interactions. Relative to the model without the three-way word-level interaction, removing each word-level predictor individually yielded p_{ρ*_G} = 7.059 × 10^-81 (χ²(1) = 362.759), p_{f_2015} = 1.605 × 10^-26 (χ²(1) = 113.587), and p_{Δf} too small to measure (χ²(2) = 2070.095).
The three-way interaction for the community-level features was significant at p = 0.014 (χ²(4) = 12.530), but none of the two-way interactions or the individual predictors were significant. We plotted the three-way interaction in Figure 3. Clustering coefficient and size are held at the mean and at plus or minus one standard deviation from the mean. At low levels of clustering, all levels of size show a positive linear relationship between stability and rectified change.
At mean levels of clustering, the lower and mean levels of size retain the positive relationship but flatten out, and the high size level becomes negative. At one standard deviation above the mean for clustering, only the lowest size level remains positively sloped relative to stability. Confidence intervals increase dramatically as clustering increases (as there are fewer examples with higher coefficients). The effect of the random intercept is small (σ² = 0.019, SD = 0.138); this is the extent to which the community causes the intercept of rectified change to vary.

Discussion and conclusions
We conducted an exploratory statistical analysis of the relationship between semantic change and several word- and community-level predictive features. Rectified semantic change, our dependent variable, protects the results from certain systematic biases inherent in the traditional cosine change metric. By looking at online communities, we were able to compute a clustering coefficient on the social network graph of each community, as well as several other community-level structural features.
Community features and semantic change We found all three word-level features to be significant. Together with the intercept, Δf dominates the mixed-effects model, with greater changes in frequency associated with higher semantic change. This is in line with previous findings (Wijaya and Yeniterzi, 2011; Kulkarni et al., 2015; Del Tredici et al., 2019), but our study is the first to demonstrate this effect while controlling for noise effects.
Although the effect is much smaller, there is a negative relationship between semantic change and baseline frequency, f_2015. This agrees with previous results about historical change (Hamilton et al., 2016a; Dubossarsky and Weinshall, 2017), but we note that, while we cannot compare the regression coefficients directly, frequency appears to have a much smaller effect on semantic change in the short-term setting; testing this hypothesis would require further research.
Semantic change in the generic lexicon also predicts community-level change, though it has a smaller effect than Δf. The interaction between Δf and ρ*_G suggests that changes in frequency can predict whether generic-lexicon changes in meaning will be picked up by a particular community.
We found that the three-way interaction between size, stability, and clustering was significant: for communities with low clustering, there is a positive linear relationship between stability and semantic change (regardless of community size). For communities with average or high clustering, however, the positive relationship between stability and change only appears to hold for smaller communities. Note, however, that the confidence intervals increase dramatically as clustering increases, since our sample contained fewer communities with high clustering.
We did not find significant effects for any of the community-level features on their own. It is possible that a larger study with more communities or a more diverse set of communities would reveal some more universal effect, but we cannot draw any conclusions from these results. The fact that the three-way interaction has a significant effect while none of the individual features do on their own demonstrates the complexity of the relationship between structural community characteristics and semantic change.
Assumptions and limitations of the semantic change model In spite of our efforts to control for biases of cosine change, there are still some caveats when interpreting the results.
Like most distributional models of semantics, the diachronic SGNS model associates each word form with a single vector, meaning it is not sensitive to polysemy or homonymy. If a word with multiple senses undergoes changes in the relative frequency with which those senses are used, this would be reflected in the vector representation of the token with which both senses are associated, even if the meaning of neither sense has changed on its own. However, many theories of semantic change emphasize the role of changing sense distributions as a mechanism for lexical semantic change, so this is not necessarily contrary to our aim of quantifying semantic change over the lexicon.
A related weakness of distributional semantics has to do with the distinction between meaning-in-use and lexical meaning. Even if we assume that distributional context is a faithful (if noisy) representation of the situated meaning of a word (cf. Lücking et al., 2019; Bisk et al., 2020; Bender and Koller, 2020), it might not capture the word's full meaning potential (Norén and Linell, 2007): in the extreme, a word may have common-ground semantic content that could be activated, but that happens not to appear in the corpus.
Moreover, changes in the topics discussed by the community may cause changes in the context distribution of words that don't reflect actual change in meaning. Consider the words at the top of the list for /r/toronto (Table 1). It's possible that some of those words appear due to changes in the sociopolitical topics people were discussing on the forum between 2015 and 2017. Similarly, the top word, 2016, presumably still refers to the same year, though the year itself went from being in the future to being in the past. Whether or not such a change counts as a change in meaning is naturally beyond the scope of this paper.
Future work This work offers some insight into how semantic change and community structure interact, but there are still many open questions, including how these results generalize to communities in different communicative settings and over different time frames. Future work should take a closer look at the kinds of change (e.g., Blank, 1999) taking place. For example, are the meanings of words broadening or narrowing? How are existing community-level communicative resources used to create new word uses? Given that we can identify statistically significant changes in meaning over a relatively short period of time, it would also be interesting to investigate the circumstances of individual changes. For example, do community members with more central social network position tend to innovate more? How are early innovative uses received by the community? Is there a correlation between semantic change in a given time period and the frequency of explicit word meaning negotiation (Myrendal, 2019) in the same period?

A Subreddit selection
We randomly selected 50 subreddits from the set of all forums with at least 15,000 comments per month for each of the 36 months in the 2015-2017 period. We excluded five from further analysis: two that were primarily non-English, two with particularly short average comment lengths, and one where our procedure for identifying template-generated posts failed (see Appendix B for details).

B Data preprocessing
Below we describe the preprocessing procedure we used to prepare training data for our diachronic SGNS models.
Duplicate comments Before any text normalization steps (described below), we sought to remove duplicate template-generated posts by bots and moderating tools. Since this automated content frequently appears in only one of the two time periods, it can have an outsized effect on the cosine change score of words included in the template.
We identified these posts by comparing the tails (everything after the first 50 characters) of any two posts more than 50 characters in length. Posts marked as duplicates under this criterion were discarded (keeping one such post in each category). This preserves "natural" human-written duplicates, which tend to be short, while catching most template-generated content, where form-filled deviations tend to be relegated to the beginning of the post. Unfortunately, this criterion missed posts by a bot in the /r/jailbreak subreddit, resulting in rectified semantic change score outliers for certain words in the bot's template. As a result, we excluded this community from analysis in the mixed-effects model.
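The duplicate filter can be sketched as follows. This is a simplified stand-in for the actual preprocessing script: it keys posts on the tail after the first 50 characters and keeps only the first post per tail.

```python
def deduplicate(posts):
    """Remove template-generated near-duplicates: posts longer than
    50 characters that share everything after their first 50
    characters. The first post with each tail is kept; shorter posts
    (where human-written duplicates are common) are left untouched."""
    seen_tails = set()
    kept = []
    for post in posts:
        if len(post) > 50:
            tail = post[50:]
            if tail in seen_tails:
                continue
            seen_tails.add(tail)
        kept.append(post)
    return kept
```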

Normalization and tokenization
The text of comments was normalized as follows. We removed markdown formatting, extracting only rendered text, and excluded the content of block quotes, code blocks, and tables. We tokenized comments using the SpaCy tokenizer with the default English model (version 2.2.3). We lower-cased all tokens and removed whitespace, including linebreaks. Additionally, we removed tokens containing certain characters present in the 2015 data but absent in 2017, apparently due to text encoding changes made by Reddit. The removed characters were mostly emojis and certain Hangul graphemes, and none were particularly common in our data (see [link] for a list of excluded characters).
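A simplified sketch of this normalization pipeline follows. The real pipeline renders markdown and uses the SpaCy tokenizer; here we approximate both with regular expressions and whitespace splitting, so treat this as illustrative only.

```python
import re

def normalize(comment):
    """Roughly normalize a Reddit comment: drop fenced code blocks
    and block-quote lines, strip basic markdown emphasis markers,
    lower-case, and whitespace-tokenize (a crude stand-in for the
    SpaCy tokenizer used in the actual pipeline)."""
    # Remove fenced code blocks (``` ... ```).
    comment = re.sub(r"```.*?```", " ", comment, flags=re.DOTALL)
    # Remove block-quote lines (lines starting with '>').
    lines = [l for l in comment.splitlines()
             if not l.lstrip().startswith(">")]
    text = " ".join(lines)
    # Strip basic markdown emphasis markers (*, _, **, etc.).
    text = re.sub(r"[*_]{1,3}", " ", text)
    return text.lower().split()
```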

C Vocabulary and SGNS training procedure
For each community c we maintain a separate vocabulary, V_c. Words with at least 50 occurrences in both the 2015 and 2017 time periods are included in the vocabulary. Likewise, the generic Reddit models have vocabulary V_G, which includes words with at least 500 occurrences in both time periods. All models were trained with the Gensim (v. 3.8.1) SGNS implementation, with 200-dimensional vectors, for 50 epochs (for both the generic and community-specific models). For all other hyperparameters, we maintain the defaults (a length-5 context window, 5 negative samples per word, an initial learning rate of 0.025, a subsampling threshold of 1 × 10^-5, and a negative sampling distribution exponent of 0.75).
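Vocabulary construction is a simple intersection of frequency-thresholded counts; a minimal sketch:

```python
from collections import Counter

def build_vocab(tokens_2015, tokens_2017, min_count=50):
    """Words occurring at least `min_count` times in BOTH time
    periods (50 for community vocabularies, 500 for the generic
    Reddit vocabulary)."""
    c15, c17 = Counter(tokens_2015), Counter(tokens_2017)
    return {w for w, n in c15.items()
            if n >= min_count and c17[w] >= min_count}
```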
For M_{c,2015}, we randomly initialize vectors for words in V_c \ V_G. Words in V_G \ V_c have no vector representation in M_{c,2015} or M_{c,2017}.