A surprisal–duration trade-off across and within the world’s languages

While there exist scores of natural languages, each with its unique features and idiosyncrasies, they all share a unifying theme: enabling human communication. We may thus reasonably predict that human cognition shapes how these languages evolve and are used. Assuming that the capacity to process information is roughly constant across human populations, we expect a surprisal–duration trade-off to arise both across and within languages. We analyse this trade-off using a corpus of 600 languages and, after controlling for several potential confounds, we find strong supporting evidence in both settings. Specifically, we find that, on average, phones are produced faster in languages where they are less surprising, and vice versa. Further, we confirm that more surprising phones are longer, on average, in 319 languages out of the 600. We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.


Introduction
During the course of human evolution, countless languages have arisen, each with unique features. Despite their stark differences, however, it is plausible that shared attributes of human cognition have placed constraints on how each language is implemented. These constraints, in turn, may lead to compensations and trade-offs in the world's languages. For instance, if we assume a channel capacity (Shannon, 1948) in humans' ability to process language (as posited by Frank and Jaeger, 2008), we may make predictions about these trade-offs. Additionally, if we assume this capacity to be uniform across human populations, these trade-offs should extend cross-linguistically.
Within languages, there is a direct connection between this channel capacity assumption and the uniform information density hypothesis (UID; Fenk and Fenk, 1980; Aylett and Turk, 2004; Levy and Jaeger, 2007), which predicts that speakers smooth the information rate in a linguistic signal so as to keep it roughly constant; by smoothing their information rate, natural languages can stay close to a (hypothetical) channel capacity. Across languages, a unified channel capacity allows us to derive a specific instantiation of the compensation hypothesis (Hockett, 1958), with information density (measured in, e.g., bits per phone) being compensated by utterance speed (in, e.g., milliseconds per phone). We may thus predict a trade-off between surprisal and duration both within and across the world's languages.
This trade-off has been studied amply within high-resource languages (Genzel and Charniak, 2002; Bell et al., 2003; Mahowald et al., 2018, inter alia). Cross-linguistically, however, it has received comparatively little attention, with a few notable exceptions such as Pellegrino et al. (2011) and Coupé et al. (2019). Several factors have inhibited cross-linguistic studies of this kind; arguably the most prominent is the sheer lack of data necessary to investigate the phenomenon. While massively cross-linguistic data abounds in the form of wordlists (Wichmann et al., 2020; Dellert et al., 2020), surprisal is a context-dependent measure and, therefore, isolated word types are not enough for this analysis. Further, as we have a specific hypothesis for why this trade-off should arise (humans' information processing capacity), we are not interested in simply finding any correlation between surprisal and duration. Several confounds could drive such a correlation, but most of these are either trivially true or uninteresting from our perspective. A thorough analysis of this trade-off therefore needs to control for these potential confounds.
In this work, we investigate the surprisal-duration trade-off by analysing a massively multilingual dataset of more than 600 languages (Salesky et al., 2020). We present an experimental framework, controlling for several possible confounds, and evaluate the surprisal-duration trade-off at the phone level. We find evidence of a trade-off across languages: languages with more surprising phones compensate by making utterances longer. We also confirm mono-lingual trade-offs in 319 out of 600 languages; within these languages, more surprising phones are pronounced with a significantly longer duration. This is the most representative evidence for the uniform information density hypothesis to date. Moreover, we find no evidence of a single language where the opposite effect is in operation (i.e. where more informative phones are shorter). Given these collective results, we conclude there is strong evidence for a surprisal-duration trade-off both across and within the world's languages.

Surprisal and Duration
Cross-linguistic comparisons of information rate go back at least 50 years. In a study comparing phonemes per second, Osser and Peng (1964) found no statistical difference between the speech rates of English and Japanese native speakers. In a similar study, den Os (1985, 1988) compared Dutch and Italian and found no difference in terms of syllables per second, although Italian was found to be somewhat slower in phones per second. Such cross-linguistic comparisons, however, are not straightforward, since speech rate can vary widely within a single language, depending on sentence length (Fonagy and Magdics, 1960) and type of speech (e.g. storytelling vs interview; Kowal et al., 1983). In a meta-analysis of these studies, Roach (1998) concludes that carefully assembled speech databases would be necessary to answer this question. In this line, Pellegrino et al. (2011) analysed the speech rate of 8 languages using a semantically controlled corpus, finding strong evidence of non-uniform speech rates across these languages.
This result is not surprising, however, given that natural languages vary widely in their phonology, morphology, and syntax. Despite these differences, researchers have hypothesised that there exist compensatory relationships between the complexities of these components (Hockett, 1958; Martinet, 1955). For instance, a larger phonemic diversity could be compensated by shorter words (Moran and Blasi, 2014), or a larger number of irregular inflected forms could lead to less complex morphological paradigms (Cotterell et al., 2019). Such compensation can thus be seen as a type of balance, where languages trade reliability against effort in communication (Zipf, 1949; Martinet, 1962). One natural avenue for creating this balance would be a language's information rate: if this were kept roughly constant, the needs of both speakers (who prefer shorter utterances) and listeners (who value easier comprehension) could be accommodated. Speech rate would then be compensated by information density, resulting in a form of surprisal-duration trade-off. Indeed, Pellegrino et al. (2011) and Coupé et al. (2019) present initial evidence of this trade-off across languages.
Analogously, the UID hypothesis posits that, within a language, users balance the amount of information per linguistic unit with the duration of its utterance. This hypothesis has been used to explain a range of experimental data in psycholinguistics, including syntactic reduction (Levy and Jaeger, 2007) and contractions, such as are vs 're (Frank and Jaeger, 2008). While this theory is somewhat under-specified with respect to its causal mechanisms, as we argue in Meister et al. (2021), one of its typical interpretations is that users maximise a communicative channel's capacity (Frank and Jaeger, 2008; Piantadosi et al., 2011). If we assume this channel's capacity to be constant across languages, we may derive a cross-linguistic version of UID. Such a hypothesis would predict, for instance, that speakers of languages with less informative phones will produce them faster. Under this specific interpretation, our study can be seen as evidence of UID as a cross-linguistic phenomenon.

Measuring Surprisal
To formalise our approach, we first present a standard measure of information content: surprisal. In the context of natural language, surprisal (Hale, 2001) measures the Shannon information content a linguistic unit conveys in context, i.e. its negative log-probability:

surprisal(s_t) = − log p(s_t | s_<t)    (1)

In this equation, S is a sentence-level random variable, taking instances s ∈ 𝒮*, where t indexes a position in the sentence. Accordingly, we define 𝒮 as the set of phones in a given phonetic alphabet, and we use s_<t to denote the context in which phone s_t appears.
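As a concrete illustration of eq. (1), the snippet below computes per-phone surprisals from a conditional distribution. The `model` lookup table is a hypothetical stand-in for any estimate of p(s_t | s_<t); it is not part of our pipeline.

```python
import math

def surprisal(prob):
    """Surprisal in bits of an outcome with probability `prob`."""
    return -math.log2(prob)

def phone_surprisals(sentence, model):
    """Surprisal of each phone given its left context, where `model` maps
    a context tuple to a distribution over next phones (an illustrative
    stand-in for an estimated p(s_t | s_<t))."""
    return [surprisal(model[tuple(sentence[:t])][s_t])
            for t, s_t in enumerate(sentence)]
```

For instance, a phone with conditional probability 0.25 carries 2 bits of information, twice as much as one with probability 0.5.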
Unfortunately, this surprisal is not readily available, since we would need access to the true distribution p(s t | s <t ) to compute it. We will use an approximation p θ (s t | s <t ) instead, i.e. a phone-level model with estimated parameters θ.
While much of the original psycholinguistic work on surprisal estimated p_θ using n-gram models (Levy and Jaeger, 2007; Coupé et al., 2019, inter alia), recent work has shown that a language model's psychometric predictive power correlates directly with its quality, measured by its cross-entropy on held-out data (Goodkind and Bicknell, 2018; Wilcox et al., 2020). We thus make use of LSTMs in this work, since they have been shown to outperform n-grams on phone-level language modelling tasks. We first encode each phone s_t into a high-dimensional lookup embedding e_t ∈ R^{d_1}, where d_1 is the embedding size. We then process these embeddings using an LSTM (Hochreiter and Schmidhuber, 1997), which outputs contextualised hidden state vectors:

h_t = LSTM(e_t, h_{t−1})    (2)

where the initial hidden state h_0 is the zero vector and the initial phone s_0 is a start-of-sentence symbol. The hidden states are then linearly transformed and projected onto Δ^{|𝒮|+1}, the probability simplex, via a softmax to compute the desired distribution:

p_θ(· | s_<t) = softmax(W h_t + b)    (3)

where W ∈ R^{(|𝒮|+1) × d_2} and b ∈ R^{|𝒮|+1} are learnable parameters. We optimise the parameters by minimising our model's cross-entropy on a training set, which corresponds to minimising the following objective:

L(θ) = − (1/N) Σ_{n=1}^{N} Σ_{t} log p_θ(s_t^{(n)} | s_<t^{(n)})    (4)

where we assume {s^{(n)}}_{n=1}^{N} are sampled from the true distribution p(·).
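The architecture above can be sketched in PyTorch. This is a minimal sketch under the paper's stated hyper-parameters: the class name and the shape conventions are ours, the output dimension includes the end-of-sentence symbol, and training code is omitted.

```python
import torch
import torch.nn as nn

class PhoneLSTM(nn.Module):
    """Phone-level LSTM language model (sketch of the model in §3)."""
    def __init__(self, n_phones, d_emb=64, d_hid=128, n_layers=2, dropout=0.5):
        super().__init__()
        # +1 accounts for the end-of-sentence symbol in the output simplex
        self.embedding = nn.Embedding(n_phones + 1, d_emb)
        self.lstm = nn.LSTM(d_emb, d_hid, num_layers=n_layers,
                            dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_hid, n_phones + 1)  # W h_t + b

    def forward(self, phone_ids):
        # phone_ids: (batch, seq_len) integer phone indices
        e = self.embedding(phone_ids)   # lookup embeddings e_t
        h, _ = self.lstm(e)             # contextualised hidden states h_t
        return self.out(h)              # logits; softmax gives p_theta(. | s_<t)
```

A softmax over the final dimension of the returned logits yields the distribution in eq. (3), from which surprisal is read off as the negative log-probability of the observed phone.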
To avoid overfitting to this training set, we estimate the cross-entropy on a validation set {ŝ^{(m)}}_{m=1}^{M}, stopping training once this validation cross-entropy stops decreasing. Note that minimising the cross-entropy is equivalent to minimising the Kullback-Leibler divergence between the true distribution and our model. Further, if we have access to a large number of samples M, we assume this cross-entropy estimate gives us a tight approximation to the cross-entropy between p_θ and the true distribution. Thus, the lower this cross-entropy, the closer we may assume our model is to the true p(·), and the better we should expect our surprisal estimates to be.
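The early-stopping criterion just described can be sketched as a small predicate; the function name and the list-based interface are ours, not part of the original implementation.

```python
def should_stop(val_xents, patience=5):
    """Early-stopping rule: stop once the validation cross-entropy has
    shown no improvement for `patience` consecutive evaluations."""
    if len(val_xents) <= patience:
        return False
    best_before = min(val_xents[:-patience])
    # no value in the last `patience` evaluations beat the earlier best
    return min(val_xents[-patience:]) >= best_before
```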
Hyper-parameter choices. We implement our phone-level LSTM language models with two hidden layers, an embedding size of 64, and a hidden size of 128. We further use a dropout rate of 0.5 and a batch size of 64. We train our models using AdamW (Loshchilov and Hutter, 2019) with its default hyper-parameters in PyTorch (Paszke et al., 2019). We evaluate our models on a validation set every 100 batches, stopping training when we see no improvement for five consecutive evaluations. We split each language's data (described in §4) into train-dev-test sets using an 80-10-10 split, with sentences as our delimiters; we thus never separate phone data points from the same sentence. We use the first two splits to train and validate our models, while the test set is held out and used throughout our analysis.
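The sentence-level 80-10-10 split can be sketched as follows; the function name and fixed seed are illustrative, but the key property matches the text: sentences are the unit of splitting, so phones from one sentence never straddle two splits.

```python
import random

def split_sentences(sentences, seed=0):
    """80-10-10 train/dev/test split with sentences as delimiters."""
    sents = list(sentences)
    random.Random(seed).shuffle(sents)
    n = len(sents)
    n_train, n_dev = int(0.8 * n), int(0.1 * n)
    return (sents[:n_train],
            sents[n_train:n_train + n_dev],
            sents[n_train + n_dev:])
```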

Data
We use the VoxClamantis dataset for our analysis (Salesky et al., 2020). This dataset is derived from spoken readings of the Bible[4] and spans more than 600 languages from 70 language families, as shown in Fig. 2.[5] It offers us a semantically controlled setting for our experiments, as it is composed of translations of a single text.
This dataset contains automatically generated phone alignments and derived phonetic measures for all its languages (both phone durations and vowels' first and second formant frequencies). On average, there are approximately 9,000 utterances (or 20 hours of speech) per language, making it the largest dataset of its kind. Phone labels were generated using grapheme-to-phoneme (G2P) tools and time-aligned using either multilingual acoustic models (Wiesner et al., 2019; Povey et al., 2011) or language-specific acoustic models (Black, 2019; Anumanchipalli et al., 2011). VoxClamantis offers its phonetic measurements under three G2P models, which trade off language coverage and quality. We focus on two:[6]

• Epitran (Mortensen et al., 2018). This is a collection of high-quality G2P models based on language-specific rules. Phonetic measurements produced with Epitran are available for a collection of 39 doculects[7] from 29 languages (as defined by ISO codes) in 8 language families.
• Unitran (Qian et al., 2010). This is a naïve, deterministic G2P model, but its derived measurements are available for all languages in VoxClamantis. While Unitran is particularly error-prone for languages with opaque orthographies (Salesky et al., 2020), we filter out the languages with lower-quality alignments (as detailed below). Before filtering, this dataset comprises 690 doculects from 635 languages in 70 language families.
In order to study the trade-off hypothesis, we require two measurements: phone durations and phone-level surprisals. As mentioned above, phone durations are readily available in VoxClamantis. Phone-level surprisals, on the other hand, are not, so we employ phone-level language models to estimate them (as detailed in §3). Given both these values, we can perform our cross-linguistic analysis. First, though, we describe some data quality checks.

[4] These texts were crawled from bible.is and utterance-aligned by Black (2019) for the CMU Wilderness dataset.
[5] A list of all languages can be found in App. C.
[6] We set Wikipron (Lee et al., 2020) alignments aside because we could not obtain word position information for them.
[7] The term doculect refers to a dialect as recorded in a specific document, in this case a Bible reading.
Filtering Unitran. The phone and utterance alignments in VoxClamantis were automatically generated and may be noisy due to both of these processes. The labels from the Unitran G2P also contain inherent noise due to its deterministic nature. Accordingly, we filter the data using the mean Mel-Cepstral Distortion (MCD) as an implicit quality measure for the alignments. MCD evaluates the distance between some reference speech and speech synthesised using the alignments (Kubichek, 1993). We use the utterance-level MCD scores from the CMU Wilderness dataset (Black, 2019), removing all utterances with an MCD score higher than 7. This leaves us with 647 doculects from 600 languages in 69 language families.
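The filtering step amounts to a simple threshold on the utterance-level scores. In this sketch, each utterance record is assumed to carry its MCD score under an `mcd` key; the field name is illustrative, not the dataset's actual schema.

```python
def filter_by_mcd(utterances, threshold=7.0):
    """Keep only utterances whose mean Mel-Cepstral Distortion is at
    most `threshold` (the paper uses 7), dropping noisy alignments."""
    return [u for u in utterances if u["mcd"] <= threshold]
```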

Design Choices
There are several critical design choices that must be made when performing a cross-linguistic analysis of this nature. While some may at first seem inconsequential, they can have a large impact on downstream results. Specifically, we hypothesise a surprisal-duration trade-off caused by humans' capacity to process information, which should be roughly constant across populations. We must thus control for other potential sources of this trade-off, which we deem uninteresting in this work.
Phone-level Analysis. While there are good reasons for performing this analysis at the syllable or word level, we believe phones are advantageous for our study. Greenberg (1999), for instance, shows syllables are less prone than phones to be completely deleted in casual speech; syllables would thus allow more robust estimates of speech duration. Nonetheless, languages that allow more complex (and longer) syllabic structures will naturally have more valid syllables. A larger number of syllables, in turn, makes each syllable less predictable on average. Therefore, more complex syllables will be both longer and less predictable. Studying syllables can thus lead to trivial trade-offs which mainly reflect the methodology employed. A similar argument can be made against word-level analyses. Performing this type of analysis at the phone level should alleviate this effect, making it the more appropriate choice.
Articulatory Costs. Whereas the range of effort used to produce individual phones may be smaller than at other levels of the linguistic hierarchy, there is still considerable variation in the cost associated with each phone's articulation. For instance, Zipf (1935) argued that a phone's articulatory effort was related to its frequency. If this is indeed the case, a direct analysis of surprisal-duration pairs that does not control for articulatory effort could also lead to a trivial trade-off: long and effortful phones will be less frequent and thus likely less predictable, having higher surprisals. To account for each phone's articulatory cost, we use mixed-effects models in our analysis, and include phone identity as a random intercept effect.
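As a rough intuition for what the phone-identity random intercept does, consider subtracting each phone's mean duration before looking at the surprisal-duration relation. This is only a simplification of ours: actual random intercepts are shrunken versions of these per-phone means, fitted jointly with the other effects.

```python
from collections import defaultdict

def center_by_phone(durations, phones):
    """Remove each phone's baseline duration by subtracting its mean
    (an unshrunken stand-in for a phone random intercept)."""
    sums = defaultdict(lambda: [0.0, 0])
    for d, p in zip(durations, phones):
        sums[p][0] += d
        sums[p][1] += 1
    means = {p: s / n for p, (s, n) in sums.items()}
    return [d - means[p] for d, p in zip(durations, phones)]
```

Any remaining association between the centred durations and surprisal can no longer be explained by some phones simply being intrinsically long and rare.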

Word-initial and Word-final Lengthening. There is ample evidence showing that, across languages, word-initial and word-final segments are lengthened during production (Fougeron and Keating, 1997; White et al., 2020). Furthermore, word-initial positions carry more information than word-final ones, a fact well studied in both psycholinguistics and information theory. From a psycholinguistic perspective, word-initial segments appear more important for word recognition (Bagley, 1900; Bruner and O'Dowd, 1958; Fay and Cutler, 1977; Nooteboom, 1981). Under an information-theoretic analysis, earlier segments in a word have been observed to be more surprising than later ones (van Son and Pols, 2003; King and Wedel, 2020; Pimentel et al., 2021). Word-initial segments are thus both lengthened and more surprising, potentially for unrelated reasons. An analysis which does not control for word position is thus doomed to find trivial correlations. To account for word-initial and word-final lengthening, we include three word-position fixed effects (initial, middle, or final) in our mixed-effects models.
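The three-way word-position labels can be derived deterministically from word boundaries; in this sketch of ours, a single-phone word is labelled initial, an edge-case convention we choose for illustration.

```python
def word_positions(phones_per_word):
    """Label each phone in each word as initial, middle, or final
    (the three word-position fixed effects used in the models)."""
    labels = []
    for word in phones_per_word:
        for i, _ in enumerate(word):
            if i == 0:
                labels.append("initial")   # single-phone words land here
            elif i == len(word) - 1:
                labels.append("final")
            else:
                labels.append("middle")
    return labels
```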
Sentential Context. The amount of context that a model conditions on when estimating probabilities will undoubtedly have an impact on a study of this nature. For example, a model that cannot look back beyond the current word, such as the one employed by Coupé et al. (2019), can by definition only condition on the previous phones in the same word. Arguably, a cognitively motivated surprisal-duration trade-off should estimate surprisal using a phone's entire sentential context and not only the prior context inside a specific word. In this work, we make use of LSTMs (as described in §3), which can model long context dependencies (Khandelwal et al., 2018).

Generalised Mixed-Effects Models
Throughout our experiments, we use mixed-effects models; we provide a brief introduction here (see Wood (2017) for a longer exposition). Classical linear regression models can be written as:

y_i = φ^⊤ x_i + ε_i    (5)

where y_i is the target variable, x_i ∈ R^d is the model's input, and φ ∈ R^d is a learned weight vector. Further, the error (or unexplained variance) term ε_i is assumed to be normally distributed and independent and identically distributed (i.i.d.) across data instances. Such an i.i.d. assumption, however, may not hold. In our analysis, for instance, multiple phones come from each of the analysed languages; we thus expect phones from the same language to share dependencies in how their ε_i are sampled. Mixed-effects models allow us to model such dependencies through the use of random effects. Formally, for an instance x_i from a specific language ℓ_i, we model:

y_i = φ^⊤ x_i + ω_{ℓ_i} + ε_i,    ω_{ℓ_i} ∼ N(0, σ²_ω)    (6)

where ω_{ℓ_i} is a random effect and φ is now termed a fixed effect. Here, ω_{ℓ_i} is an intercept term which is assumed to be shared across all instances of language ℓ_i, and σ²_ω is learned directly from the data. Similarly, we can add random slope effects:

y_i = (φ + β_{ℓ_i})^⊤ x_i + ω_{ℓ_i} + ε_i,    β_{ℓ_i} ∼ N(0, Σ_β)    (7)

where each β_{ℓ_i} ∈ R^d is a language-specific random slope and Σ_β is a (learned) covariance matrix. Furthermore, our assumption that error terms are normally distributed may not hold in this setting. Phone durations, for instance, cannot be negative and are positively skewed, making a log-linear model more appropriate:

log(y_i) = (φ + β_{ℓ_i})^⊤ x_i + ω_{ℓ_i} + ε_i    (8)

where β_{ℓ_i}, ω_{ℓ_i}, and ε_i are still distributed as in eq. (7). This is similar to modelling the original error terms as coming from a log-normal distribution. We note, though, that under this model our effects become multiplicative (as opposed to additive): an increase of δ units on the right-hand side multiplies y_i by e^δ. We use lme4's (Bates et al., 2015) notation to represent these models. Under this notation, parentheses represent random effects and parameters are left out. We thus re-write eq. (8) as:

log(y) = 1 + x + (1 + x | language)    (9)

Experiments and Results
In this section, we first analyse the surprisal-duration trade-off in individual languages.[10] We then perform an analysis with our full data, studying the trade-off both within and across languages with a single model. Finally, in our last experiment we average phone information per language to analyse a purely cross-linguistic trade-off.

Individual Language Analyses
We first analyse languages individually, verifying whether more surprising phones have, on average, a longer duration. With this in mind, we estimate a generalised mixed-effects model for each language. We control for each phone's articulatory costs by adding phone identity as a random effect. Additionally, we include fixed effects to control for word position, adding separate intercepts for word-initial and word-final positions. Finally, we consider a fixed effect relating surprisal and word position; at word-initial positions, for instance, the connection between surprisal and duration could potentially be stronger or weaker.[11] This leaves us with the following relationship:

log(duration) = 1 + surprisal + position + surprisal · position + (1 | phone)    (10)

In this parametrisation, a trade-off between surprisal and duration will emerge as a positive and significant surprisal slope. Analogously, an inverse trade-off would emerge as a negative and significant slope, since we use two-tailed statistical tests.[12] Out of the 39 doculects in Epitran, 30 present statistically significant positive slopes (23/29 languages and 8/8 families, meaning at least one language showed a significant effect per family). On Unitran (which, we recall, is a noisier dataset), 326/647 doculects present significantly positive slopes (319/600 languages and 53/69 families). Additionally, we find no language in either dataset with a significantly negative slope: we either find evidence for the trade-off or no association whatsoever.

[10] Our code is available at https://github.com/rycolab/surprisal-duration-tradeoff.
The trade-off strength, as measured by the surprisal-duration slopes, can be seen in Fig. 1 (on the first page) and Fig. 3. As noted above, by predicting a linear change on the logarithmic scale, our effects become multiplicative instead of additive. The average multiplicative slope across all analysed languages in both datasets is roughly 1.02, meaning that each added bit of information multiplies duration by 1.02. We believe this serves as strong support for our hypothesis of a trade-off within languages. Moreover, to the best of our knowledge, this is the most representative study of the UID hypothesis to date, as measured by the number and typological diversity of the analysed languages.

[11] We analyse the impact of both these effects, phone identity and word position, in App. A.
[12] Statistical significance was assessed under a confidence level of α < 0.01, and we used Benjamini and Hochberg (1995) corrections for multiple tests whenever necessary.
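The multiplicative reading of these slopes can be illustrated with a quick simulation on synthetic data. This is a sketch only: we simulate a 1.02-per-bit effect and recover it with plain least squares on the log scale, whereas the actual models additionally include the random effects described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: each extra bit of surprisal multiplies duration by 1.02,
# matching the effect size reported in this section (illustration only).
surprisal = rng.uniform(0.0, 10.0, size=5000)
log_duration = (np.log(60.0)                         # ~60 ms baseline
                + surprisal * np.log(1.02)           # multiplicative slope
                + rng.normal(0.0, 0.05, size=5000))  # log-normal noise

# A plain least-squares fit on the log scale recovers the effect.
slope, intercept = np.polyfit(surprisal, log_duration, 1)
multiplicative_slope = float(np.exp(slope))  # close to 1.02
```

Exponentiating the fitted log-scale slope converts it back into a per-bit multiplicative factor on duration.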

Aggregated Cross-linguistic Analysis
Following the previous study, we now run a cross-linguistic analysis by aggregating all the languages within a single model. We add the same controls as before, but further nest the phone random effects per language (meaning we create one random effect per phone-language pair). We also include random language-specific intercepts and slopes. Formally:

log(duration) = 1 + surprisal + position + surprisal · position + (1 | language:phone) + (1 + surprisal | language)    (11)

After estimating this generalised mixed-effects model, we find statistically significant cross-linguistic trade-off effects in both datasets. The multiplicative slope is roughly 1.02 in both datasets, again meaning each extra bit of information multiplies the duration by this value (φ = 1.023 in Unitran and φ = 1.015 in Epitran).[13] We further analyse the per-language trade-off slopes, which can be seen in Fig. 4. These language-specific slopes are calculated by summing the fixed effect of the surprisal term with its random effects per language. We see a similar trend in this figure as in Fig. 3, with most of the analysed doculects having a positive surprisal-duration trade-off.

Cross-linguistic Trade-offs
Our previous experiment in §7.2 makes use of language-specific random effects. These effects allow the model to represent within-language trade-off effects while correcting for cross-linguistic differences through the model parameters. By itself, it therefore cannot serve as confirmation of a trade-off across the world's languages, only as additional evidence for it. In this section, we do not use language-specific random effects; instead, we average surprisal within a language for each phone-position tuple. We then train the following mixed-effects model:

duration = 1 + surprisal + position + surprisal · position + (1 | phone)    (12)

This equation is identical to the one in eq. (10), but we now model language-phone-position tuples, instead of a language's individual phones. Additionally, since we aggregate results per tuple for this analysis, the central limit theorem tells us our model's residuals should be roughly Gaussian. We thus use linear mixed-effects models, instead of the generalised log-linear ones. By analysing this model we find a significantly positive surprisal-duration additive slope of roughly 1.5 milliseconds per bit (φ = 1.54 in Unitran and φ = 1.52 in Epitran). This confirms the expected cross-linguistic trade-off: languages with more surprising phones indeed have longer durations, even after controlling for word positions and phone-specific articulatory costs.

[13] For the Epitran data, we performed this analysis while also adding language family effects and found similar results. However, we could not repeat this experiment for Unitran, as the model was too memory-intensive.

Figure 4: Language-specific trade-off slopes in Epitran from the mixed-effects model in eq. (11). The y-axis represents a multiplicative effect: duration is multiplied by y per extra bit of phone information.
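The per-tuple aggregation step can be sketched as follows; the record field names are illustrative, not the dataset's actual schema.

```python
from collections import defaultdict

def average_per_tuple(records):
    """Average surprisal and duration over (language, phone, position)
    tuples, as in the purely cross-linguistic analysis."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for r in records:
        key = (r["language"], r["phone"], r["position"])
        sums[key][0] += r["surprisal"]
        sums[key][1] += r["duration"]
        sums[key][2] += 1
    return {k: (s / n, d / n) for k, (s, d, n) in sums.items()}
```

Each resulting tuple then contributes one data point to the linear mixed-effects model, rather than one point per phone token.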

Discussion
The pressure towards a specific information rate (potentially set at a specific cognitive channel capacity) has been posited as an invariant across languages. Directly testing such a claim is perhaps impossible, as data alone cannot prove its universality. Moreover, providing meaningful evidence towards this phenomenon requires a careful and comprehensive cross-linguistic analysis, which we attempt to perform in this work. In comparison to similar studies, such as those by Pellegrino et al. (2011) and Coupé et al. (2019), we employ more sophisticated techniques to measure a linguistic unit's (in our case, a phone's) information content. Moreover, we also employ more rigorous strategies for analysing the surprisal-duration relationship, controlling for several potential confounds. By introducing these improvements, we attain a more detailed understanding of the role of information in language production, both across and within languages.
Experimentally, we find that, after controlling for other artefacts, the information conveyed by a phone in context has a modest but significant relationship with phone duration. We see that this relationship is consistently positive across a number of investigated settings, despite being small in magnitude, meaning that more informative phones are on average longer. Additionally, using two-tailed tests at α < 0.01 throughout our experiments, we find no language with a significant negative relationship between phone surprisal and duration.
Limitations and Future Work. In this work, we implemented a careful evaluation protocol to study the relationship between a phone's surprisal and duration in a representative set of languages. To perform our study in such a large number of languages, however, we rely on the automatically aligned phone measurements from VoxClamantis, which contain noise from various sources. Future work could investigate if biases in the dataset generation protocol could impact our results. Further, VoxClamantis data is derived from readings of the Bible. Future studies could extend our analysis to other settings, such as conversational data.

Conclusion
In this work, we have provided the widest cross-linguistic investigation of phone surprisal and duration to date, covering 600 languages from over 60 language families spread across the globe. We confirm a surprisal-duration trade-off both across these analysed languages and within a subset of 319 of them, covering 53 language families. While there exist arguments against some of our design choices, our overarching conclusion is remarkably consistent across our analyses: a significant surprisal-duration trade-off operates in language production. In other words, both across and within languages, phones carrying more information are longer, while phones carrying less information are produced faster.

A Confound analysis
In this section, we analyse the word position and articulatory cost confounds mentioned in §5, as they could have an impact on a surprisal-duration trade-off analysis. We first investigate the parameters from our mixed-effects models containing word-position effects. The word-initial and word-final intercepts are significantly positive in all 39 doculects of our mono-lingual Epitran analysis (represented by eq. (10)) and in both cross-linguistic experiments (eqs. (11) and (12)). The intercepts for word-initial positions average 67 milliseconds, while the word-final ones average 32 milliseconds, providing new evidence for this word-boundary lengthening effect. Since word position is correlated with surprisal, this boundary-lengthening phenomenon could have biased our results, had we not controlled for it.
We now explore the potential bias introduced by phone-specific articulatory costs. As mentioned in §5, languages with larger phonetic inventories may be more inclined to use marked phones, which have longer durations. While this correlation between inventory size and unit cost would be particularly problematic for larger linguistic units (e.g. syllables), it can also affect our phone-level analysis. Indeed, computing the Spearman correlation between a language's inventory size (in number of unique phones) and its average phone duration yields a positive correlation of ρ = 0.28, while the average surprisal-duration Spearman correlation across languages is ρ = 0.45. As inventory size and surprisal are strongly correlated across languages, pure inventory effects may be driving a large part of the analysed correlation.
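For reference, the rank correlation used here can be computed as follows. This is an illustrative implementation of ours that assumes no ties; a library implementation with proper tie handling should be preferred in practice.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (no tie handling; illustrative only).
    Equals Pearson correlation computed on the ranks of x and y."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))
```

Because it operates on ranks, any monotone (not necessarily linear) relationship between inventory size and duration yields a correlation of 1.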
To analyse how strongly both confounds would affect the main effect if left unaccounted for, we rerun our previous analyses without effects for position, phone, or both. We do so for Epitran only. The resulting estimated trade-off effects are given in Tab. 1. We indeed see that these confounds are typically absorbed by the fixed surprisal effect in all three settings. Notably, without confound control we would find supposedly significant results in all analysed languages, and a cross-linguistic effect 10 times stronger, all of which are in fact spurious.

Table 1: Comparison of trade-off slopes (in milliseconds per bit) found when not conditioning on potential confounds. # Sign. denotes the number of languages with significant effects (α < 0.01) in a mono-lingual analysis.