Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists

We present a cross-linguistic study of vowel harmony that aims to quantify this phenomenon using data-driven computational modeling. Concretely, we define an information-theoretic measure of harmonicity based on the predictability of vowels in a natural language lexicon, which we estimate using phoneme-level language models (PLMs). Prior quantitative studies have relied heavily on inflected word-forms in the analysis of vowel harmony. In contrast, we train our models on cross-linguistically comparable lemma forms with little or no inflection, which enables us to cover more under-studied languages. The training data for our PLMs consist of word lists offering roughly 1,000 entries per language. Although the data we employ are substantially smaller than previously used corpora, our experiments demonstrate that neural PLMs capture vowel harmony patterns in a set of languages that exhibit this phenomenon. Our work also demonstrates that word lists are a valuable resource for typological research, and it offers new possibilities for future studies on low-resource, under-studied languages.


1 Introduction

Vowel Harmony
Many of the world's languages exhibit vowel harmony, a phonological co-occurrence constraint whereby vowels in polysyllabic words have to be members of the same natural class (Ohala, 1994). Natural classes of vowels are defined with respect to polar phonological features such as vowel backness (±BACK) and roundedness (±ROUND). In a prototypical language with backness, or ±BACK harmony, all vowels within a word tend to share the ±BACK feature, i.e. they are either all front (−BACK) or all back (+BACK). Table 1 illustrates vowel harmony in Turkish, one of the languages best known to have this feature. In Table 1, the nominative plural and genitive plural are examples of −BACK harmony, while the genitive singular column illustrates +BACK harmony. In the case of Turkish, vowel harmony can be defined as a constraint applying to almost all words and the entire inflectional system. In other languages, vowel harmony may be restricted to parts of the lexicon or the inflectional system, or even to a single inflectional suffix. For example, in Estonian there are vestiges of vowel harmony in lexical items while it is absent from the inflectional system, and in Bislama it only occurs in a single suffix marking transitivity (Crowley, 2014). Between these extremes of Turkish and Bislama lie languages such as Finnish and Hungarian, with intermediate vowel harmony systems where not all vowels participate in vowel harmony to the same extent. Both languages have ±BACK harmony, but a subset of the −BACK vowels allow +BACK harmony to spread: in a word like [lAtik:o] 'box' (not [lAtik:ø]), +BACK harmony is not violated, whereas a word containing only neutral vowels triggers −BACK harmony, as in [merkitys] 'meaning', where the +BACK disharmonic form [merkitus] is not possible.
The rather broad application of the term has made it increasingly difficult to define vowel harmony as a phonological process (cf. Anderson 1980). If vowel harmony is used as a typological feature to group languages into phylogenetic families, this broad application becomes perilous to the researcher, since they have to be aware of the degree of vowel harmonicity in the individual languages. Instead of searching for a necessarily complex definition of vowel harmony, research has consequently concentrated on a quantitative description.

Prior Work and Scope
Table 1: Illustration of the Turkish vowel harmony system following Polgárdi (1999). The first vowel of a word form determines the harmony type. If the first vowel is +BACK, the vowels of the following suffixes must agree w.r.t. the +BACK feature. ±ROUND harmony applies only in suffixes that have separate forms for this feature: the genitive suffix takes both ±BACK and ±ROUND forms, while the plural suffix varies only for ±BACK.

Prior approaches to a quantitative description of vowel harmony have mostly focused on strictly local harmony processes. Mayer et al. (2010) used vowel succession counts derived from corpora of inflected word-forms to quantify vowel harmony in a large number of languages in terms of χ²-values,
while Ozburn (2019) used count data to estimate succession probabilities and calculate the relative risk of encountering a harmonic vowel in a word form. These two approaches treated all positions in a word form identically. Goldsmith and Riggle (2012) argued that vowel harmony involves at least one type of non-local dependency, since it operates over consonants intervening between adjacent vowels. They employed a simple n-gram language model to learn the phonology of Finnish and calculated the pointwise mutual information of vowel-vowel and consonant-vowel pairs based on the phoneme probabilities predicted by the language model, finding evidence for consonant-vowel harmony besides the expected ±BACK harmony, with a small bias towards +BACK harmony. However, n-gram language models are limited by their predefined context size. A language model with a left-hand context of n = 3 cannot capture the effect of vowel harmony if it operates over a neutral vowel intervening between two harmonic vowels. While this effect could be mitigated by allowing for a larger or flexible n, estimating probabilities from corpora becomes increasingly difficult with higher values of n. In this study, we aim to improve over these methods by quantifying vowel harmony with an information-theoretic measure based on surprisal, capturing the relative strength of vowel harmony in a language in terms of the likelihood of a vowel in a word sharing a specific feature with preceding vowels. To do so, we employ neural recurrent language models with variable-length preceding phoneme context that are trained on cross-linguistically comparable lexical data. Some previous work on modeling vowel harmony with language models has been carried out: Rodd (1997) found evidence for Turkish vowel harmony in the hidden activations of a simple neural language model, but this topic seems not to have been explored further since then. In the following sections, we first introduce feature surprisal as an information-theoretic measure of vowel harmony (§2). We then present our computational experiments with the introduced measure and discuss the results of its application to a large collection of cross-linguistic lexical data (§3, §4). We conclude by discussing the implications of our study for future work on vowel harmony in both classical and computational linguistics (§5).

2 Phoneme-Level Language Models
Preliminaries and Notations.
To quantify vowel harmony in our study, we make use of phoneme-level language models (PLMs). Consider a natural language with a lexicon L and a phoneme inventory Φ (using IPA symbols). Using a cross-linguistic word list, we obtain K samples from the lexicon, D = {w_k}_{k=1}^{K} ∼ L, where each sample is a word-form transcribed as a phoneme sequence w = (φ_1, …, φ_{|w|}) ∈ Φ*. Given this sample of word-forms as training data, a PLM can be trained to estimate a probability distribution over Φ by maximizing the term

L(θ) = Σ_{k=1}^{K} Σ_{t=1}^{|w_k|} log p_θ(φ_t | φ_{<t})

Here, θ are the parameters of the model that are learned by maximizing the objective function above. Once a PLM has been trained, it can be used to compute the probability of unseen, held-out word-forms (i.e., word-forms that were not observed in the training data). Ideally, a PLM should assign higher probability to word-forms that are plausible given the phonotactic rules of the language of the training data, and lower probability to implausible word-forms.
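As a concrete sketch of this objective, the log-likelihood of a single word-form is simply the sum of the log conditional probabilities of its phonemes. The model and inventory below are toy placeholders, not the trained PLM described in the paper:

```python
import math

def word_log_likelihood(word, cond_prob):
    """Log-likelihood of a phoneme sequence: the sum over positions of
    log p(phoneme | preceding context)."""
    ll = 0.0
    for t, phoneme in enumerate(word):
        context = tuple(word[:t])
        ll += math.log(cond_prob(context, phoneme))
    return ll

# Toy model: uniform distribution over a 4-phoneme inventory,
# ignoring the context entirely (a real PLM conditions on it).
inventory = ["s", "i", "l", "m"]
uniform = lambda context, phoneme: 1.0 / len(inventory)

ll = word_log_likelihood(["s", "i", "l"], uniform)  # 3 * log(1/4)
```

A trained PLM replaces the uniform placeholder with learned, context-sensitive probabilities, so plausible word-forms accumulate a higher log-likelihood.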

Recurrent PLMs.
Although different architectures can be used to build a PLM, we choose to employ a recurrent architecture based on unidirectional long short-term memory (LSTM) cells (Hochreiter and Schmidhuber, 1997). Given a word-form as a sequence of phonemes w = (φ_1, …, φ_{|w|}), each phoneme is first projected into a continuous-vector phoneme representation using an embedding matrix as E(φ_t) = x_t ∈ R^d. Then, the LSTM takes the embedded sequence as input at each position t within the word-form to compute the hidden state representation

h_t = LSTM(x_t, h_{t−1})

To obtain a probability distribution over the phoneme inventory, a linear transformation is applied to the hidden state vector, followed by a softmax function, to obtain a probability vector

p(φ_{t+1} | φ_{≤t}) = softmax(W h_t + b)

Here, W ∈ R^{|Φ|×h} is a projection matrix at the network output and b ∈ R^{|Φ|} is a bias term. However, we make a few design modifications to the vanilla LSTM-based PLM to make it more suitable for our study. First, since our main interest is to model the predictability of the vowels, we confine the output probability distribution to the set of vocalic segments, a subset of the phoneme inventory V ⊂ Φ. Second, we train and evaluate our PLMs to predict the next vowel only in the intra-word positions where we know that the next phoneme is indeed a vowel, given a preceding phoneme context that contains at least one vowel. While the output of this modified PLM ranges over the set V, the word-forms remain sequences in Φ*. That is, both consonants and vowels can appear in the preceding context.
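The restriction of the output distribution to the vowel subset V can be sketched as a renormalized softmax over vocalic segments only. The scores, phonemes, and vowel set below are illustrative, not taken from the paper's models:

```python
import math

def vowel_restricted_distribution(scores, phonemes, vowels):
    """Renormalize model output scores over the vowel subset V ⊂ Φ,
    mirroring the modification where the PLM predicts vowels only."""
    exp_scores = {p: math.exp(s) for p, s in zip(phonemes, scores)
                  if p in vowels}
    z = sum(exp_scores.values())
    return {p: e / z for p, e in exp_scores.items()}

phonemes = ["a", "e", "k", "t", "i"]
scores = [1.0, 1.0, 5.0, 3.0, 1.0]   # consonant scores are dropped
vowels = {"a", "e", "i"}
dist = vowel_restricted_distribution(scores, phonemes, vowels)
# equal vowel scores -> uniform distribution over {a, e, i}
```

Because consonant logits are excluded before normalization, the model's probability mass at an evaluated position always sums to one over the vowel inventory.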
Note that we do not employ fixed-length-context n-gram PLMs in our study, since we aim to account for non-local phoneme dependencies within a word-form. Given that word-forms within a lexicon have arbitrary lengths, restricting the preceding context to a fixed number of phonemes does not enable us to model vowel harmony across variable-length contexts beyond phoneme n-grams. On the other hand, we do not employ more powerful architectures such as a transformer (Vaswani et al., 2017) or a bidirectional LSTM (Graves and Schmidhuber, 2005) on grounds of suitability for the task: (1) the dependencies between vowels are relatively short (the domain of vowel harmony is the phonological word), (2) vowel harmony is a progressive phenomenon (i.e., it operates from left to right, unlike its regressive counterpart, umlaut), and (3) the training sets of the individual languages in our study are likely too small to train a large transformer model. Moreover, several prior studies within the information-theoretic approach to investigating phonological structure have also employed LSTM-based PLMs (e.g., Pimentel et al., 2020, 2021a).
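The fixed-context limitation mentioned above can be made concrete with a small sketch: with a left-hand context of n = 3, a harmonic trigger vowel becomes invisible once a neutral vowel and a consonant intervene. The word-form and segmentation here are hypothetical:

```python
def ngram_context(word, position, n):
    """The left-hand context visible to an n-gram model at `position`:
    only the n-1 immediately preceding symbols."""
    return word[max(0, position - (n - 1)):position]

# Hypothetical Finnish-like form: a harmonic trigger [y], then a
# neutral vowel [i] and a consonant before the vowel to be predicted.
word = ["t", "y", "l", "i", "k", "ae"]
pos = word.index("ae")

ctx = ngram_context(word, pos, n=3)  # only ["i", "k"] is visible
# The trigger [y] falls outside the trigram window, whereas a
# recurrent model conditions on the full prefix word[:pos].
```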

Harmony as Surprisal
Given a phoneme-level language model trained on a set of word-forms sampled from a natural language lexicon, we can quantify the vowel harmony phenomenon using Shannon's information content, or surprisal. Given a non-initial vocalic position t after a phoneme context φ_{<t}, vowel surprisal is

η(v, t) = −log₂ p(v | φ_{<t})

which is measured in bits. Note that surprisal is maximal when the preceding context tells us nothing about which vowels are more likely to occur. That is, if the vowels are sampled from a uniform distribution over the vowel inventory V, then η(v, t) = log₂ |V| (bits). Therefore, surprisal in our case is mainly a metric of how "predictable" a vowel is in a given context. Now consider a set of vowels H ⊆ V that share a phonological feature. For a given vowel v ∈ H, we refer to the set H as a harmonic group, and to its counterpart ¬H = V \ H as a disharmonic group with respect to the vowel v. For example, consider the front vowel [i] in Turkish, which has the feature −BACK. With respect to [i], the front vowels in the Turkish vowel inventory {[i], [y], [e], [œ]} make a harmonic group, since they all share the feature −BACK, while the rest of the vowels make a disharmonic group {[ɯ], [u], [a], [o]}, since they all lack the feature −BACK. Given a phoneme context that contains at least one vowel v such that v ∈ H, we compute the surprisal of a harmonic group at position t in a word-form by summing over the vowels in H, i.e.

η(H, t) = −log₂ Σ_{u ∈ H} p(u | φ_{<t})
We refer to the quantity η(H, t) as feature surprisal, since all members of the harmonic group H share one phonological feature. Likewise, we compute the surprisal of a disharmonic group by summing over the vowels in ¬H:

η(¬H, t) = −log₂ Σ_{u ∈ ¬H} p(u | φ_{<t})

Assuming that a PLM has learned the vowel harmony constraints of a language from the training word-forms, we expect the model to predict that vowels in H are more likely to co-occur in a single word-form. By implication, we expect the model to "disfavour" the occurrence of a vowel in ¬H when observing members of H in the context. That is, in a language that exhibits this linguistic phenomenon, word-forms that conform to vowel harmony should be assigned a higher probability than word-forms that do not. For example, the Finnish word-form [s i l m ae s ae] is expected to be assigned a high probability by our model, since the sequence of vowels [i], [ae], [ae] is −BACK harmonic, and its disharmonic counterpart [s i l m ae s o] is expected to be assigned a lower probability. Note that in equations (5) and (6) we compute the surprisal at a single vocalic position in a given word-form. To quantify harmonic group surprisal across a set of held-out word-forms W, we compute the average feature surprisal

η(H) = (1/N) Σ_{w ∈ W} Σ_{t ∈ {τ, …, T}} η(H, t)

Here, the outer sum iterates over all word-forms in W, the inner sum iterates over the non-initial vocalic positions within the word-form w, and N is the total number of such positions. The feature surprisal of a disharmonic group, η(¬H), is computed in the same way as in equation (7), but summing over the term η(¬H, t) instead.

Table 2: Languages from NorthEuraLex used in our sample along with their harmonic groups. Khalkha Mongolian has a special type of vowel harmony involving the placement of the tongue root: +ATR encodes an advanced position of the tongue root in the vocal tract, while −ATR encodes a retracted or further back position. Languages in our sample that do not exhibit vowel harmony are marked with the symbol (†).
Finally, we quantify the strength of a vowel harmony constraint in a language as the difference between the feature surprisal of the harmonic and disharmonic vowels:

∆η = η(H) − η(¬H)

If feature surprisal in harmonic phoneme sequences is lower than feature surprisal in disharmonic phoneme sequences, ∆η is negative, indicating that harmonic sequences are assigned higher probability. It is worth pointing out that our grouping of the vowels into harmonic groups is only used to obtain feature surprisal values from the model after it has been trained. That is, our PLMs for all languages in our study are trained without an explicit signal that informs the model about the features of the vowels.
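Under the definitions above, ∆η can be computed from the per-position next-vowel distributions of a trained model. The following sketch substitutes toy distributions for real model outputs and assumes the average is taken over all evaluated positions pooled across word-forms:

```python
import math

def average_feature_surprisal(per_position, group):
    """Mean of η(group, t) = -log2 Σ_{u in group} p(u | context)
    over all evaluated non-initial vocalic positions."""
    vals = [-math.log2(sum(p.get(v, 0.0) for v in group))
            for p in per_position]
    return sum(vals) / len(vals)

# Toy next-vowel distributions, one per evaluated position, pooled
# over held-out word-forms; H and notH mimic a -BACK harmonic group
# and its disharmonic counterpart.
per_position = [
    {"i": 0.5, "e": 0.3, "u": 0.1, "o": 0.1},
    {"i": 0.4, "e": 0.4, "u": 0.1, "o": 0.1},
]
H, notH = {"i", "e"}, {"u", "o"}
delta_eta = (average_feature_surprisal(per_position, H)
             - average_feature_surprisal(per_position, notH))
# the harmonic group dominates at every position, so delta_eta < 0
```

With 0.8 of the probability mass on the harmonic group at each position, ∆η comes out at −2 bits, the behavior expected of a model that has learned the constraint.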
3 Experimental Data and Setup

Data
Previous research has made use of large corpora of inflected word-forms (Goldsmith and Riggle, 2012) or running text (Mayer et al., 2010) to infer vowel harmony patterns. This is mainly because vowel harmony constraints often surface in inflectional suffixes, especially in highly agglutinating languages such as Finnish, Hungarian or Turkish. Though this approach is not in itself problematic, it relies on data that may not exist for the majority of the world's languages. It is also not applicable to languages that have a different grammatical structure, for example, reduced or fusional morphology.
On the other hand, if a language has vowel harmony as a phonologically conditioned rather than a purely grammatical phenomenon, the relevant vowel harmony patterns should also be recoverable from lexical data with little or no inflection at all. We use parts of the NorthEuraLex database (http://www.northeuralex.org/; Dellert et al. 2020) as experimental data to train our phoneme-level language models and quantify the effect of vowel harmony in languages that are known to exhibit this linguistic phenomenon. NorthEuraLex offers a large multilingual word list consisting of 1005 concepts translated into 107 language varieties from North Eurasia, with translations provided in a unified transcription following the International Phonetic Alphabet (IPA). Moreover, NorthEuraLex contains a large number of diverse language varieties from various language families that are known to exhibit vowel harmony, as well as language varieties that are known to lack the phenomenon.
As there is no clear definition of what constitutes vowel harmony in languages, and linguistic resources such as the World Atlas of Language Structures (Dryer et al., 2014) do not provide this information, we concentrate on a subset of 10 language varieties from NorthEuraLex, with five varieties traditionally known to exhibit vowel harmony and five known not to exhibit the phenomenon. When selecting the languages, we tried to obtain a rather diverse sample of languages from different language families. Table 2 gives an overview of the languages and their active harmony processes (where present).
The NorthEuraLex data are available in the form of Cross-Linguistic Data Formats (CLDF; https://cldf.clld.org; Forkel et al. 2018), following the recommendations underlying Lexibank (List et al., 2022a), a large collection of lexical word lists (https://github.com/lexibank/northeuralex). A core feature of CLDF is the integration of reference catalogs. Reference catalogs are metadata collections that offer basic information on major linguistic constructs, such as languages (Glottolog, https://glottolog.org; Hammarström et al. 2022) or concepts (Concepticon, https://concepticon.clld.org; List et al. 2022b). In addition to offering word lists standardized with respect to language names and concept elicitation glosses, Lexibank offers standardized phonetic transcriptions as specified by Cross-Linguistic Transcription Systems (CLTS, https://clts.clld.org; List et al. 2021), a reference catalog that offers a transcription system that conforms to the IPA but resolves ambiguities encountered in the original IPA specification (Anderson et al., 2018).
Since NorthEuraLex is available in CLDF, we have direct access to standardized phonetic transcriptions segmented into individual sounds in each word form, along with an underlying set of distinctive features provided by CLTS. The resulting data set provides on average 1136 unique word-forms per language (with several concepts having two or more word-forms as translational equivalents), with large differences between individual languages. We decided against downsampling the word lists to a common size due to the already small number of samples. The word list sizes range from 971 (Ainu) to 1513 (Manchu).

Preprocessing
For each of the languages, identical word-forms are collapsed to a single item, such that each sequence of phonemes is presented only once to the model. In addition, word-forms which are a substring of another word-form are removed. Thus, if the word list of a language contains the sequences { [s i l m ae], [s i l m ae], [s i l m ae s: ae], [s i l m ae d ae] }, only the latter two sequences are kept: { [s i l m ae s: ae], [s i l m ae d ae] }. This procedure ensures that only unique sequences are presented to the model, and that train and test splits do not contain identical forms, which might otherwise lead to unjustified higher weights for sound sequences recurring across the vocabulary of individual language varieties.
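A minimal implementation of this preprocessing step (deduplication plus substring removal) might look as follows; the function and helper names are our own:

```python
def preprocess(wordforms):
    """Collapse identical word-forms and drop any form that occurs as
    a contiguous substring of another form."""
    unique = list(dict.fromkeys(tuple(w) for w in wordforms))

    def is_substring(short, long_):
        # True if `short` occurs contiguously inside the longer form.
        if len(short) >= len(long_):
            return False
        return any(long_[i:i + len(short)] == short
                   for i in range(len(long_) - len(short) + 1))

    kept = [w for w in unique
            if not any(is_substring(w, other) for other in unique
                       if other != w)]
    return [list(w) for w in kept]

forms = [["s", "i", "l", "m", "ae"], ["s", "i", "l", "m", "ae"],
         ["s", "i", "l", "m", "ae", "s:", "ae"],
         ["s", "i", "l", "m", "ae", "d", "ae"]]
out = preprocess(forms)  # only the two longer forms survive
```

Running it on the example from the text collapses the duplicate [s i l m ae] and then removes it as a substring of [s i l m ae s: ae].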

Training
For each language, we randomly split the data into 60%, 10% and 30% subsets for training, validation and testing, respectively. The models were trained with the Adam optimizer (Kingma and Ba, 2015) on the task of minimizing the cross-entropy between the predicted distribution and the true probability distribution over the vowel inventory. This is equivalent to minimizing the negative log-likelihood of the true phoneme at each position. 25% of the inputs were randomly replaced by a mask token to prevent overfitting on the relatively small sample. Note that the output probability distribution of the model is restricted to the vowel inventory of the language plus the end-of-sequence token, since only the vowel positions are of interest for the analysis.
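The input-masking step can be sketched as follows; the mask token string and the use of Python's random module are our own assumptions, since the paper does not specify the implementation:

```python
import random

def mask_inputs(sequence, mask_token="<mask>", p=0.25, rng=None):
    """Randomly replace a fraction p of the input phonemes with a
    mask token, a simple regularizer for small training samples."""
    rng = rng or random.Random()
    return [mask_token if rng.random() < p else ph for ph in sequence]

seq = ["s", "i", "l", "m", "ae", "s:", "ae"]
masked = mask_inputs(seq, rng=random.Random(0))
```

Each position is masked independently with probability 0.25, so over a large sample roughly a quarter of the input tokens are hidden from the model.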
A separate model was trained for each language in our subset of 10 languages from NorthEuraLex. The same hyperparameters were used for training as in Pimentel et al. (2021b), with the batch size reduced to 32, since the NorthEuraLex word lists are considerably smaller than the datasets used in that paper. Table 4 in Appendix A shows the exact configuration of the hyperparameters. After each epoch the models were evaluated on the validation set, and all models were trained until the validation loss converged. Training the models on unique sequences derived from word lists ensures that the model sees each sequence only once per epoch, and minimizes overlaps between the train, test and validation sets.

Significance Tests
As the expected behavior of vowel harmony languages is that the vowels are not evenly distributed over their words, average feature surprisal is likely not to be normally distributed. The Shapiro-Wilk test (Shapiro and Wilk, 1965) was used to check whether the surprisal values were normally distributed for every comparison. For every pairing of conditions, at least one of them was not normally distributed with p < 0.01. Thus, the Wilcoxon signed-rank test was conducted to test the significance of a paired contrast (as in the example above). Effect size was calculated as the rank-biserial coefficient using the common-language effect size f = U / (n₁ · n₂) as r = f − (1 − f), with U being the test statistic and n₁ · n₂ the number of possible comparisons between the two conditions. For an unpaired contrast (e.g. the contrast between average feature surprisal for +ROUND after a −ROUND vowel and average feature surprisal for +BACK after a −BACK vowel), a Mann-Whitney U-test was conducted, with the effect size calculated as the rank-biserial coefficient using the T statistic and the sum of ranks S as r = T / S. All significance tests were conducted using the SciPy Python package (Virtanen et al., 2020).
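The two effect-size formulas given above are straightforward to implement; this is a sketch of the arithmetic only, not of the full SciPy-based testing pipeline:

```python
def rank_biserial_from_u(u, n1, n2):
    """Rank-biserial coefficient via the common-language effect size:
    f = U / (n1 * n2) is the proportion of favourable pairwise
    comparisons, and r = f - (1 - f)."""
    f = u / (n1 * n2)
    return f - (1 - f)

def rank_biserial_from_ranks(t, s):
    """Rank-biserial coefficient from a test statistic T and the sum
    of ranks S: r = T / S."""
    return t / s

# If U equals n1 * n2, every pairwise comparison favours one
# condition, so the coefficient reaches its maximum of 1.0.
r_max = rank_biserial_from_u(12, 3, 4)
```

Note that r = f − (1 − f) = 2f − 1, so the coefficient ranges from −1 (all comparisons favour the other condition) through 0 (no effect) to 1.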

Implementation
The methods described here are implemented in Python. The PyTorch library (Paszke et al., 2019) is used to train and evaluate our neural models.
CLDF data are accessed with the help of CL Toolkit (https://pypi.org/project/cltoolkit; List and Forkel 2021), a Python package that provides convenient access to lexical word lists in CLDF.
4 Experimental Results

Feature Surprisal
All vowel harmony languages show significant differences in feature surprisal between harmonic and disharmonic conditions, with negative ∆η; individual results can be found in Tables 6-10 in Appendix C. Feature surprisal in the +BACK disharmonic condition was found to be higher than feature surprisal in the −BACK disharmonic condition for Finnish (∆η = −0.2148, p < 0.01), Hungarian (∆η = −1.0806, p < 0.01) and Turkish (∆η = −0.8602, p < 0.01), which confirms the findings of Goldsmith (1985). Note that if +BACK and −BACK harmony were equally strong, one would expect no difference in surprisal when the harmony is violated. Three out of four languages with ±BACK harmony show this tendency, indicating that the relative strength of +BACK harmony over −BACK harmony is the usual case rather than an exception. A possible explanation for this difference in strength is the existence of neutral vowels, with three of the four ±BACK harmony languages in our sample having at least one neutral vowel, and Turkish, the only language without neutral vowels, also showing the largest difference between the two disharmonic conditions. The probabilities of the neutral vowels are not included in the feature surprisal calculation, causing feature surprisal to be higher in the +BACK disharmonic condition while lowering feature surprisal in the −BACK disharmonic condition. For Hungarian, feature surprisal was lowest in the neutral harmonic condition, meaning that neutral vowels are most likely to occur after another neutral vowel. Even though Hungarian neutral vowels trigger −BACK harmony, the low number of forms containing both −BACK vowels and neutral vowels makes it difficult for the neural language model to learn the pattern, leading to the highest feature surprisal occurring in the harmonic condition (i.e.
for the −BACK feature). Figure 1 gives an overview of the relative strength of vowel harmony for all languages and harmonic features in the sample used in this study. For this figure, the sign of ∆η was reversed in order to quantify the reduction of feature surprisal in the harmonic sequences as compared to the disharmonic sequences for each combination of feature and language. The boxplots of languages without vowel harmony are located towards the left of the plot with small differences between harmonic and disharmonic sequences, with some vowel harmony languages showing similar, yet still positive, surprisal reduction (e.g. Finnish +BACK vowels, Hungarian +BACK vowels).

The Case of Turkish
For Turkish, the difference in feature surprisal between harmonic and disharmonic conditions was large. Figure 2 shows that for both the ±BACK and ±ROUND conditions, the disharmonic condition displays a much higher surprisal value than the harmonic condition (∆η = −3.6816, p < 0.01 and ∆η = −2.7061, p < 0.01, respectively).

Figure 2: Feature surprisal for Turkish back harmonic/disharmonic sequences (left) and round harmonic/disharmonic sequences (right). The difference between harmonic and disharmonic conditions is significant with p < 0.01 in both cases. **: p < 0.01, *: p < 0.05, ns: p > 0.05.

A small but significant bias towards +BACK harmony was detected (∆η = −0.8602, p < 0.01). There is one obvious reason for the relative strength of ±BACK harmony over ±ROUND, namely the parasitic nature of ±ROUND harmony in Turkish: while all morphemes have different forms for ±BACK, allowing for ±ROUND disharmony, only a subset also has separate forms for ±ROUND (Table 1). Thus, there are more instances of ±BACK harmony to be observed by the model, which is expected to result in higher surprisal values for the ±BACK disharmonic conditions. After ±ROUND vowels, feature surprisal was also much higher in the disharmonic conditions, with feature surprisal in the rounded disharmonic condition being higher than in the unrounded disharmonic condition (∆η = −1.5827, p < 0.01). In other words, +ROUND harmony seems to be stronger than −ROUND harmony in Turkish. When combining the disharmonic conditions within a harmonic feature and comparing them to the disharmonic conditions of the other harmonic feature, the combined back disharmonic condition (both front disharmonic and back disharmonic) yields slightly higher feature surprisal than the combined rounded disharmonic condition (∆η = 0.8555, p < 0.01); see Table 8 in the appendix. This is in line with earlier research (Baker, 2009) that found a bias towards ±BACK harmony over ±ROUND harmony. This is also the expected result when taking
into account that many suffixes do not have +ROUND forms and therefore introduce noise into the data.

Neutral Vowels
Learning vowel dependencies across neutral vowels turned out to be difficult: for Manchu and Khalkha Mongolian, the number of test items in this category was so low that no meaningful result could be produced. This is again caused by the nature of the data, which consists of lemma forms. For Finnish and Hungarian the number of items was sufficient to conduct the appropriate significance tests, but the numbers are still small (102 and 63, respectively). The neural language model did not learn the association of neutral vowels with −BACK as assumed for Finnish and Hungarian, with significant ∆η > 0 between the neutral harmonic and neutral disharmonic condition only for Khalkha Mongolian and ±ATR sequences. In Hungarian, neutral vowels are most likely to occur after other neutral vowels, but this is not the case for Finnish, Manchu and Khalkha Mongolian. On the other hand, Turkish, as the only language in the sample without neutral vowels, showed the largest difference between harmonic and disharmonic conditions for both ±BACK and ±ROUND (see Appendix C for results).
It may be noted that Turkish, the language with the strongest vowel harmony effect in terms of ∆η, has no neutral vowels for either ±BACK or ±ROUND harmony. This could have facilitated the generalization over the ±BACK and ±ROUND harmony patterns for the neural language model; since the Turkish harmony system is symmetrical and the number of vowels is the same for each feature value, the results at least show that the neural language model does indeed assign higher surprisal to disharmonic sequences.

5 Discussion and Conclusion
Prior work in the (computational) linguistics community has adopted information theory as a framework for the study of human language structure across different linguistic levels, including phonology (e.g., Pimentel et al., 2020, 2021c), morphology (e.g., Rathi et al., 2021; Wu et al., 2019), and syntax (e.g., Hahn et al., 2018; Futrell et al., 2015). Following the same spirit, we have introduced an information-theoretic metric to quantify vowel harmony based on feature surprisal. Our experiments have demonstrated that feature surprisal is a good indicator of whether a certain feature participates in vowel harmony patterns in a language, producing significant differences between harmonic and disharmonic conditions for most harmonic features in five vowel harmony languages. The effect was found on a very small sample of lemma forms with little to no morphological information, showing that large amounts of inflectional data are not necessary to identify some, but not all, vowel harmony constraints. When calculated for the ±BACK and ±ROUND features in five non-vowel-harmony languages, the difference in surprisal was close to zero, meaning the neural language model did not detect any preference for harmony constraints in the languages evaluated.
We showed that neural language models can capture non-local harmony constraints over neutral vowels, which is not possible with count-based methods as employed by Mayer et al. (2010) or bigram models as in Goldsmith and Riggle (2012).
Here, the resolution of the analysis is more fine-grained with respect to the features underlying the harmonic groups. The advantage of the modeling approach presented here over both count-based and probabilistic models is that it can be used with a small dataset (word lists of about 1000 word-forms, of which ca. 300 are in the test set as the basis of the actual analysis).
The analysis presented here could be extended to other types of phonological constraints, since neural language models are in theory able to learn all types of dependencies over sequences of arbitrary length. However, analysing Finnish, Hungarian, Manchu and Khalkha Mongolian required prior knowledge about harmonic vowels and the split of vowels into harmonic groups, either because the groups are not defined by the value of a single feature, as is the case for languages with neutral vowels, or because the feature representation in our standardized data itself might not describe a sound with the feature that is assumed to participate in vowel harmony.
If it is not known which vowels participate in vowel harmony, it seems best to use the information on distinctive features in the data in order to find out which effects can be observed. However, if the vowel harmony patterns are as complex as in Khalkha Mongolian, the approach presented here would probably find its limits in corpus size. Identifying the approximate number of distinct word-forms needed to infer the vowel harmony systems of individual language varieties (similar to previous studies inferring the number of words needed to get an approximate account of phoneme numbers, Dockum and Bowern 2019) would be an interesting topic for future analysis.

Figure 1: Surprisal reduction for the 10 varieties from NorthEuraLex. Best viewed in color.

Figure 3: Number of items with 2 vowels (x-axis) and 3 or more vowels (y-axis) in all languages in NorthEuraLex. Hungarian and Khalkha Mongolian in red circles. Languages were coded for language family (see legend) and identified by ISO codes. For a mapping of ISO codes to languages see the NorthEuraLex website http://www.northeuralex.org/languages.

Table 3: Inventory sizes and word list lengths in the data sampled from NorthEuraLex.

Table 5: Explanation of the abbreviations used in the result tables. The condition column refers to the type of harmony tested, with vowel successions abbreviated in the way described in this table. The sequence "f_n_f" represents sequences starting with a front/−BACK vowel, followed by a neutral vowel and another front/−BACK vowel. If more than one harmonic feature is present (as in Turkish, Manchu and Khalkha Mongolian), the magnitude of the effect on feature surprisal is compared between the two features in the disharmonic condition only (compare row "f_r/dish" in Table 8).

Table 6: P-values, ∆η and effect size for Finnish feature surprisal.

Table 7: P-values, ∆η and effect size for Hungarian feature surprisal.

Table 8: P-values, ∆η and effect size for Turkish feature surprisal.

Table 9: P-values, ∆η and effect size for Manchu feature surprisal.

Table 10: P-values, ∆η and effect size for Khalkha Mongolian feature surprisal.

D Vowel Counts in Test Set