Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences

The increase in performance in NLP due to the prevalence of distributional models and deep learning has brought with it a reciprocal decrease in interpretability. This has spurred a focus on what neural networks learn about natural language, with less of a focus on how. Some work has focused on the data used to develop data-driven models, but typically this line of work aims to highlight issues with the data, e.g. highlighting and offsetting harmful biases. This work contributes to the relatively untrodden path of investigating what is required in data for models to capture meaningful representations of natural language. This entails evaluating how well English and Spanish semantic spaces capture a particular type of relational knowledge, namely the traits associated with concepts (e.g. banana-yellow), and exploring the role of co-occurrences in this context.


Introduction
Vector space models have been the main driving force behind progress in NLP. Most work in this area, either in the form of static or contextualised embeddings, has been based on co-occurrence statistics and largely driven by the distributional hypothesis (Harris, 1954; Firth, 1957). This has also resulted in these representations seemingly capturing certain relational knowledge, such as word analogies (Mikolov et al., 2013b; Gittens et al., 2017). In this context, Chiang et al. (2020) found that the ability of word embeddings to evaluate analogies was not greatly impaired by removing co-occurrences related to relational pairs. This suggests there are limits to how the distributional hypothesis impacts the encoding of relational knowledge. We extend this line of work by focusing on the relational knowledge of concepts and traits. We also creep beyond English by translating the concepts and traits used in one of our datasets into Spanish. Contributions: (1) We show that removing co-occurrences of concepts and traits has no impact on the ability of semantic spaces to predict whether a pair of embeddings corresponds to a trait-concept pair or to predict what traits a given concept has. (2) We developed a freely available dataset that can be used for further trait-based relational knowledge analyses for English and Spanish.1

Related work
What models learn Evaluation of neural semantic spaces has focused on what knowledge they capture, with a slew of work showing that some knowledge of analogies can be seen by applying simple transformations (Mikolov et al., 2013b; Levy and Goldberg, 2014; Arora et al., 2016; Paperno and Baroni, 2016; Gittens et al., 2017; Ethayarajh et al., 2019). Others have investigated what syntactic information neural semantic spaces seem to capture, with most showing that they do capture something deeper than surface patterns (Linzen et al., 2016; Gulordava et al., 2018; Giulianelli et al., 2018). However, they fail to exhaustively capture syntactic phenomena and specifically have been shown to struggle with polarity (Futrell et al., 2018; Jumelet and Hupkes, 2018) and certain filler-gap dependencies (Wilcox et al., 2018; Chowdhury and Zamparelli, 2018). Pretrained language models (PLMs) have been found to capture varying degrees of syntactic information (Peters et al., 2018; Tenney et al., 2019; Goldberg, 2019; Clark et al., 2019); however, they have also been shown to struggle to predict the grammaticality of sentences (Marvin and Linzen, 2018; Warstadt et al., 2019) and seem to depend on fragile heuristics rather than anything deeper (McCoy et al., 2019).
Relational knowledge More specifically with respect to relational knowledge and semantic spaces, work has shown for some time now that semantic spaces can encode certain relational knowledge, e.g. knowledge of the relative positioning of geographical locations (Louwerse and Zwaan, 2009). Similarly, Gupta et al. (2015) found that embeddings capture something of the relational knowledge associated with countries and cities, e.g. how countries relate to one another with respect to GDP. Rubinstein et al. (2015) found that word embeddings captured some taxonomic relational knowledge but fared less well with respect to trait-based relational knowledge. Analogy completion tasks are often used to investigate what sort of relational knowledge a semantic space has captured, with early work showing that simple linear transformations were enough to highlight analogies (Mikolov et al., 2013a; Vylomova et al., 2016). This method has drawn some criticism and has been challenged as a robust means of evaluating what relational knowledge models capture (Drozd et al., 2016; Gladkova et al., 2016; Schluter, 2018; Bouraoui et al., 2018). Attempts to evaluate what PLMs capture of relational knowledge have also been made, highlighting that these larger, more data-hungry models capture some but not all relational knowledge (Forbes et al., 2019; Bouraoui et al., 2020).
Patterns in data However, all the work cited above focuses on what models learn about relational knowledge and not how, or rather what the salient signals are in the data used in these techniques that manifest in relational knowledge. Some work has been done in this direction, with Pardos and Nam (2020) showing co-occurrences are not necessary in their distributional model of courses to predict similar or related courses. Chiang et al. (2020) evaluated this finding in neural semantic spaces, finding that the ability of a semantic space to complete analogies isn't impacted when removing co-occurrences. It is important to understand what aspects of the data result in what models learn because without this semblance of interpretability, problematic biases can creep in, e.g. gender biases in Word2Vec (Bolukbasi et al., 2016) or in BERT (Bhardwaj et al., 2021). Attempts have been made to mitigate certain biases in contextualised word embeddings (Kaneko and Bollegala, 2021), but in order to do so, the biases have to be known. Also, Shwartz and Choi (2020) discuss the issue of reporting bias in the data typically used in NLP, where rarer occurrences are more likely to be explicitly mentioned than common ones, which results in models that can generalise about under-reported phenomena but not temper the over-reported information. Therefore it is necessary to understand the nature of the data and how it impacts what models capture and how.
In this work, we aim to expand on the work of Chiang et al. (2020) in two main ways. First, we do not use analogies and analogy completion to evaluate the impact that co-occurrences of concept-trait pairs have on the relational knowledge developed in neural semantic spaces, but instead use a dataset of different trait-based relations (e.g. is-colour, has-component) derived from the MCRAE and NORMS feature datasets. This allows us to more directly evaluate the ability of models to predict relational knowledge by casting the evaluation as a simple classification task (both in a multi-class and binary-class setting). And second, we extend the analysis by looking at Spanish data as well to evaluate whether the results extend beyond English.

Methodology
The methodology follows five sequential steps: the development of datasets that include concepts and their traits (Section 3.1); the selection and processing of large general-domain corpora (Section 3.2); the transformation of the selected corpora based on the concept-trait datasets to test our hypothesis (Section 3.3); the training of word embeddings on the original and adapted corpora (Section 3.4); and finally the evaluation of the embeddings based on the trait-based datasets (Section 3.5).

Datasets
The datasets were based on the MCRAE features dataset (McRae et al., 2005). This is a collection of semantic features associated with a large set of concepts (541) generated from features given by human participants. A secondary trait-based dataset was also collated for English based on the NORMS dataset (Devereux et al., 2014). This was developed in the same way as MCRAE and is partially an extension of that dataset, with 638 concepts. We wanted to avoid value judgements (such as is-feminine) and to collate more trait-based relations, that is, pairs of words related by an inherent attribute of a concept.
MCRAE-EN The first step in developing the datasets used in this work was to collate certain features into subsets of similar traits. This was done in a partially manual way by splitting the data into 5 subsets. Each feature in MCRAE has the number of participants who specified that feature for that concept, so initially a frequency cut of 10 was applied to the features. From this set, we observed a number of similar traits that broadly fit into trait categories. A series of simple heuristics were then applied to extract all potential concept-feature pairs for each subset. For some trait types this was trivial with the MCRAE dataset, e.g. colour relations could be found using the MCRAE feature classification visual-colour. The full details of the heuristics can be seen in Appendix A. This process resulted in 5 trait-based subsets: colours, components, materials, size & shape, and tactile. From each subset, we removed duplicates (e.g. ambulance has the features is-white, is-red, and is-orange in the colour subset).2 And from the remaining concept-feature pairs, we applied a cut of 10+ concepts per trait to ensure a suitable number of instances per target in our evaluation. The resulting statistics associated with this dataset can be seen in the top section of Table 1.
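The filtering pipeline described above can be sketched as follows. This is a minimal reconstruction rather than the exact scripts used; in particular, the duplicate-handling policy (keeping a concept's first trait) is an assumption, since the text does not specify how duplicates were resolved.

```python
from collections import Counter

def build_subset(triples, min_feature_freq=10, min_concepts_per_trait=10):
    """Sketch of the MCRAE-EN subset filtering: a production-frequency cut,
    duplicate-concept removal, then a cut on concepts per trait.

    `triples` is a list of (concept, trait, n_participants) tuples.
    """
    # 1. Frequency cut: keep features produced by >= min_feature_freq participants.
    kept = [(c, t) for c, t, n in triples if n >= min_feature_freq]
    # 2. Remove duplicates, i.e. concepts with several traits in the same subset
    #    (e.g. ambulance with is-white, is-red, is-orange in the colour subset).
    #    The keep-first policy here is an assumption.
    seen, deduped = set(), []
    for c, t in kept:
        if c not in seen:
            seen.add(c)
            deduped.append((c, t))
    # 3. Keep only traits attested for at least min_concepts_per_trait concepts.
    trait_counts = Counter(t for _, t in deduped)
    return [(c, t) for c, t in deduped if trait_counts[t] >= min_concepts_per_trait]
```

With the paper's thresholds (10 participants, 10+ concepts per trait) this yields the subsets summarised in Table 1.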

MCRAE-ES
The set of concepts and trait words occurring across all 5 subsets was manually translated. The translators were one native English speaker with some knowledge of Spanish and one native Spanish speaker who is fluent in English.
As might be expected, issues arose when undertaking the translation that required judgements to be made. When there was a one-to-many translation, we used the Iberian translation if the multiple translations were due to regional variants. Otherwise we chose the most common or most canonical option. However, we also chose single-word alternatives to avoid multiword concepts when this wouldn't have resulted in using an obscure word. We also made some choices to avoid having duplicate/competing concepts, i.e. boat was translated as barca and ship as barco. Further, we tried to match the intended use in English, i.e. we translated sledgehammer to almádena rather than the more generic Spanish term mazo, as the heavy metal version is the more standard sense in English. Otherwise we tried to use more generic options. A variety of resources were used to aid this, including bilingual dictionaries, Wikipedia, and the RAE (Real Academia Española). Despite our best efforts to maintain as many concept-trait pairs as possible, certain concepts just don't work in Spanish, typically many-to-one translations, e.g. dove translates to paloma, which also covers normal mangy pigeons. A more common issue was the tendency to use multi-word expressions in Spanish for certain concepts, such as goldfish (pez dorado) and escalator (escalera mecánica), with no single-word alternatives. The statistics for the trait subsets of MCRAE-ES are shown in the bottom section of Table 1.

NORMS-EN
To make our experiments more robust, we also used the NORMS dataset. In order to use this dataset, we manually classified its features based on the subsets from our MCRAE trait dataset. First, we cut the features in NORMS that occurred less than 10 times, then took the set of remaining features, classified each as one of the five subsets, and automatically cast each concept-trait pair into its respective subset. We manually checked whether any unused features had been erroneously omitted due to annotation issues and folded those features into the relevant subsets. This entailed adding is-liquid and is-furry to the tactile subset after some consideration (with is-furry subsequently being removed due to the minimum frequency cut after removing duplicates). The resulting subsets had duplicate concepts removed and then a minimum frequency cut of 10 on the remaining features. The statistics of the resulting subsets can be seen in the middle section of Table 1, with the number of new unique concepts added to each subset shown in parenthesis in the concept count (N_C) column.

Corpora
For the statistics of the corpora used, see Table 2.
UMBC The University of Maryland, Baltimore County (UMBC) webbase corpus is a collection of paragraphs resulting from a 2007 webcrawl over millions of webpages (Han et al., 2013).
ES1B The Spanish Billion Words Corpus (ES1B) is a collection of unannotated sentences taken from the web which span different sources, from Europarl to books. It also includes data from a 2015 Wikipedia dump, so it has some crossover with the Spanish Wikipedia corpus (Cardellino, 2019).
Wiki We used the English Wikipedia dump from 1st October 2021 and the Spanish Wikipedia dump from 1st January 2022. They were extracted and cleaned using the WikiExtractor tool from Attardi (2015). This left document ID HTML tags in the data, which we removed with a simple heuristic.
Wee-Wiki Similar to the standard pre-processing of the Wikipedia data, but we also cut articles with very few views, as these tend to be stub articles and automatically generated articles. The idea behind this is to cultivate a cleaner and more natural version of the data. We used Wikipedia's official viewing statistics for 1st December 2021.3 Articles with less than 10 views were removed.
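The view-count filter itself is simple; below is a minimal sketch, assuming the page-view statistics have already been parsed into a title-to-count mapping (the loading of the Wikipedia statistics dump is omitted):

```python
def wee_wiki_filter(articles, view_counts, min_views=10):
    """Drop articles viewed fewer than min_views times on the sampled day;
    stub and automatically generated pages rarely clear this bar.

    `articles` maps titles to article text; `view_counts` maps titles to
    page views. Titles absent from the statistics are treated as 0 views.
    """
    return {title: text for title, text in articles.items()
            if view_counts.get(title, 0) >= min_views}
```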

Removing co-occurrences
We used 3 methods, with different levels of granularity, to find and remove co-occurrences. The first step in the process was to segment the corpora by sentence and to lemmatise the tokens. This was done using the spaCy library and the corresponding pre-trained models for English and Spanish (Montani et al., 2022). We used lemmas to handle the gender of adjectives and nouns in Spanish and plural forms in both languages. The segmented version of each corpus was then split into two separate corpora, with 80% of the sentences in the first, which were used as the standard corpora in our experiments, and 20% in the second, which were used as reserves for replacing sentences with co-occurrences when creating input data without co-occurrences. When an instance was removed based on the criteria specified below, a random sentence was selected from the reserves, so as to balance the total number of sentences in each set.4 The resulting number of instances removed is shown in Table 3 (English) and in Table 4 (Spanish).
Sentence The simplest method was to merely remove any sentence where a concept and its corresponding trait were observed. The lemmatised version of the data was used to search for co-occurrences to be more thorough, especially with respect to the Spanish data. This entails using the lemmatised versions of the concepts and traits to match them in the lemmatised instances in the data. This was done independently for each trait type.
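The sentence-level removal with reserve replacement can be sketched as below; this assumes the corpus has already been lemmatised (the spaCy step is omitted) and split into the 80% main set and 20% reserve set:

```python
import random

def remove_sentence_cooccurrences(sentences, reserves, pairs, seed=0):
    """Drop any lemmatised sentence containing both a concept and its trait,
    drawing a random replacement from the reserves so the total sentence
    count is unchanged. `pairs` holds lemmatised (concept, trait) tuples."""
    rng = random.Random(seed)
    reserves = list(reserves)  # copy so replacements can be popped
    out = []
    for lemmas in sentences:
        tokens = set(lemmas)
        if any(c in tokens and t in tokens for c, t in pairs):
            # Co-occurrence found: swap in a random reserve sentence.
            out.append(reserves.pop(rng.randrange(len(reserves))))
        else:
            out.append(lemmas)
    return out
```

Popping from the reserves ensures a reserve sentence is used at most once, keeping the with- and without-co-occurrence corpora the same size.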
Window The second method removed instances when the concept and its relative trait occurred within a given window, again using lemmatised forms. The window size used was 10, to match the size used during the training of the embeddings.
Syntactic Finally, we used the Stanza library and the corresponding pre-trained models available for English and Spanish to parse the instances where a concept and its relative trait occurred (Qi et al., 2020). If an edge between the concept and the trait was predicted after finding a co-occurrence using the lemmas, the instance was removed; otherwise it was left. This method tests whether co-occurrences which are syntactically related are more impactful than haphazard co-occurrences. An example is shown in Figure 1.
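For the window-based variant, the co-occurrence test reduces to a positional check over the lemmatised tokens; a sketch, using the same window of 10 as the embedding training:

```python
def cooccur_within_window(lemmas, concept, trait, window=10):
    """Return True if the concept and trait lemmas appear within `window`
    tokens of each other anywhere in the sentence."""
    concept_pos = [i for i, w in enumerate(lemmas) if w == concept]
    trait_pos = [i for i, w in enumerate(lemmas) if w == trait]
    return any(abs(i - j) <= window
               for i in concept_pos for j in trait_pos)
```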

Word embeddings
The models used to evaluate the impact of co-occurrences were trained using the Gensim library (Řehůřek and Sojka, 2010). We used CBOW Word2Vec embedding models (Mikolov et al., 2013a) as they are quicker to train than skip-gram models, which was paramount considering the number of models that were required. Further, Chiang et al. (2020) found no significant differences between CBOW and skip-gram models with respect to the differences observed in analogy completion between models trained with and without co-occurrences. We used the default hyperparameters in Gensim except for the embedding size, which was set to 300, and the window size, which was set to 10, i.e. the same settings as Chiang et al. (2020).
For each trait type and for each corpus, a model was trained on the data containing co-occurrences (with or w/ in tables) and on the data not containing co-occurrences (without or w/o in tables). We trained multiple models for the data including co-occurrences (once per trait type), giving us a robust measurement of those models' performance. This means that the results for each with model for each trait type across the extraction methods come from models trained on the same data and are reported to show the variation seen when training models on the same data.5

Classifiers
Trait-based relational knowledge was evaluated by casting it as a classification problem.
Multi-class First we used a multi-class evaluation.
Using the datasets described in Section 3.1, given a concept (e.g. banana), the task consisted of selecting the most appropriate trait for a given trait type (e.g. yellow in the colour dataset). We used a support vector machine (SVM) from the Scikit-learn library (Pedregosa et al., 2011) as our classifier, with the word embeddings learned in the previous step as the only input. For each model we used 3-fold cross-validation and report the mean score across the splits.6 For each pair of models (i.e. with and without co-occurrences for a given trait type and for a given corpus), we checked whether each concept appeared in both semantic spaces. When a concept was missing in one or both, it was removed from the dataset for both, such that the comparison of results is robust between the two models we are interested in comparing; however, this was not common. It did bring up an issue with orange and naranja, namely that each occurs both as a concept and as a trait, so that in our extraction methods for sentence and window co-occurrences these are always removed from the corpora and so were removed from the evaluation datasets.
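The data preparation behind the two classifier setups can be sketched as follows: aligning concepts across a paired w/ and w/o model, and, for the binary setup described below, building e_c − e_t difference features with matched negative pairs. This is an illustrative reconstruction, not the exact code used; the WordNet/MCR noun check for negative samples is omitted here.

```python
import random

def align_concepts(dataset, vocab_with, vocab_without):
    """Keep only concept-trait pairs whose concept is present in BOTH paired
    semantic spaces, so w/ vs w/o results compare over identical items."""
    return [(c, t) for c, t in dataset
            if c in vocab_with and c in vocab_without]

def difference_features(pairs, emb):
    """e_c - e_t input features for the binary SVM, one vector per pair."""
    return [[ci - ti for ci, ti in zip(emb[c], emb[t])] for c, t in pairs]

def negative_pairs(pos_pairs, vocab, traits, seed=0):
    """Draw random non-concept words and ascribe each a trait from the
    trait space, matching the number of positive pairs."""
    rng = random.Random(seed)
    concepts = {c for c, _ in pos_pairs}
    pool = sorted(w for w in vocab if w not in concepts)
    return [(rng.choice(pool), rng.choice(traits)) for _ in pos_pairs]
```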
Binary We also use binary classification by exploiting earlier findings suggesting that differences between embeddings can be used as a proxy to capture semantic relations (Mikolov et al., 2013b; Vylomova et al., 2016). Again, we used SVM models, but this time the input features were the differences between concepts and their respective traits (i.e. e_c − e_t, where e_c is the concept embedding and e_t is the trait embedding) and the model predicted whether a pair was related or not. This required developing negative samples. This was done by randomly selecting words from the union of the vocabularies of each pair of models (i.e. with and without co-occurrences for a given trait type and a given corpus). These words then underwent a modicum of a control check using lexical databases: WordNet (Fellbaum, 2000) for English and the Multilingual Central Repository version 3.0 for Spanish (Gonzalez-Agirre et al., 2012), via the Natural Language Toolkit (Bird et al., 2009). Once a word was randomly selected from the vocab space (excluding the concepts in the given dataset), the respective lexical database was checked to see if it contained the word and, if so, whether the synonyms associated with it were at least sometimes nouns (that is, the synonym set of nouns contained at least one item). This was so that the selected word could in theory be something akin to a concept and not just gobbledygook. This procedure was done so the number of concepts in the negative sample set matched the number in the positive sample set (which had instances removed that didn't appear in one or both of the paired models, similar to the multi-class setup). Then each randomly extracted negative concept was ascribed a trait from the given trait space. Similar to the multi-class SVM setup, 3-fold cross-validation was used and the mean score across the splits is reported.7

Results

Multi-class results
The results for the multi-class experiments can be seen in Table 5 for the English corpora and in Table 6 for the Spanish corpora. The highest performing model of each pair, i.e. with (w/) and without (w/o) co-occurrences, is highlighted in bold for clarity. Across the board, it is clear that there is no consistent pattern as to whether a model trained with co-occurrences outperforms a model trained without them or vice versa. This holds for all three co-occurrence extraction techniques, for all trait types, for all datasets, and for all corpora across both languages. This is similar to the findings of Chiang et al. (2020), where little effect was observed on analogy completion whether co-occurrences were included or not; however, a systematic, albeit small, decrease was observed in that context. While there are some differences between some models, the differences that would be required to make claims of one model being superior to another are much larger than those observed here, as the experimental setup isn't robust enough to verify whether a difference of 0.01-0.02 is significant or not. A visualisation of the differences between each corresponding with and without model for MCRAE-EN by trait type can be seen in Figure 2.

Binary results
The results from the binary classification experiments substantiate these findings.
They can be seen in Table 7 for English and in Table 8 for Spanish. Again, no pattern emerges across the different experimental dimensions that would suggest the removal of co-occurrences has impacted a model's ability to predict whether a pair is related or not. The overall high performance on the binary classification experiment for both English and Spanish suggests these models manage to encode meaningful information about these trait relations. But how this emerges is not clear. The simplest explanation is that suitably accurate representations are learnt due to the amount of data, but it could be for any number of other reasons not investigated here.

Discussion
The results highlight some tentatively interesting patterns with respect to trait types. In both English and Spanish, models perform consistently well on component traits, although for NORMS this turned out to cover only 2 traits, effectively casting it as binary classification. Materials is the next most consistently high-performing trait type across corpora and languages, with size & shape and tactile not far behind for English, but with a bigger gap in Spanish. The performance on colour traits is low across all settings and languages. This doesn't appear to be based on the size of the trait subset, e.g. the component subset is one of the smaller sets, yet has high performance, and the performance of the other trait types doesn't vary with respect to the number of instances and unique features.
The number of removed sentences, as shown in Tables 3 and 4, gives a vague indication of the co-occurrences of the concepts and their traits in the data, with colour sentence removals being the second highest for MCRAE-EN across all three English corpora, the third highest for NORMS-EN, and the highest for MCRAE-ES across all Spanish corpora. These rankings are consistent across extraction methods. Therefore, it is unlikely that the embeddings for the colours and the corresponding concepts (often concepts that occur in the other datasets) are somehow low quality due to low occurrences of these words. More likely, the colour relation is more difficult than the other trait types, as the other types are more tangible and more specific. Although this doesn't necessarily hold for size & shape traits, specifically sizes, which tend to be relative, e.g. in MCRAE a plane can be large (which it is, relative to most things) but so too can a bathtub (which it is, relative to a mouse or other such timorous beasties, but not relative to a house). However, size & shape is consistently one of the trait types models perform worst on, especially for NORMS-EN and MCRAE-ES.
As a final note, the different extraction methods yield no differences when compared to one another. This can be observed clearly in Figure 3 in the main text and Figure 6 in Appendix B. While the number of extracted instances using syntactically related co-occurrences is very low, making it difficult to draw any major conclusions, the numbers of sentence-based and window-based instances removed are quite high and similar in magnitude. From this, we can deduce that the proximity of the words also doesn't have a major impact on the ability of a semantic space to encode relational knowledge. It could still be the case that if the data used to train models contained more syntactically related concept-trait pairs, they would encode more relational knowledge, but it is clear that their absence doesn't result in the models losing what relational knowledge they can capture. Many questions remain on how these distributional models encode relational knowledge. We have merely presented results which do not support the hypothesis that direct co-occurrences are the major signal for this process as related to trait-based relational knowledge.
Language models and wider impact of findings. Whether the results observed here for static embeddings would hold for PLMs isn't a given. While they are still based on the same distributional hypothesis and adopt statistical methods to encode salient features of language, they could potentially be more sensitive to the loss of co-occurrences in the training data. But this is an open research question that requires specific experimentation, which has its own difficulties, i.e. prompting language models often includes lexical clues which cloud our ability to say with any great certainty whether they have captured some phenomenon or not (see Kassner and Schütze (2020) on the sensitivity of PLMs to mispriming).
The results do suggest that merely increasing the amount of data used likely won't result in any major improvements in the ability of models to encode relational knowledge, or commonsense knowledge more generally, which is attested to by recent work in Li et al. (2021). We may need to look to more complex methods to augment NLP systems with commonsense knowledge, potentially using multimodal systems, e.g. language models trained with visual cues, as was done in Paik et al. (2021) to offset reporting bias with respect to colours. Alternatively, we can focus on the linguistic input and consider how to add stronger signals to the data used to train NLP systems.

Conclusion
We have contributed to the emerging interest in how neural semantic models encode linguistic information, focusing on trait-based relational knowledge. We have extended findings which showed that co-occurrences of relational pairs didn't have a major impact on a model's ability to encode knowledge of analogies by complementing this analysis with an evaluation of trait-based relational knowledge. We extended the analysis to include different extraction methods to evaluate whether a more fine-grained approach would highlight any differences in performance and found that this is not the case. The work presented here also expands beyond English and includes results in Spanish which follow the same trend. Finally, we have cultivated a set of datasets for different trait types in both English and Spanish (based on MCRAE and NORMS) which are available at https://github.com/cardiffnlp/trait-concept-datasets.

A MCRAE-EN trait subset extraction heuristics
Here we describe the full heuristics used to develop the trait-based subsets from MCRAE used in our experiments. Some traits were trivial to extract. Colour relations were the simplest, as they could be found using the MCRAE feature classification visual-colour. Component relations were shortlisted by cutting on the WB feature classification (this is simply a classification of trait types, where W and B refer to the practitioners who classified the concept-feature pairs in unpublished work) in MCRAE using external_component and internal_component and then by extracting features beginning with has_. Similarly, for material relations, the WB classification made_of was used. Some manual corrections were applied to the components to extend the number of instances in the dataset and to make certain traits fit our experimental setup better. This involved casting features such as has-4-legs and has-4-wheels as simply has-legs and has-wheels, respectively. The feature made-of-material was cut from the material subset, the feature has-an-inside from the components subset, and the features is-colourful and different-colours were removed from the colour subset.
We then looked at the WB label external_surface_property (excluding features that fit into the colour, component, or material subsets), as this fit our desired trait-based feature space. The majority of concepts in this subset tended to have features relating to their shape or their size, so we opted to use this pair (size & shape) as another subset. This required manually removing features that didn't fit this trait type, e.g. is-smelly, is-shiny, and so on. In this process, a final possible subset of tactile-based traits became apparent, which was cut using the BR feature classification (this is simply a classification of trait types from different practitioners than WB) tactile and then manually removing certain features which were more value judgements than traits, such as is-comfortable or is-warm.

B Visualisations of NORMS-EN and MCRAE-ES results
Cerró la puerta del granero (English: She/he closed the barn door)

Figure 1: granero (highlighted in red) is a concept in MCRAE-ES with a component trait of puerta (highlighted in blue). In the example here they are linked by an nmod edge (highlighted in blue). For the syntactic removal method this sentence would be removed.

Figure 3: Distributions of delta accuracy (∆Acc) for corresponding pairs for each extraction method in MCRAE-EN.

Figure 4: Distributions of delta accuracy (∆Acc) for corresponding pairs for each trait type in NORMS-EN.

Figure 5: Distributions of delta accuracy (∆Acc) for corresponding pairs for each trait type in MCRAE-ES.

Figure 6: Distributions of delta accuracy (∆Acc) for corresponding pairs for each extraction method in NORMS-EN.

Figure 7: Distributions of delta accuracy (∆Acc) for corresponding pairs for each extraction method in MCRAE-ES.

Table 1: Dataset statistics: N_C is the number of concepts and N_T is the number of unique features; for NORMS, N_C includes the count of new unique concepts in parenthesis, and the number in parenthesis for each trait is the number of concepts with that trait.

Table 2: Basic statistics of the corpora used.

Table 3: Total instances removed and replaced for the English corpora (UMBC, Wiki, Wee-Wiki) for each dataset (MCRAE and NORMS) by trait type and removal method (sentence, window, and syntactic, as described in §3.3).

Table 4: Total instances removed and replaced for each Spanish corpus (ES1B, Wiki, Wee-Wiki) for the MCRAE dataset, broken down by trait type and removal method (sentence, window, and syntactic, as described in §3.3).

Table 5: Multi-class SVM results for the English corpora and datasets by trait type and extraction method, for models trained on data with (w/) and without (w/o) co-occurrences. Average accuracy across 3-fold cross-validation is reported, with the best performing model of each paired w/ and w/o highlighted in bold.

Table 6: Multi-class SVM results for the Spanish corpora and datasets by trait type and extraction method, for models trained on data with (w/) and without (w/o) co-occurrences. Average accuracy across 3-fold cross-validation is reported, with the best performing model of each paired w/ and w/o highlighted in bold.

Table 7: Binary SVM results for the English corpora and datasets by trait type and extraction method, for models trained on data with (w/) and without (w/o) co-occurrences. Average accuracy across 3-fold cross-validation is reported, with the best performing model of each paired w/ and w/o highlighted in bold.

Table 8: Binary SVM results for the Spanish corpora and datasets by trait type and extraction method, for models trained on data with (w/) and without (w/o) co-occurrences. Average accuracy across 3-fold cross-validation is reported, with the best performing model of each paired w/ and w/o highlighted in bold.