Yves Peirsman


2013

2011

2010

2009

2008

Semantic similarity is a key issue in many computational tasks. This paper goes into the development and evaluation of two common ways of automatically calculating the semantic similarity between two words. On the one hand, such methods may depend on a manually constructed thesaurus like (Euro)WordNet. Their performance is often evaluated on the basis of a very restricted set of human similarity ratings. On the other hand, corpus-based methods rely on the distribution of two words in a corpus to determine their similarity. Their performance is generally quantified through a comparison with the judgements of the first type of approach. This paper introduces a new Gold Standard of more than 5,000 human intra-category similarity judgements. We show that corpus-based methods often outperform (Euro)WordNet on this data set, and that the use of the latter as a Gold Standard for the former, is thus often far from ideal.
Vector-based models of lexical semantics retrieve semantically related words automatically from large corpora by exploiting the property that words with a similar meaning tend to occur in similar contexts. Despite their increasing popularity, it is unclear which kind of semantic similarity they actually capture and for which kind of words. In this paper, we use three vector-based models to retrieve semantically related words for a set of Dutch nouns and we analyse whether three linguistic properties of the nouns influence the results. In particular, we compare results from a dependency-based model with those from a 1st and 2nd order bag-of-words model and we examine the effect of the nouns’ frequency, semantic speficity and semantic class. We find that all three models find more synonyms for high-frequency nouns and those belonging to abstract semantic classses. Semantic specificty does not have a clear influence.

2006

To this day, the automatic recognition of metonymies has generally been addressed with supervised approaches. However, these require the annotation of a large number of training instances and hence, hinder the development of a wide-scale metonymy recognition system. This paper investigates if this knowledge acquisition bottleneck in metonymy recognition can be resolved by the application of unsupervised learning. Although the investigated technique, Schütze’s (1998) algorithm, enjoys considerable popularity in Word Sense Disambiguation, I will show that it is not yet robust enough to tackle the specific case of metonymy recognition. In particular, I will study the influence on its performance of four variables—the type of data set, the size of the context window, the application of SVD and the type of feature selection.