Isabel Fuhrmann


2020

All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German
Yana Strakatova | Neele Falk | Isabel Fuhrmann | Erhard Hinrichs | Daniela Rossmann
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present the GerCo dataset of adjective-noun collocations for German, such as alter Freund ‘old friend’ and tiefe Liebe ‘deep love’. The annotation was performed by experts based on the annotation scheme introduced in this paper. The resulting dataset contains 4,732 positive and negative instances of collocations and covers all 16 semantic classes of adjectives as defined in the German wordnet GermaNet. The dataset can serve as a reliable empirical basis for comparing different theoretical frameworks concerned with collocations, or as material for data-driven approaches to the study of collocations, including machine learning experiments. This paper addresses the latter use by evaluating different models on the GerCo dataset for the task of automatic collocation identification. We compare lexical association measures with static and contextualized word embeddings. The experiments show that word embeddings outperform methods based on statistical association measures by a wide margin.
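
For readers unfamiliar with lexical association measures, the following is a minimal sketch of one widely used measure, pointwise mutual information (PMI), applied to adjective-noun pairs. The abstract does not say which measures the paper evaluates, so the choice of PMI is an assumption made here for illustration. The function name `pmi_scores` and the toy co-occurrence counts are likewise hypothetical and are not taken from the GerCo dataset.

```python
import math
from collections import Counter


def pmi_scores(pair_counts):
    """Compute PMI for each (adjective, noun) pair from raw co-occurrence counts.

    PMI(a, n) = log2( P(a, n) / (P(a) * P(n)) ), with all probabilities
    estimated from the pair counts themselves (an illustrative simplification).
    """
    total = sum(pair_counts.values())
    adj_counts = Counter()
    noun_counts = Counter()
    for (adj, noun), count in pair_counts.items():
        adj_counts[adj] += count
        noun_counts[noun] += count

    scores = {}
    for (adj, noun), count in pair_counts.items():
        p_pair = count / total
        p_adj = adj_counts[adj] / total
        p_noun = noun_counts[noun] / total
        scores[(adj, noun)] = math.log2(p_pair / (p_adj * p_noun))
    return scores


# Invented toy counts, NOT real corpus frequencies.
toy_counts = Counter({
    ("tief", "Liebe"): 8,
    ("tief", "Haus"): 1,
    ("alt", "Haus"): 6,
    ("alt", "Liebe"): 1,
})

scores = pmi_scores(toy_counts)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

With these toy counts, pairs that co-occur more often than their individual frequencies would predict receive a positive score, while rarely co-occurring pairs receive a negative one. A count-based score of this kind is the sort of baseline that embedding-based models are compared against in the evaluation described above.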