A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds

Andrea Horbach, Andrea Hensler, Sabine Krome, Jakob Prange, Werner Scholze-Stubenrecht, Diana Steffen, Stefan Thater, Christian Wellner, Manfred Pinkal


Abstract
We present an annotation study on a representative dataset of literal and idiomatic uses of German infinitive-verb compounds in newspaper and journal texts. Infinitive-verb compounds pose a challenge for writers of German because the spelling regulations differ between literal and idiomatic uses. With the participation of expert lexicographers, we obtained a high-quality corpus resource that can serve as a testbed for automatic idiomaticity detection and coarse-grained word-sense disambiguation. A classifier trained on the corpus distinguished literal from idiomatic uses with an accuracy of 85%.
Anthology ID:
L16-1135
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
836–841
URL:
https://aclanthology.org/L16-1135
Cite (ACL):
Andrea Horbach, Andrea Hensler, Sabine Krome, Jakob Prange, Werner Scholze-Stubenrecht, Diana Steffen, Stefan Thater, Christian Wellner, and Manfred Pinkal. 2016. A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 836–841, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds (Horbach et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1135.pdf