Extraction of German Multiword Expressions from Parsed Corpora Using Context Features

Marion Weller, Ulrich Heid


Abstract
We report about tools for the extraction of German multiword expressions (MWEs) from text corpora; we extract word pairs, but also longer MWEs of different patterns, e.g. verb-noun structures with an additional prepositional phrase or adjective. Next to standard association-based extraction, we focus on morpho-syntactic, syntactic and lexical-choice features of the MWE candidates. A broad range of such properties (e.g. number and definiteness of nouns, adjacency of the MWE’s components and their position in the sentence, preferred lexical modifiers, etc.) along with relevant example sentences, are extracted from dependency-parsed text and stored in a data base. A sample precision evaluation and an analysis of extraction errors are provided along with the discussion of our extraction architecture. We furthermore measure the contribution of the features to the precision of the extraction: by using both morpho-syntactic and syntactic features, we achieve a higher precision in the identification of idiomatic MWEs, than by using only properties of one type.
Anthology ID:
L10-1293
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/428_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Marion Weller and Ulrich Heid. 2010. Extraction of German Multiword Expressions from Parsed Corpora Using Context Features. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Extraction of German Multiword Expressions from Parsed Corpora Using Context Features (Weller & Heid, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/428_Paper.pdf