Petra Steiner


2022

Converting a Database of Complex German Word Formation for Linked Data
Petra Steiner
Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference

This work combines two lexical resources with morphological information on German word formation, CELEX for German and the latest release of GermaNet, for extracting and building complex word structures. This yields a database of over 100,000 German word trees. A definition for sequential morphological analyses leads to an OntoLex-Lemon type model. By using GermaNet sense information, the data can be linked to other semantic resources. An alignment to the CIDOC Conceptual Reference Model (CIDOC-CRM) is also provided. The scripts for the data generation are publicly available on GitHub.

2019

Augmenting a German Morphological Database by Data-Intense Methods
Petra Steiner
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper deals with the automatic enhancement of a new German morphological database. While there are some databases for flat word segmentation, this is the first available resource that can be used directly for deep parsing of German words. We combine the entries of this morphological database with the morphological tools SMOR and Moremorph and a context-based evaluation method which builds on a large Wikipedia corpus. We describe the state of the art and the essential characteristics of the database and the context method. The approach is tested on an inflight magazine of Lufthansa. We derive over 5,000 new instances of complex words. The coverage for the lemma types peaks at over 99 percent. The precision of newly found complex splits and monomorphemes is between 0.93 and 0.99.

Combining Data-Intense and Compute-Intense Methods for Fine-Grained Morphological Analyses
Petra Steiner
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology

2018

Building a Morphological Treebank for German from a Linguistic Database
Petra Steiner | Josef Ruppenhofer
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Merging the Trees - Building a Morphological Treebank for German from Two Resources
Petra Steiner
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

Evaluating the morphological compositionality of polarity
Josef Ruppenhofer | Petra Steiner | Michael Wiegand
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Unknown words are a challenge for any NLP task, including sentiment analysis. Here, we evaluate the extent to which the sentiment polarity of complex words can be predicted from their morphological make-up. We do this on German, as it has very productive processes of derivation and compounding, and many German hapax words, which are likely to bear sentiment, are morphologically complex. We present results of supervised classification experiments on new datasets with morphological parses and polarity annotations.

2016

Refurbishing a Morphological Database for German
Petra Steiner
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The CELEX database is one of the standard lexical resources for German. It yields a wealth of data, especially for phonological and morphological applications. The morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the 1990s, both encoding and spelling are outdated. About one fifth of over 50,000 datasets contain umlauts and characters such as ß. Changes to a modern version cannot be obtained by simple substitution. In this paper, we briefly describe the original content and form of the orthographic and morphological database for German in CELEX. Then we present our work on modernizing the linguistic data. Lemmas and morphological analyses are transferred to a modern standard of encoding by first merging orthographic and morphological information of the lemmas and their entries and then performing a second substitution for the morphs within their morphological analyses. Changes to modern German spelling are performed by substitution rules according to orthographical standards. We show an example of the use of the data for the disambiguation of morphological structures. The discussion describes prospects of future work on this or similar lexicons. The Perl script is publicly available on our website.

2015

Ordering adverbs by their scaling effect on adjective intensity
Josef Ruppenhofer | Jasper Brandes | Petra Steiner | Michael Wiegand
Proceedings of the International Conference Recent Advances in Natural Language Processing