Augmenting a German Morphological Database by Data-Intense Methods

Petra Steiner


Abstract
This paper deals with the automatic enhancement of a new German morphological database. While some databases exist for flat word segmentation, this is the first available resource that can be used directly for deep parsing of German words. We combine the entries of this morphological database with the morphological tools SMOR and Moremorph and with a context-based evaluation method that builds on a large Wikipedia corpus. We describe the state of the art and the essential characteristics of the database and the context method. The approach is tested on a Lufthansa in-flight magazine. We derive over 5,000 new instances of complex words. The coverage of lemma types reaches over 99 percent. The precision of newly found complex splits and monomorphemes is between 0.93 and 0.99.
Anthology ID:
W19-4221
Volume:
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Garrett Nicolai, Ryan Cotterell
Venue:
ACL
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
178–188
URL:
https://aclanthology.org/W19-4221
DOI:
10.18653/v1/W19-4221
Cite (ACL):
Petra Steiner. 2019. Augmenting a German Morphological Database by Data-Intense Methods. In Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 178–188, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Augmenting a German Morphological Database by Data-Intense Methods (Steiner, ACL 2019)
PDF:
https://aclanthology.org/W19-4221.pdf
Data
CELEX