Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art

Steffen Eger, Rüdiger Gleim, Alexander Mehler


Abstract
This paper addresses the challenge of morphological tagging and lemmatization in morphologically rich languages, using German and Latin as examples. We focus on the question of what a practitioner can expect when using state-of-the-art solutions out of the box. Moreover, we contrast these with old(er) methods and implementations for POS tagging. We examine to what degree recent efforts in tagger development are reflected in improved accuracies, and at what cost in terms of training and processing time. We also conduct in-domain vs. out-domain evaluation. Out-domain evaluations are particularly insightful because the distribution of the data a user tags will typically differ from the distribution on which the tagger was trained. Furthermore, two lemmatization techniques are evaluated. Finally, we compare pipeline tagging vs. a tagging approach that acknowledges dependencies between inflectional categories.
Anthology ID:
L16-1239
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1507–1513
URL:
https://aclanthology.org/L16-1239
Cite (ACL):
Steffen Eger, Rüdiger Gleim, and Alexander Mehler. 2016. Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1507–1513, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art (Eger et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1239.pdf