Learning from Within? Comparing PoS Tagging Approaches for Historical Text

Sarah Schulz, Jonas Kuhn


Abstract
In this paper, we investigate unsupervised and semi-supervised methods for part-of-speech (PoS) tagging in the context of historical German text. We locate our research in the context of Digital Humanities, where the non-canonical nature of text poses problems for a Natural Language Processing world in which tools are mainly trained on standard data. Data deviating from the norm requires tools adjusted to this data. We explore to what extent the availability of such training material and of related resources influences the accuracy of PoS tagging. We investigate a variety of algorithms, including neural nets, conditional random fields and self-learning techniques, in order to find the approach best suited to tackling data sparsity. Although methods using resources from related languages outperform weakly supervised methods using just a few training examples, we can still reach a promising accuracy with methods that abstain from additional resources.
Anthology ID:
L16-1684
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4316–4322
URL:
https://aclanthology.org/L16-1684
Cite (ACL):
Sarah Schulz and Jonas Kuhn. 2016. Learning from Within? Comparing PoS Tagging Approaches for Historical Text. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4316–4322, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Learning from Within? Comparing PoS Tagging Approaches for Historical Text (Schulz & Kuhn, LREC 2016)
PDF:
https://aclanthology.org/L16-1684.pdf