Do LSTMs really work so well for PoS tagging? – A replication study

Tobias Horsmann, Torsten Zesch


Abstract
A recent study by Plank et al. (2016) found that LSTM-based PoS taggers considerably improve over the current state-of-the-art when evaluated on the corpora of the Universal Dependencies project that use a coarse-grained tagset. We replicate this study using a fresh collection of 27 corpora of 21 languages that are annotated with fine-grained tagsets of varying size. Our replication confirms the result in general, and we additionally find that the advantage of LSTMs is even bigger for larger tagsets. However, we also find that for the very large tagsets of morphologically rich languages, hand-crafted morphological lexicons are still necessary to reach state-of-the-art performance.
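Below is a minimal sketch, in PyTorch, of the kind of word-level bidirectional LSTM PoS tagger the abstract refers to. It is an illustrative assumption, not the architecture of Plank et al. (2016) or of this paper (both also use sub-word/character information); all names and dimensions here are hypothetical.

# Minimal BiLSTM PoS tagger sketch (illustrative only; not the paper's model).
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, tagset_size, emb_dim=64, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # One score per tag; tagset_size grows with fine-grained tagsets.
        self.out = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids):      # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)    # (batch, seq_len, emb_dim)
        ctx, _ = self.lstm(emb)        # (batch, seq_len, 2 * hidden_dim)
        return self.out(ctx)           # (batch, seq_len, tagset_size)

# Toy usage: one 5-token sentence over a hypothetical 10,000-word vocabulary.
model = BiLSTMTagger(vocab_size=10_000, tagset_size=300)
logits = model(torch.randint(0, 10_000, (1, 5)))
pred_tags = logits.argmax(dim=-1)      # predicted tag index per token

The per-token softmax over the output layer is where tagset size matters: a fine-grained morphological tagset enlarges only this final projection, which is consistent with the abstract's observation that tagset size changes the relative behaviour of such taggers.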
Anthology ID:
D17-1076
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
727–736
URL:
https://aclanthology.org/D17-1076
DOI:
10.18653/v1/D17-1076
Cite (ACL):
Tobias Horsmann and Torsten Zesch. 2017. Do LSTMs really work so well for PoS tagging? – A replication study. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 727–736, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Do LSTMs really work so well for PoS tagging? – A replication study (Horsmann & Zesch, EMNLP 2017)
PDF:
https://aclanthology.org/D17-1076.pdf