A Spanish POS Tagger with Variable Memory

José Triviño, Rafael Morales-Bueno


Abstract
This paper describes an implementation of a Spanish POS tagger. The implementation combines three basic approaches: a single-word tagger based on decision trees, a POS tagger based on variable memory Markov models, and a tag set represented as feature structures. Using decision trees for single-word tagging allows the tagger to work without a lexicon that merely lists the possible tags for each word. Moreover, it decreases the error rate because there are no unknown words. Representing the tag set as feature structures is advantageous when the available training corpus is small and the tag set is large, which can be the case with morphologically rich languages like Spanish. Finally, training variable memory Markov models is more efficient than training traditional full-order Markov models, and the resulting models achieve better accuracy. In this implementation, 98.58% of tokens are correctly classified.
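The idea of a variable-length context can be illustrated with a minimal sketch. This is not the paper's training procedure: the function names, the toy tag sequences, and the `min_count` frequency threshold are all assumptions made for illustration, and a real variable memory Markov model would typically decide which contexts to keep using a statistical criterion rather than a raw count. The sketch only shows the core behaviour: use the longest recent tag history that has enough evidence, and fall back to a shorter one otherwise.

```python
from collections import Counter, defaultdict


def train_contexts(tag_sequences, max_order=3):
    """Count next-tag frequencies for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for i, tag in enumerate(tags):
            for k in range(max_order + 1):
                if i - k < 0:
                    break  # not enough history for a context this long
                context = tuple(tags[i - k:i])
                counts[context][tag] += 1
    return counts


def predict_next(counts, history, max_order=3, min_count=2):
    """Return (most frequent next tag, context used).

    Tries the longest suffix of `history` first and backs off to shorter
    suffixes until one was observed at least `min_count` times
    (hypothetical threshold, chosen only for this example).
    """
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        seen = counts.get(context)
        if seen and sum(seen.values()) >= min_count:
            return seen.most_common(1)[0][0], context
    return None, ()


# Toy training data (invented for illustration, not from the paper).
sequences = [
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["DET", "NOUN", "ADJ"],
]
counts = train_contexts(sequences)

# ("ADJ", "NOUN") was seen only once, so the model backs off to ("NOUN",).
print(predict_next(counts, ["ADJ", "NOUN"]))  # -> ('VERB', ('NOUN',))
```

With this toy data, the two-tag history is too rare to trust, so the prediction comes from the one-tag context instead. A fixed full-order model would have to estimate every context of the maximum length, including the rare ones.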
Anthology ID:
2000.iwpt-1.25
Volume:
Proceedings of the Sixth International Workshop on Parsing Technologies
Month:
February 23-25
Year:
2000
Address:
Trento, Italy
Editors:
Alberto Lavelli, John Carroll, Robert C. Berwick, Harry C. Bunt, Bob Carpenter, Ken Church, Mark Johnson, Aravind Joshi, Ronald Kaplan, Martin Kay, Bernard Lang, Alon Lavie, Anton Nijholt, Christer Samuelsson, Mark Steedman, Oliviero Stock, Hozumi Tanaka, Masaru Tomita, Hans Uszkoreit, K. Vijay-Shanker, David Weir, Mats Wiren
Venue:
IWPT
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
254–265
URL:
https://aclanthology.org/2000.iwpt-1.25
Cite (ACL):
José Triviño and Rafael Morales-Bueno. 2000. A Spanish POS Tagger with Variable Memory. In Proceedings of the Sixth International Workshop on Parsing Technologies, pages 254–265, Trento, Italy. Association for Computational Linguistics.
Cite (Informal):
A Spanish POS Tagger with Variable Memory (Triviño & Morales-Bueno, IWPT 2000)
PDF:
https://aclanthology.org/2000.iwpt-1.25.pdf