Boosting statistical tagger accuracy with simple rule-based grammars

Mans Hulden, Jerid Francom


Abstract
We report on several experiments in combining a rule-based tagger and a trigram tagger for Spanish. The results show that one can boost the accuracy of the best-performing n-gram taggers by quickly developing a rough rule-based grammar to complement the statistically induced one and then combining the output of the two. The specific method of combination is crucial for achieving good results. The method provides particularly large gains in accuracy when only a small amount of tagged data is available for training an HMM, as may be the case for lesser-resourced and minority languages.
Anthology ID:
L12-1640
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2114–2117
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1075_Paper.pdf
Cite (ACL):
Mans Hulden and Jerid Francom. 2012. Boosting statistical tagger accuracy with simple rule-based grammars. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2114–2117, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Boosting statistical tagger accuracy with simple rule-based grammars (Hulden & Francom, LREC 2012)