Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy

Mohamed Outahajala, Paolo Rosso


Abstract
Like most languages that have only recently begun to be studied for Natural Language Processing (NLP) tasks, Amazigh suffers from a scarcity of annotated corpora and of linguistic tools and resources. The main aim of this paper is to present a new part-of-speech (POS) tagger based on a new Amazigh tag set (AMTS) composed of 28 tags. To this end, we trained Conditional Random Fields (CRFs) to build a POS tagger for the Amazigh language, and we used 10-fold cross-validation to evaluate and validate our approach. The average accuracy of the CRFs over the 10 folds is 87.95%, and the best fold reaches 91.18%. To improve this result, we gathered a lexicon of about 8k words with their POS tags. Using this lexicon together with the CRFs confidence measure yields a more accurate POS tagger, with an improved performance of 93.82%.
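The abstract does not spell out how the lexicon and the CRFs confidence measure are combined. One plausible reading, sketched below purely as an illustration, is to keep the CRF tag when the model is confident and to fall back to the lexicon entry otherwise. The function name `combine_with_lexicon`, the `threshold` value, the placeholder tokens, and the tag names are all assumptions made for this sketch; they are not taken from the paper and do not reflect the AMTS tag set.

```python
from typing import Dict, List, Tuple


def combine_with_lexicon(
    tokens: List[str],
    crf_output: List[Tuple[str, float]],
    lexicon: Dict[str, str],
    threshold: float = 0.8,  # hypothetical cut-off, not a value from the paper
) -> List[str]:
    """Keep the CRF tag when it is confident; otherwise prefer the lexicon tag.

    crf_output holds one (tag, confidence) pair per token, where confidence
    is assumed to be the marginal probability the CRF assigns to its tag.
    """
    tags = []
    for token, (crf_tag, confidence) in zip(tokens, crf_output):
        if confidence < threshold and token in lexicon:
            # Low-confidence prediction for a known word: trust the lexicon.
            tags.append(lexicon[token])
        else:
            # Confident prediction, or a word the lexicon does not cover.
            tags.append(crf_tag)
    return tags


# Placeholder tokens and tags, for illustration only.
tokens = ["w1", "w2", "w3"]
crf_output = [("NOUN", 0.95), ("VERB", 0.40), ("ADJ", 0.30)]
lexicon = {"w2": "NOUN"}
result = combine_with_lexicon(tokens, crf_output, lexicon)
```

Here only `w2` is both low-confidence and present in the lexicon, so only its tag is overridden; `w3` keeps its CRF tag because the lexicon has no entry for it.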
Anthology ID:
L16-1683
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4311–4315
URL:
https://aclanthology.org/L16-1683
Cite (ACL):
Mohamed Outahajala and Paolo Rosso. 2016. Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4311–4315, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Using a Small Lexicon with CRFs Confidence Measure to Improve POS Tagging Accuracy (Outahajala & Rosso, LREC 2016)
PDF:
https://aclanthology.org/L16-1683.pdf