Fast and Accurate Decision Trees for Natural Language Processing Tasks

Tiberiu Boros, Stefan Daniel Dumitrescu, Sonia Pipa


Abstract
Decision trees have previously been employed in many machine-learning tasks such as part-of-speech tagging, lemmatization, morphological-attribute resolution, letter-to-sound conversion and statistical-parametric speech synthesis. In this paper we introduce an optimized tree-computation algorithm, which is based on the original ID3 algorithm. We also introduce a tree-pruning method that uses a development set to delete nodes from over-fitted models. The pruning method also uses result caching for speed-up. Our algorithm is almost 200 times faster than a naive implementation and yields accurate results on our test datasets.
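For readers unfamiliar with ID3, the following is a minimal sketch of its core step, choosing the attribute with the highest information gain at a node. It illustrates the textbook baseline only, not the optimized algorithm, pruning method, or caching described in the paper. The function names, the toy feature names (`suffix`, `capitalized`) and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the examples on `attr`."""
    # Group the labels by the value each example takes for `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Textbook ID3 split choice: the attribute with maximal information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: four tokens described by two categorical features.
rows = [
    {"suffix": "ing", "capitalized": "no"},
    {"suffix": "ing", "capitalized": "yes"},
    {"suffix": "ed", "capitalized": "no"},
    {"suffix": "ed", "capitalized": "yes"},
]
labels = ["VBG", "VBG", "VBD", "VBD"]

print(best_attribute(rows, labels, ["suffix", "capitalized"]))
```

A naive implementation recomputes these counts from scratch for every candidate attribute at every node, which is the kind of cost an optimized tree-computation algorithm would aim to reduce.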
Anthology ID:
R17-1016
Volume:
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
103–110
URL:
https://doi.org/10.26615/978-954-452-049-6_016
DOI:
10.26615/978-954-452-049-6_016
Cite (ACL):
Tiberiu Boros, Stefan Daniel Dumitrescu, and Sonia Pipa. 2017. Fast and Accurate Decision Trees for Natural Language Processing Tasks. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 103–110, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Fast and Accurate Decision Trees for Natural Language Processing Tasks (Boros et al., RANLP 2017)
PDF:
https://doi.org/10.26615/978-954-452-049-6_016