EstNLTK - NLP Toolkit for Estonian

Siim Orasmaa; Timo Petmanson; Alexander Tkachenko; Sven Laur; Heiki-Jaan Kaalep

Abstract

Although there are many tools for natural language processing tasks in Estonian, these tools are only loosely interoperable, and it is not easy to build practical applications on top of them. In this paper, we introduce a new Python library for natural language processing in Estonian, which provides a unified programming interface for various NLP components. The EstNLTK toolkit provides utilities for basic NLP tasks, including tokenization, morphological analysis, lemmatisation and named entity recognition, and also offers more advanced features such as clause segmentation, temporal expression extraction and normalization, verb chain detection, Estonian Wordnet integration and rule-based information extraction. Accompanied by detailed API documentation and comprehensive tutorials, EstNLTK is suitable for a wide audience. We believe EstNLTK is mature enough to be used for developing NLP-backed systems both in industry and research. EstNLTK is freely available under the GNU GPL version 2+ license, which is standard for academic software.

Anthology ID: L16-1390
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 2460–2466
URL: https://aclanthology.org/L16-1390/
Cite (ACL): Siim Orasmaa, Timo Petmanson, Alexander Tkachenko, Sven Laur, and Heiki-Jaan Kaalep. 2016. EstNLTK - NLP Toolkit for Estonian. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2460–2466, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): EstNLTK - NLP Toolkit for Estonian (Orasmaa et al., LREC 2016)
PDF: https://aclanthology.org/L16-1390.pdf
