The hunvec framework for NN-CRF-based sequential tagging

Katalin Pajkossy; Attila Zséder

Abstract

In this work we present the open-source hunvec framework for sequential tagging, built upon Theano and Pylearn2. The underlying statistical model, which combines linear-chain CRFs with neural networks, was used by Collobert and co-workers as well as by several other researchers. To demonstrate the flexibility of our tool, we describe a set of experiments on part-of-speech tagging and named entity recognition tasks, using English and Hungarian datasets. In these experiments we modify both model and training parameters and illustrate the use of custom features. The model parameters we experiment with affect the vector representations of words used by the model: we apply different word vector initializations, defined by Word2vec and GloVe embeddings, and enrich the representation of words with vectors assigned to trigram features. We extend the training methods with their regularized versions (L2 and dropout). When testing our framework on a Hungarian named entity corpus, we find that its performance reaches the best published results on this dataset, with no need for language-specific feature engineering. Our code is available at http://github.com/zseder/hunvec
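In the NN-CRF model of Collobert and co-workers, a neural network assigns each token a score for every tag, and a learned tag-transition matrix links neighbouring decisions. Training maximizes the sentence-level log-likelihood, and tagging uses Viterbi decoding. The sketch below illustrates both steps in plain NumPy. It assumes the network's per-token scores are already available as an `emissions` matrix. It is not hunvec's Theano/Pylearn2 implementation, and the names `emissions`, `transitions` and `start` are illustrative choices.

```python
import numpy as np


def logsumexp(a, axis=0):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sequence_score(emissions, transitions, start, tags):
    """Unnormalized score of one tag path.

    emissions:   (T, K) per-token tag scores produced by the network
    transitions: (K, K) transitions[i, j] = score of tag i followed by tag j
    start:       (K,)   score of starting the sentence with each tag
    """
    score = start[tags[0]] + emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return score


def log_partition(emissions, transitions, start):
    # Forward algorithm: alpha[j] is the log-sum of scores of all
    # partial paths that end in tag j at the current position.
    alpha = start + emissions[0]
    for t in range(1, emissions.shape[0]):
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha, axis=0)


def neg_log_likelihood(emissions, transitions, start, tags):
    # Sentence-level training loss: log Z minus the score of the gold path.
    return (log_partition(emissions, transitions, start)
            - sequence_score(emissions, transitions, start, tags))


def viterbi(emissions, transitions, start):
    # Highest-scoring tag path, found by max-product dynamic programming.
    T, K = emissions.shape
    delta = start + emissions[0]
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + transitions      # (previous tag, current tag)
        backptr[t] = np.argmax(cand, axis=0)
        delta = np.max(cand, axis=0) + emissions[t]
    best = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

A framework built on Theano would compute the same quantities symbolically, so that gradients of `neg_log_likelihood` reach both the transition scores and the network weights. The NumPy version above only makes the recursions explicit.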

Anthology ID:: L16-1678
Volume:: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:: May
Year:: 2016
Address:: Portorož, Slovenia
Editors:: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:: LREC
Publisher:: European Language Resources Association (ELRA)
Pages:: 4278–4281
URL:: https://aclanthology.org/L16-1678/
Cite (ACL):: Katalin Pajkossy and Attila Zséder. 2016. The hunvec framework for NN-CRF-based sequential tagging. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4278–4281, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):: The hunvec framework for NN-CRF-based sequential tagging (Pajkossy & Zséder, LREC 2016)
PDF:: https://aclanthology.org/L16-1678.pdf
Code:: zseder/hunvec