Word Embeddings for Multi-label Document Classification

Ladislav Lenc, Pavel Král


Abstract
In this paper, we analyze and evaluate word embeddings for the representation of longer texts in a multi-label classification scenario. The embeddings are used in three convolutional neural network topologies. The experiments are conducted on two standard corpora: the Czech ČTK corpus and the English Reuters-21578 corpus. We compare the results of static and trainable word2vec embeddings with those of randomly initialized word vectors. We conclude that the initialization does not play an important role in classification. However, learning the word vectors during network training is crucial for obtaining good results.
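The abstract contrasts three ways of handling the embedding layer. Below is a minimal, framework-free Python sketch of those three configurations. It is not the authors' implementation: the `Embedding` class, the `random_vectors` helper, the uniform initialization range, and the toy `pretrained` vectors standing in for real word2vec output are all hypothetical, and the CNN topologies themselves are omitted. It only illustrates the distinction the abstract draws: static vectors ignore gradients, while trainable vectors (pretrained or random) are updated during training.

```python
import random


class Embedding:
    """Hypothetical lookup table mapping word ids to vectors."""

    def __init__(self, vectors, trainable):
        # Copy each row so configurations never share the same vector objects
        self.vectors = [list(v) for v in vectors]
        self.trainable = trainable

    def lookup(self, word_ids):
        # Return the vectors for a sequence of word ids (the CNN input matrix)
        return [self.vectors[i] for i in word_ids]

    def apply_gradient(self, word_id, grad, lr=0.1):
        # Static embeddings ignore gradients; trainable ones take an SGD step
        if not self.trainable:
            return
        row = self.vectors[word_id]
        for k in range(len(row)):
            row[k] -= lr * grad[k]


def random_vectors(vocab_size, dim, seed=0):
    # Random initialization; the range [-0.25, 0.25] is an assumption
    rng = random.Random(seed)
    return [[rng.uniform(-0.25, 0.25) for _ in range(dim)] for _ in range(vocab_size)]


# Toy stand-in for pretrained word2vec vectors (3-word vocabulary, dimension 2)
pretrained = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]

configs = {
    "word2vec_static": Embedding(pretrained, trainable=False),
    "word2vec_trainable": Embedding(pretrained, trainable=True),
    "random_trainable": Embedding(random_vectors(3, 2), trainable=True),
}

# Apply the same (made-up) gradient to word 1 in every configuration
for name, emb in configs.items():
    emb.apply_gradient(word_id=1, grad=[1.0, 1.0])
    print(name, emb.lookup([1]))
```

In deep learning frameworks the same distinction is usually expressed by freezing or unfreezing the embedding layer's weights, with the initial weights taken either from pretrained vectors or from a random initializer.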
Anthology ID:
R17-1057
Volume:
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
431–437
URL:
https://doi.org/10.26615/978-954-452-049-6_057
DOI:
10.26615/978-954-452-049-6_057
Cite (ACL):
Ladislav Lenc and Pavel Král. 2017. Word Embeddings for Multi-label Document Classification. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 431–437, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Word Embeddings for Multi-label Document Classification (Lenc & Král, RANLP 2017)