MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier


Abstract
We present MultiVec, a new toolkit for computing continuous representations of text at different granularity levels (words or sequences of words). MultiVec includes the features of word2vec, paragraph vector (batch and online), and bivec for bilingual distributed representations. MultiVec also includes different distance measures between words and between sequences of words. The toolkit is written in C++ and aims to be fast (within the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and cross-lingual document classification.
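The abstract does not say which distance measures MultiVec implements. As a rough illustration only, the following C++ sketch shows two common choices: cosine similarity between word vectors, and similarity between word sequences computed as the cosine of their averaged word vectors (a simple baseline, not necessarily what MultiVec or paragraph vector does). It also sketches the vector-offset method commonly used for the analogical reasoning task. All names here (`Embeddings`, `cosine_similarity`, `average_vector`, `sequence_similarity`, `analogy`) are hypothetical and are not MultiVec's API; the vectors are assumed to come from an already trained model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types, not MultiVec's API: a word -> dense vector table
// assumed to be filled from an already trained model.
using Vec = std::vector<float>;
using Embeddings = std::unordered_map<std::string, Vec>;

// Cosine similarity between two vectors of equal dimension.
// Returns 0 if either vector has zero norm.
float cosine_similarity(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += static_cast<double>(a[i]) * b[i];
        na += static_cast<double>(a[i]) * a[i];
        nb += static_cast<double>(b[i]) * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0f;
    return static_cast<float>(dot / (std::sqrt(na) * std::sqrt(nb)));
}

// Represents a word sequence by the mean of its in-vocabulary word vectors.
// Unknown words are skipped; a zero vector of size `dim` is returned if no
// word is known.
Vec average_vector(const std::vector<std::string>& words,
                   const Embeddings& emb, std::size_t dim) {
    Vec sum(dim, 0.0f);
    std::size_t count = 0;
    for (const auto& w : words) {
        auto it = emb.find(w);
        if (it == emb.end()) continue;
        assert(it->second.size() == dim);
        for (std::size_t i = 0; i < dim; ++i) sum[i] += it->second[i];
        ++count;
    }
    if (count > 0)
        for (auto& x : sum) x /= static_cast<float>(count);
    return sum;
}

// Sequence similarity as the cosine of the two averaged vectors
// (an assumed baseline measure, chosen for illustration).
float sequence_similarity(const std::vector<std::string>& s1,
                          const std::vector<std::string>& s2,
                          const Embeddings& emb, std::size_t dim) {
    return cosine_similarity(average_vector(s1, emb, dim),
                             average_vector(s2, emb, dim));
}

// Vector-offset analogy ("a is to b as c is to ?"): returns the vocabulary
// word most cosine-similar to b - a + c, excluding a, b and c themselves.
// Throws std::out_of_range if a, b or c is missing; returns "" if no
// candidate word remains.
std::string analogy(const std::string& a, const std::string& b,
                    const std::string& c, const Embeddings& emb) {
    const Vec& va = emb.at(a);
    const Vec& vb = emb.at(b);
    const Vec& vc = emb.at(c);
    assert(va.size() == vb.size() && vb.size() == vc.size());
    Vec target(va.size());
    for (std::size_t i = 0; i < va.size(); ++i)
        target[i] = vb[i] - va[i] + vc[i];
    std::string best;
    float best_sim = -2.0f;  // below the minimum possible cosine (-1)
    for (const auto& kv : emb) {
        if (kv.first == a || kv.first == b || kv.first == c) continue;
        float sim = cosine_similarity(target, kv.second);
        if (sim > best_sim) {
            best_sim = sim;
            best = kv.first;
        }
    }
    return best;
}
```

For example, with toy 2-dimensional vectors king = (1, 1), man = (1, 0), woman = (0, 1) and queen = (0, 2), `analogy("man", "king", "woman", emb)` computes king - man + woman = (0, 2) and returns "queen". Averaging word vectors ignores word order, which is one reason learned sequence representations such as paragraph vector are used instead.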
Anthology ID:
L16-1662
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4188–4192
URL:
https://aclanthology.org/L16-1662
Cite (ACL):
Alexandre Bérard, Christophe Servan, Olivier Pietquin, and Laurent Besacier. 2016. MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4188–4192, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP (Bérard et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1662.pdf
Code:
eske/multivec