Spanish Word Vectors from Wikipedia

Mathias Etcheverry, Dina Wonsever


Abstract
Content analysis of text data requires semantic representations that are difficult to obtain automatically, as they may require large handcrafted knowledge bases or manually annotated examples. Unsupervised, autonomous methods for generating semantic representations are of great interest given the huge volumes of text available for use in all kinds of applications. In this work we describe the generation and validation of semantic representations in the vector space paradigm for Spanish. The method used is GloVe (Pennington, 2014), one of the best-performing reported methods, and the vectors were trained on Spanish Wikipedia. The learned vectors are evaluated on word analogy and similarity tasks (Pennington, 2014; Baroni, 2014; Mikolov, 2013a). The vector set and a Spanish version of some widely used semantic relatedness tests are made publicly available.
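As a rough illustration of the two evaluation tasks named in the abstract, the sketch below scores word similarity with cosine similarity and answers an analogy question by vector arithmetic (the additive scheme commonly used with such benchmarks). The tiny three-dimensional vectors and the helper names `cosine` and `solve_analogy` are invented for this example; they are not the released vectors or the authors' evaluation code.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by maximizing cos(d, b - a + c).

    The three question words are excluded from the candidates, as is
    usual in analogy benchmarks.
    """
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy vectors, made up for illustration only (assumption: real vectors
# would be loaded from a trained model and have hundreds of dimensions).
toy_vectors = {
    "hombre": np.array([1.0, 0.0, 0.0]),
    "mujer": np.array([1.0, 1.0, 0.0]),
    "rey": np.array([1.0, 0.0, 1.0]),
    "reina": np.array([1.0, 1.0, 1.0]),
    "casa": np.array([0.0, 0.0, 0.2]),
}

# Analogy task: hombre is to mujer as rey is to ?
answer = solve_analogy(toy_vectors, "hombre", "mujer", "rey")

# Similarity task: a score that would be compared against human ratings.
score = cosine(toy_vectors["rey"], toy_vectors["reina"])
```

In a real evaluation, analogy accuracy is the fraction of questions answered correctly, and similarity scores are usually correlated with human judgments over a list of word pairs.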
Anthology ID:
L16-1584
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3681–3685
URL:
https://aclanthology.org/L16-1584/
Cite (ACL):
Mathias Etcheverry and Dina Wonsever. 2016. Spanish Word Vectors from Wikipedia. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3681–3685, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Spanish Word Vectors from Wikipedia (Etcheverry & Wonsever, LREC 2016)
PDF:
https://aclanthology.org/L16-1584.pdf