Deconstructing word embedding algorithms

Kian Kenyon-Dean, Edward Newell, Jackie Chi Kit Cheung


Abstract
Word embeddings are reliable feature representations of words used to obtain high-quality results for various NLP applications. Uncontextualized word embeddings are used in many NLP tasks today, especially in resource-limited settings where high memory capacity and GPUs are not available. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings. We believe that the theoretical findings in this paper can provide a basis for more informed development of future models.
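
A rough sketch of the "common form" the abstract alludes to, in notation of our own choosing (W, C, M_ij, and f_ij are illustrative assumptions, not necessarily the paper's verbatim symbols): word embedding algorithms can be viewed as low-rank matrix factorization, fitting inner products of word and context vectors to a corpus-derived association statistic. The instantiations below lean on established results, namely that skip-gram with negative sampling (SGNS) implicitly factorizes a shifted PMI matrix (Levy and Goldberg, 2014) and that GloVe fits log co-occurrence counts up to learned bias terms (Pennington et al., 2014):

% Sketch (assumed notation): each algorithm minimizes a per-pair loss f_ij
% between the model's inner product <w_i, c_j> and a target statistic M_ij.
\[
  \min_{W,\,C} \; \sum_{i,j} f_{ij}\big( \langle \mathbf{w}_i, \mathbf{c}_j \rangle,\; M_{ij} \big),
  \qquad
  M_{ij} =
  \begin{cases}
    \mathrm{PMI}(i,j) - \log k & \text{SGNS with $k$ negative samples} \\
    \log X_{ij} & \text{GloVe, up to bias terms}
  \end{cases}
\]

Under this view, the algorithms differ mainly in which association statistic M_ij they target and how the per-pair loss f_ij weights errors.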
Anthology ID:
2020.emnlp-main.681
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8479–8484
URL:
https://aclanthology.org/2020.emnlp-main.681
DOI:
10.18653/v1/2020.emnlp-main.681
Cite (ACL):
Kian Kenyon-Dean, Edward Newell, and Jackie Chi Kit Cheung. 2020. Deconstructing word embedding algorithms. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8479–8484, Online. Association for Computational Linguistics.
Cite (Informal):
Deconstructing word embedding algorithms (Kenyon-Dean et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.681.pdf
Video:
https://slideslive.com/38938885