The Influence of Down-Sampling Strategies on SVD Word Embedding Stability

Johannes Hellrich, Bernd Kampe, Udo Hahn


Abstract
The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. Here we compare word embedding algorithms on three corpora of different sizes and evaluate both their stability and their accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVD-PPMI-type embeddings. This finding seems to explain diverging reports on their stability and leads us to a simple modification which provides superior stability as well as accuracy on par with skip-gram embeddings.
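
The link between down-sampling and stability can be illustrated with a toy sketch. This is not the authors' implementation, and several parts are assumptions. Probabilistic down-sampling is modeled word2vec-style, dropping each token w with probability 1 - sqrt(t / f(w)). It is contrasted with a hypothetical deterministic variant that keeps every token and weights each co-occurrence by both tokens' keep probabilities. The SVD step is omitted, so sparse PPMI rows are compared directly. Stability is measured as the Jaccard overlap of a word's top-n cosine neighbors across two runs with different random seeds. The function names (`count_pairs`, `ppmi`, `neighbors`, `stability`), the window size, and the threshold `t` are all illustrative choices.

```python
import math
import random
from collections import Counter


def count_pairs(tokens, window=2, mode="weighted", t=1e-2, seed=0):
    """Count (word, context) co-occurrences in a symmetric window.

    Assumed down-sampling schemes (illustrative, not the paper's code):
      "sampled":  drop each token w with probability 1 - sqrt(t / f(w)),
                  word2vec-style, then count surviving pairs (random).
      "weighted": keep every token but weight each pair by the keep
                  probabilities of both tokens (no randomness involved).
    t is set high here only because the toy corpus is tiny.
    """
    if mode not in ("sampled", "weighted"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    freq = Counter(tokens)
    total = len(tokens)
    keep = {w: min(1.0, math.sqrt(t / (c / total))) for w, c in freq.items()}
    if mode == "sampled":
        tokens = [w for w in tokens if rng.random() < keep[w]]
    pairs = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            c = tokens[j]
            pairs[(w, c)] += 1.0 if mode == "sampled" else keep[w] * keep[c]
    return pairs


def ppmi(pairs):
    """Turn pair counts into sparse PPMI vectors {word: {context: value}}."""
    total = sum(pairs.values())
    w_sum, c_sum = Counter(), Counter()
    for (w, c), n in pairs.items():
        w_sum[w] += n
        c_sum[c] += n
    vecs = {}
    for (w, c), n in pairs.items():
        pmi = math.log(n * total / (w_sum[w] * c_sum[c]))
        if pmi > 0:  # PPMI: keep only positive associations
            vecs.setdefault(w, {})[c] = pmi
    return vecs


def cosine(u, v):
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def neighbors(vecs, word, n=3):
    """Top-n cosine neighbors of word; ties broken alphabetically."""
    if word not in vecs:
        return set()
    others = sorted((o for o in vecs if o != word),
                    key=lambda o: (-cosine(vecs[word], vecs[o]), o))
    return set(others[:n])


def stability(run_a, run_b, word, n=3):
    """Jaccard overlap of word's top-n neighbor sets in two runs."""
    a, b = neighbors(run_a, word, n), neighbors(run_b, word, n)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    corpus = ("the cat sat on the mat the dog sat on the rug "
              "a cat and a dog played on the mat").split() * 20
    for mode in ("weighted", "sampled"):
        run1 = ppmi(count_pairs(corpus, mode=mode, seed=1))
        run2 = ppmi(count_pairs(corpus, mode=mode, seed=2))
        print(mode, stability(run1, run2, "cat"))
```

Because the weighted variant makes no random draws, two runs yield identical PPMI vectors and a stability of exactly 1.0. The sampled variant can produce different neighbor sets for different seeds. The sketch only illustrates the mechanism the abstract points to and does not reproduce the paper's experiments.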

Anthology ID: W19-2003
Volume: Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Month: June
Year: 2019
Address: Minneapolis, USA
Editors: Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Yoav Goldberg
Venue: RepEval
Publisher: Association for Computational Linguistics
Pages: 18–26
URL: https://aclanthology.org/W19-2003/
DOI: 10.18653/v1/W19-2003
Cite (ACL): Johannes Hellrich, Bernd Kampe, and Udo Hahn. 2019. The Influence of Down-Sampling Strategies on SVD Word Embedding Stability. In Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pages 18–26, Minneapolis, USA. Association for Computational Linguistics.
Cite (Informal): The Influence of Down-Sampling Strategies on SVD Word Embedding Stability (Hellrich et al., RepEval 2019)
PDF: https://aclanthology.org/W19-2003.pdf