%0 Conference Proceedings
%T Information-Theory Interpretation of the Skip-Gram Negative-Sampling Objective Function
%A Melamud, Oren
%A Goldberger, Jacob
%Y Barzilay, Regina
%Y Kan, Min-Yen
%S Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
%D 2017
%8 July
%I Association for Computational Linguistics
%C Vancouver, Canada
%F melamud-goldberger-2017-information
%X In this paper we define a measure of dependency between two random variables, based on the Jensen-Shannon (JS) divergence between their joint distribution and the product of their marginal distributions. Then, we show that word2vec’s skip-gram with negative sampling embedding algorithm finds the optimal low-dimensional approximation of this JS dependency measure between the words and their contexts. The gap between the optimal score and the low-dimensional approximation is demonstrated on a standard text corpus.
%R 10.18653/v1/P17-2026
%U https://aclanthology.org/P17-2026
%U https://doi.org/10.18653/v1/P17-2026
%P 167-171