Souleiman Hasan


2017

Word Re-Embedding via Manifold Dimensionality Retention
Souleiman Hasan | Edward Curry
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. Word embeddings may underestimate the similarity between nearby words and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning that retains dimensionality. We show that this approach is theoretically founded in the metric recovery paradigm, and show empirically that it can improve on state-of-the-art embeddings in word similarity tasks by 0.5 to 5.0 percentage points, depending on the original space.