Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package

Ajay Patel, Alexander Sands, Chris Callison-Burch, Marianna Apidianaki


Abstract
Vector space embedding models like word2vec, GloVe, and fastText are extremely popular representations in natural language processing (NLP) applications. We present Magnitude, a fast, lightweight tool for utilizing and processing embeddings. Magnitude is an open-source Python package with a compact vector storage file format that allows for efficient manipulation of huge numbers of embeddings. Magnitude performs common operations 60 to 6,000 times faster than Gensim. Magnitude also introduces several novel features for improved robustness, such as out-of-vocabulary lookups.
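The abstract mentions out-of-vocabulary (OOV) lookups. One general way to give an unseen word a usable vector is to split it into character n-grams, map each n-gram to a deterministic pseudo-random vector, and combine them, so that words with similar spellings land near each other. The sketch below illustrates that general idea in plain Python under stated assumptions. It is not Magnitude's implementation and not the `pymagnitude` API: the function names (`char_ngrams`, `ngram_vector`, `oov_vector`, `cosine`), the n-gram range of 3 to 5, the MD5-seeded generator, and the default dimension of 300 are all hypothetical choices made for illustration.

```python
import hashlib
import math
import random


def char_ngrams(word, n_min=3, n_max=5):
    """Hypothetical helper: character n-grams of `word`, padded with < and >."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(gram, dim):
    """Hypothetical helper: pseudo-random vector seeded by a stable MD5 hash."""
    seed = int.from_bytes(hashlib.md5(gram.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def oov_vector(word, dim=300):
    """Hypothetical OOV vector: sum of the word's n-gram vectors, L2-normalized."""
    total = [0.0] * dim
    for gram in char_ngrams(word.lower()):
        for j, x in enumerate(ngram_vector(gram, dim)):
            total[j] += x
    norm = math.sqrt(sum(x * x for x in total)) or 1.0  # avoid dividing by zero
    return [x / norm for x in total]


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (reduces to a dot product)."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    # Words that share many n-grams should score higher than unrelated words.
    print(cosine(oov_vector("uberx"), oov_vector("uberxl")))
    print(cosine(oov_vector("uberx"), oov_vector("kitten")))
```

The sketch seeds each n-gram with `hashlib.md5` rather than Python's built-in `hash()`, because string hashing in CPython is randomized per process (see `PYTHONHASHSEED`), which would give the same unseen word a different vector on every run.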
Anthology ID:
D18-2021
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
November
Year:
2018
Address:
Brussels, Belgium
Editors:
Eduardo Blanco, Wei Lu
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
120–126
URL:
https://aclanthology.org/D18-2021/
DOI:
10.18653/v1/D18-2021
Cite (ACL):
Ajay Patel, Alexander Sands, Chris Callison-Burch, and Marianna Apidianaki. 2018. Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 120–126, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package (Patel et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-2021.pdf
Code:
plasticityai/magnitude