Transparent, Efficient, and Robust Word Embedding Access with WOMBAT

Mark-Christoph Müller, Michael Strube


Abstract
We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.
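To make the evaluation setting in the abstract concrete, here is a minimal sketch of scoring a sentence pair while fetching only the vectors a sentence needs from a database, rather than loading a whole embedding collection into memory. This is not WOMBAT's API. The SQLite storage, the table layout, the whitespace tokenization, the averaging of word vectors, and all function names (`build_toy_store`, `lookup`, `sentence_vector`, `cosine`) are assumptions made purely for illustration.

```python
import math
import sqlite3
import struct

DIM = 3  # toy dimensionality, chosen only for this example


def build_toy_store(vectors):
    """Create an in-memory SQLite table mapping word -> packed float32 vector.

    Hypothetical layout for illustration; a real store would live on disk.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vectors (word TEXT PRIMARY KEY, vec BLOB)")
    rows = [(word, struct.pack(f"{DIM}f", *vec)) for word, vec in vectors.items()]
    conn.executemany("INSERT INTO vectors VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, words):
    """Fetch vectors for the requested words only; unknown words are skipped."""
    unique = sorted(set(words))
    if not unique:
        return {}
    marks = ",".join("?" * len(unique))  # one placeholder per word
    cur = conn.execute(
        f"SELECT word, vec FROM vectors WHERE word IN ({marks})", unique
    )
    return {word: struct.unpack(f"{DIM}f", blob) for word, blob in cur}


def sentence_vector(conn, sentence):
    """Average the vectors of known words (lowercased, whitespace-split)."""
    tokens = sentence.lower().split()
    found = lookup(conn, tokens)
    hits = [found[t] for t in tokens if t in found]
    if not hits:
        return None  # no token of the sentence is in the store
    return [sum(column) / len(hits) for column in zip(*hits)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data, invented for the example.
store = build_toy_store(
    {"cat": (1.0, 0.0, 0.0), "dog": (0.9, 0.1, 0.0), "car": (0.0, 0.0, 1.0)}
)
score = cosine(sentence_vector(store, "The cat"), sentence_vector(store, "a dog"))
```

In this sketch, each sentence triggers a single query for its own tokens, so memory use depends on the sentences being scored and not on the size of the embedding collection.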
Anthology ID:
C18-2012
Volume:
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico
Editor:
Dongyan Zhao
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
53–57
URL:
https://aclanthology.org/C18-2012/
Cite (ACL):
Mark-Christoph Müller and Michael Strube. 2018. Transparent, Efficient, and Robust Word Embedding Access with WOMBAT. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pages 53–57, Santa Fe, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Transparent, Efficient, and Robust Word Embedding Access with WOMBAT (Müller & Strube, COLING 2018)
PDF:
https://aclanthology.org/C18-2012.pdf
Code:
nlpAThits/WOMBAT