Going Beyond T-SNE: Exposing whatlies in Text Embeddings

Vincent Warmerdam, Thomas Kober, Rachael Tatman


Abstract
We introduce whatlies, an open-source toolkit for visually inspecting word and sentence embeddings. The project offers a unified and extensible API with current support for a range of popular embedding backends including spaCy, tfhub, huggingface transformers, gensim, fastText and BytePair embeddings. The package combines a domain-specific language for vector arithmetic with visualisation tools that make exploring word embeddings more intuitive and concise. It offers support for many popular dimensionality reduction techniques as well as many interactive visualisations that can either be statically exported or shared via Jupyter notebooks. The project documentation is available from https://rasahq.github.io/whatlies/.
Anthology ID:
2020.nlposs-1.8
Volume:
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)
Month:
November
Year:
2020
Address:
Online
Editors:
Eunjeong L. Park, Masato Hagiwara, Dmitrijs Milajevs, Nelson F. Liu, Geeticka Chauhan, Liling Tan
Venue:
NLPOSS
Publisher:
Association for Computational Linguistics
Pages:
52–60
URL:
https://aclanthology.org/2020.nlposs-1.8
DOI:
10.18653/v1/2020.nlposs-1.8
Cite (ACL):
Vincent Warmerdam, Thomas Kober, and Rachael Tatman. 2020. Going Beyond T-SNE: Exposing whatlies in Text Embeddings. In Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS), pages 52–60, Online. Association for Computational Linguistics.
Cite (Informal):
Going Beyond T-SNE: Exposing whatlies in Text Embeddings (Warmerdam et al., NLPOSS 2020)
PDF:
https://aclanthology.org/2020.nlposs-1.8.pdf
Video:
https://slideslive.com/38939745