Tasker Hull
2020
comp-syn: Perceptually Grounded Word Embeddings with Color
Bhargav Srinivasa Desikan | Tasker Hull | Ethan Nadler | Douglas Guilbeault | Aabir Abubakar Kar | Mark Chu | Donald Ruggiero Lo Sardo
Proceedings of the 28th International Conference on Computational Linguistics
Popular approaches to natural language processing create word embeddings based on textual co-occurrence patterns, but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show that (1) comp-syn predicts human judgments of word concreteness with greater accuracy and in a more interpretable fashion than word2vec using low-dimensional word–color embeddings, and (2) comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open-source on PyPI and is compatible with mainstream machine-learning Python packages. Our package release includes word–color embeddings for over 40,000 English words, each associated with crowd-sourced word concreteness judgments.
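As a rough illustration of what a word–color embedding based on color distributions could look like, the sketch below bins the pixels of a set of images in a color space and averages the resulting histograms into one vector per word. It is not the comp-syn API and makes several assumptions that are not in the abstract: CIELAB stands in for the perceptually uniform color space, the images are synthetic NumPy arrays rather than search results, the function names (`srgb_to_lab`, `color_embedding`, `js_divergence`) are hypothetical, and the Jensen-Shannon divergence is just one plausible way to compare two such distributions.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape (..., 3)) to CIELAB with a D65 white point."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB transfer curve to obtain linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> CIE XYZ (standard sRGB/D65 matrix).
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = (linear @ m.T) / np.array([0.95047, 1.0, 1.08883])
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)
    lightness = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)

def color_embedding(images, bins=4):
    """Average per-image Lab histograms into one color distribution (bins**3 values)."""
    # Assumed histogram bounds; values are clipped so no pixel falls outside them.
    lows = np.array([0.0, -128.0, -128.0])
    highs = np.array([100.0, 128.0, 128.0])
    hists = []
    for img in images:
        lab = np.clip(srgb_to_lab(img).reshape(-1, 3), lows, highs)
        hist, _ = np.histogramdd(lab, bins=bins, range=list(zip(lows, highs)))
        hists.append(hist.ravel() / hist.sum())
    return np.mean(hists, axis=0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0  # terms with x == 0 contribute nothing
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Synthetic stand-ins for the images retrieved for two words.
rng = np.random.default_rng(0)
reddish = [np.clip(rng.normal([0.8, 0.2, 0.2], 0.1, size=(16, 16, 3)), 0, 1)
           for _ in range(5)]
bluish = [np.clip(rng.normal([0.2, 0.2, 0.8], 0.1, size=(16, 16, 3)), 0, 1)
          for _ in range(5)]

emb_red = color_embedding(reddish)
emb_blue = color_embedding(bluish)
print(js_divergence(emb_red, emb_blue))
```

Each embedding here is a probability vector over 64 color bins, so it can be fed directly to standard machine-learning tools or compared with any distance defined on distributions. The bin count and histogram bounds are arbitrary choices for the sketch.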