Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings

Thomas Alexander Trost, Dietrich Klakow


Abstract
Word embeddings are high-dimensional vector representations of words and are thus difficult to interpret. To address this, we introduce an unsupervised, parameter-free method for creating a hierarchical graphical clustering of the full ensemble of word vectors and show that this structure is a geometrically meaningful representation of the original relations between the words. The resulting representation can be used to better understand, and thus improve, the embedding algorithm. It also exhibits semantic meaning, so it can be used in a variety of language processing tasks such as categorization or measuring similarity.
Anthology ID:
W17-2404
Volume:
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing
Month:
August
Year:
2017
Address:
Vancouver, Canada
Editors:
Martin Riedl, Swapna Somasundaran, Goran Glavaš, Eduard Hovy
Venue:
TextGraphs
Publisher:
Association for Computational Linguistics
Pages:
30–38
URL:
https://aclanthology.org/W17-2404/
DOI:
10.18653/v1/W17-2404
Cite (ACL):
Thomas Alexander Trost and Dietrich Klakow. 2017. Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings. In Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing, pages 30–38, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings (Trost & Klakow, TextGraphs 2017)
PDF:
https://aclanthology.org/W17-2404.pdf