RPD: A Distance Function Between Word Embeddings
Authors: Xuhui Zhou, Shujian Huang, Zaixiang Zheng
Date: 2020-07
Resource type: text
Published in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Editors: Shruti Rijhwani, Jiangming Liu, Yizhong Wang, Rotem Dror
Publisher: Association for Computational Linguistics
Place: Online
Genre: conference publication
Abstract: It is well understood that different algorithms, training processes, and corpora produce different word embeddings. However, less is known about the relation between different embedding spaces, i.e., how far different sets of embeddings deviate from each other. In this paper, we propose a novel metric called Relative Pairwise Inner Product Distance (RPD) to quantify the distance between different sets of word embeddings. This unitary-invariant metric has a unified scale for comparing different sets of word embeddings. Based on the properties of RPD, we systematically study the relations between word embeddings produced by different algorithms and investigate the influence of different training processes and corpora. The results shed light on the poorly understood properties of word embeddings and justify RPD as a measure of the distance between embedding spaces.
Anthology ID: zhou-etal-2020-rpd
DOI: 10.18653/v1/2020.acl-srw.7
URL: https://aclanthology.org/2020.acl-srw.7
Pages: 42-50