Word associations and the distance properties of context-aware word embeddings

Maria A. Rodriguez, Paola Merlo


Abstract
What do people know when they know the meaning of words? Word associations have been widely used to tap into lexical representations and their structure, as a way of probing semantic knowledge in humans. We investigate whether current word embedding spaces (contextualized and uncontextualized) can be considered good models of human lexical knowledge by studying whether they have comparable characteristics to human association spaces. We study the three properties of association rank, asymmetry of similarity and triangle inequality. We find that word embeddings are good models of some word associations properties. They replicate well human associations between words, and, like humans, their context-aware variants show violations of the triangle inequality. While they do show asymmetry of similarities, their asymmetries do not map those of human association norms.
Anthology ID:
2020.conll-1.30
Volume:
Proceedings of the 24th Conference on Computational Natural Language Learning
Month:
November
Year:
2020
Address:
Online
Editors:
Raquel Fernández, Tal Linzen
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
376–385
URL:
https://aclanthology.org/2020.conll-1.30
DOI:
10.18653/v1/2020.conll-1.30
Cite (ACL):
Maria A. Rodriguez and Paola Merlo. 2020. Word associations and the distance properties of context-aware word embeddings. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 376–385, Online. Association for Computational Linguistics.
Cite (Informal):
Word associations and the distance properties of context-aware word embeddings (A. Rodriguez & Merlo, CoNLL 2020)
PDF:
https://aclanthology.org/2020.conll-1.30.pdf
Optional supplementary material:
 2020.conll-1.30.OptionalSupplementaryMaterial.zip
Data
BookCorpus