WordTies: Measuring Word Associations in Language Models via Constrained Sampling

Peiran Yao, Tobias Renwick, Denilson Barbosa


Abstract
Word associations are widely used in psychology to provide insights into how humans perceive and understand concepts. Comparing word associations in language models (LMs) to those generated by human subjects can serve as a proxy to uncover lexical and commonsense knowledge embedded in language models. Much useful work has applied direct metrics, such as cosine similarity, to understand latent spaces, but these metrics are symmetric, whereas human word associativity is asymmetric. We propose WordTies, an algorithm based on constrained sampling from LMs, which allows an asymmetric measurement of associated words, given a cue word as the input. Compared to existing methods, word associations found by this method overlap more with associations provided by humans and exhibit the asymmetric property of human associations. To examine possible reasons behind associations, we analyze the knowledge and reasoning behind the word pairings as they are linked to lexical and commonsense knowledge graphs. Combining knowledge about the nature of the word pairings with the probability that the LM has learned that information gives a new way to examine what information is captured in LMs.
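The contrast between symmetric and asymmetric measures can be illustrated with a minimal sketch. The embeddings and cue-response counts below are hypothetical values invented for illustration, not data from the paper, and the count-based `association_strength` helper is only a stand-in for an asymmetric, cue-conditioned measure; it is not the constrained-sampling procedure used by WordTies.

```python
import math

def cosine(u, v):
    """Cosine similarity: the same value regardless of argument order."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (assumption, not taken from any model).
embeddings = {
    "canary": [0.9, 0.1, 0.3],
    "bird": [0.7, 0.4, 0.2],
}

# Hypothetical counts of responses given to each cue (assumption).
responses = {
    "canary": {"bird": 60, "yellow": 30, "cage": 10},
    "bird": {"fly": 50, "wing": 30, "nest": 15, "canary": 5},
}

def association_strength(cue, target):
    """Fraction of responses to `cue` that were `target` (direction matters)."""
    counts = responses[cue]
    return counts.get(target, 0) / sum(counts.values())

# Symmetric: swapping the arguments does not change the score.
sim_ab = cosine(embeddings["canary"], embeddings["bird"])
sim_ba = cosine(embeddings["bird"], embeddings["canary"])

# Asymmetric: the score depends on which word is the cue.
forward = association_strength("canary", "bird")
backward = association_strength("bird", "canary")
```

In this toy setup the cosine score is identical in both directions, while the cue-conditioned strength differs depending on which word serves as the cue, which is the property a symmetric metric cannot express.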
Anthology ID:
2022.findings-emnlp.440
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5959–5970
URL:
https://aclanthology.org/2022.findings-emnlp.440
DOI:
10.18653/v1/2022.findings-emnlp.440
Cite (ACL):
Peiran Yao, Tobias Renwick, and Denilson Barbosa. 2022. WordTies: Measuring Word Associations in Language Models via Constrained Sampling. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5959–5970, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
WordTies: Measuring Word Associations in Language Models via Constrained Sampling (Yao et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.440.pdf
Software:
2022.findings-emnlp.440.software.zip
Video:
https://aclanthology.org/2022.findings-emnlp.440.mp4