Putting Words in BERT’s Mouth: Navigating Contextualized Vector Spaces with Pseudowords

Taelin Karidi, Yichu Zhou, Nathan Schneider, Omri Abend, Vivek Srikumar


Abstract
We present a method for exploring regions around individual points in a contextualized vector space (particularly, BERT space), as a way to investigate how these regions correspond to word senses. By inducing a contextualized “pseudoword” vector as a stand-in for a static embedding in the input layer, and then performing masked prediction of a word in the sentence, we are able to investigate the geometry of BERT space in a controlled manner around individual instances. Using our method on a set of carefully constructed sentences targeting highly ambiguous English words, we find substantial regularity in the contextualized space, with regions that correspond to distinct word senses; but between these regions there are occasionally “sense voids”—regions that do not correspond to any intelligible sense.
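The core probe described above can be sketched in a few lines with HuggingFace Transformers: replace one position's static input embedding with an arbitrary "pseudoword" vector via `inputs_embeds`, then read off the model's predictions at a masked position elsewhere in the sentence. This is a minimal illustration, not the paper's pseudoword-induction procedure; the example sentence and the interpolation between the embeddings of "bank" and "chair" are hypothetical choices for demonstration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# A sentence with a target word ("bank") and a masked word to predict.
text = "He sat down on the bank and watched the [MASK] ."
enc = tok(text, return_tensors="pt")
ids = enc.input_ids[0]
mask_pos = (ids == tok.mask_token_id).nonzero().item()
bank_pos = (ids == tok.convert_tokens_to_ids("bank")).nonzero().item()

# Look up the static input embeddings for the whole sentence.
emb_layer = model.get_input_embeddings()
inputs_embeds = emb_layer(enc.input_ids).detach().clone()

# Illustrative pseudoword: the midpoint of two static word embeddings
# (the paper induces pseudoword vectors; this interpolation is only a stand-in).
v1 = emb_layer.weight[tok.convert_tokens_to_ids("bank")]
v2 = emb_layer.weight[tok.convert_tokens_to_ids("chair")]
inputs_embeds[0, bank_pos] = 0.5 * v1 + 0.5 * v2

# Masked prediction with the pseudoword in place of the static embedding.
with torch.no_grad():
    logits = model(inputs_embeds=inputs_embeds,
                   attention_mask=enc.attention_mask).logits
top = logits[0, mask_pos].topk(5).indices
print(tok.convert_ids_to_tokens(top))
```

Moving the pseudoword along a path between two sense-specific vectors and inspecting how the top predictions at the mask change is the kind of controlled navigation of the space the abstract describes.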
Anthology ID: 2021.emnlp-main.806
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 10300–10313
URL: https://aclanthology.org/2021.emnlp-main.806
DOI: 10.18653/v1/2021.emnlp-main.806
Cite (ACL): Taelin Karidi, Yichu Zhou, Nathan Schneider, Omri Abend, and Vivek Srikumar. 2021. Putting Words in BERT’s Mouth: Navigating Contextualized Vector Spaces with Pseudowords. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10300–10313, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Putting Words in BERT’s Mouth: Navigating Contextualized Vector Spaces with Pseudowords (Karidi et al., EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.806.pdf
Code: tai314159/pwibm-putting-words-in-bert-s-mouth