BERT Has Uncommon Sense: Similarity Ranking for Word Sense BERTology

Luke Gessler, Nathan Schneider


Abstract
An important question concerning contextualized word embedding (CWE) models like BERT is how well they can represent different word senses, especially those in the long tail of uncommon senses. Rather than build a WSD system as in previous work, we investigate contextualized embedding neighborhoods directly, formulating a query-by-example nearest neighbor retrieval task and examining ranking performance for words and senses in different frequency bands. In an evaluation on two English sense-annotated corpora, we find that several popular CWE models all outperform a random baseline even for proportionally rare senses, without explicit sense supervision. However, performance varies considerably even among models with similar architectures and pretraining regimes, with especially large differences for rare word senses, revealing that CWE models are not all created equal when it comes to approximating word senses in their native representations.
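To make the retrieval setup concrete, below is a minimal sketch of query-by-example nearest-neighbor ranking over contextualized embeddings. It is not the authors' released code (see lgessler/bert-has-uncommon-sense for that): the model choice (bert-base-cased), the toy sentences and sense labels, and the first-subword lookup heuristic are all illustrative assumptions. The idea is to embed each annotated occurrence of a word, rank the other occurrences by cosine similarity to a query occurrence, and check how many top-ranked neighbors share the query's sense.

```python
# Sketch only: embed occurrences of a target word with a CWE model,
# then rank them by cosine similarity to a query occurrence.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-cased"  # assumption: any CWE model on the Hub would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed_occurrence(sentence: str, target: str) -> torch.Tensor:
    """Return the contextualized embedding of `target`'s first subword."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    # Toy heuristic: find the target's subword ids inside the sentence ids.
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i : i + len(target_ids)] == target_ids:
            return hidden[i]
    raise ValueError(f"{target!r} not found in {sentence!r}")

# Toy sense-annotated occurrences of "bank" (labels are illustrative only).
occurrences = [
    ("She deposited the check at the bank.", "FINANCIAL"),
    ("The bank raised its interest rates.", "FINANCIAL"),
    ("They picnicked on the bank of the river.", "RIVER"),
    ("The canoe drifted toward the muddy bank.", "RIVER"),
]
vectors = torch.stack([embed_occurrence(s, "bank") for s, _ in occurrences])

# Query by example: rank all other occurrences by cosine similarity.
query_idx = 0
sims = torch.nn.functional.cosine_similarity(vectors[query_idx], vectors, dim=-1)
sims[query_idx] = float("-inf")  # exclude the query itself
ranking = sims.argsort(descending=True).tolist()

query_sense = occurrences[query_idx][1]
hits = [occurrences[i][1] == query_sense for i in ranking]
print("ranked senses:", [occurrences[i][1] for i in ranking])
print("precision@2:", sum(hits[:2]) / 2)
```

With a ranking in hand, per-query scores like precision@k can be aggregated separately for words and senses in different frequency bands, which is the kind of analysis the abstract describes; no sense supervision enters the retrieval step itself.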
Anthology ID: 2021.blackboxnlp-1.43
Volume: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Venues: BlackboxNLP | EMNLP
Publisher: Association for Computational Linguistics
Pages: 539–547
URL: https://aclanthology.org/2021.blackboxnlp-1.43
PDF: https://aclanthology.org/2021.blackboxnlp-1.43.pdf
Code: lgessler/bert-has-uncommon-sense
Data: GLUE