RAW-C: Relatedness of Ambiguous Words in Context (A New Lexical Resource for English)

Sean Trott, Benjamin Bergen


Abstract
Most words are ambiguous (i.e., they convey distinct meanings in different contexts), and even the meanings of unambiguous words are context-dependent. Both phenomena present a challenge for NLP. Recently, the advent of contextualized word embeddings has led to success on tasks involving lexical ambiguity, such as Word Sense Disambiguation. However, there are few tasks that directly evaluate how well these contextualized embeddings accommodate the more continuous, dynamic nature of word meaning, particularly in a way that matches human intuitions. We introduce RAW-C, a dataset of graded, human relatedness judgments for 112 ambiguous words in context (with 672 sentence pairs total), as well as human estimates of sense dominance. The average inter-annotator agreement (assessed using a leave-one-annotator-out method) was 0.79. We then show that a measure of cosine distance, computed using contextualized embeddings from BERT and ELMo, correlates with human judgments, but that cosine distance also systematically underestimates how similar humans find uses of the same sense of a word to be, and systematically overestimates how similar humans find uses of different-sense homonyms. Finally, we propose a synthesis between psycholinguistic theories of the mental lexicon and computational models of lexical semantics.
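The two quantities named in the abstract, cosine distance between embeddings and leave-one-annotator-out agreement, can be illustrated with a minimal sketch. This is not the authors' code. It assumes each word use is already represented as a plain list of floats, that agreement is the mean correlation between each annotator and the average of the remaining annotators, and that Pearson correlation is used; the abstract does not state which correlation coefficient was used. All function names and toy values are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation (an assumed choice; the abstract does not name one)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def leave_one_annotator_out(ratings):
    """Mean correlation of each annotator with the average of all the others.

    `ratings` is a list of per-annotator lists, one rating per sentence pair.
    """
    scores = []
    for i, held_out in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != i]
        # Average the remaining annotators item by item.
        mean_others = [sum(col) / len(col) for col in zip(*others)]
        scores.append(pearson(held_out, mean_others))
    return sum(scores) / len(scores)


# Toy data: made-up vectors and ratings, not taken from RAW-C.
embedding_a = [0.2, 0.7, 0.1]
embedding_b = [0.3, 0.6, 0.0]
toy_ratings = [
    [4, 1, 3, 0],
    [3, 1, 4, 0],
    [4, 0, 3, 1],
]

print(cosine_distance(embedding_a, embedding_b))
print(leave_one_annotator_out(toy_ratings))
```

In practice the vectors would come from a contextualized model such as BERT or ELMo, and the per-pair cosine distances would then be compared against the mean human relatedness rating for the same sentence pair.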
Anthology ID:
2021.acl-long.550
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
7077–7087
URL:
https://aclanthology.org/2021.acl-long.550
DOI:
10.18653/v1/2021.acl-long.550
Cite (ACL):
Sean Trott and Benjamin Bergen. 2021. RAW-C: Relatedness of Ambiguous Words in Context (A New Lexical Resource for English). In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7077–7087, Online. Association for Computational Linguistics.
Cite (Informal):
RAW-C: Relatedness of Ambiguous Words in Context (A New Lexical Resource for English) (Trott & Bergen, ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-long.550.pdf
Video:
https://aclanthology.org/2021.acl-long.550.mp4
Data
RAW-C