Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection

Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert


Abstract
One of the long-standing challenges in lexical semantics consists in learning representations of words which reflect their semantic properties. The remarkable success of word embeddings for this purpose suggests that high-quality representations can be obtained by summarizing the sentence contexts of word mentions. In this paper, we propose a method for learning word representations that follows this basic strategy, but differs from standard word embeddings in two important ways. First, we take advantage of contextualized language models (CLMs) rather than bags of word vectors to encode contexts. Second, rather than learning a word vector directly, we use a topic model to partition the contexts in which words appear, and then learn different topic-specific vectors for each word. We then use a task-specific supervision signal to make a soft selection of the resulting vectors. We show that this simple strategy leads to high-quality word vectors, which are more predictive of semantic properties than word embeddings and existing CLM-based strategies.
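The pipeline summarized in the abstract can be illustrated with a small sketch. This is not the authors' released code; it only shows the general shape of the idea under several assumptions: the contextualized mention vectors are taken as already computed (e.g. by a CLM such as BERT), each mention's context is assumed to carry a single topic label from a topic model, a topic-specific vector is assumed to be the mean of that topic's mention vectors, and the "soft selection" is modeled as softmax attention with a `query` vector standing in for parameters that would be learned from the task-specific supervision. The function names `topic_specific_vectors` and `soft_select` are hypothetical.

```python
import math
from collections import defaultdict


def topic_specific_vectors(mentions):
    """Average a word's contextualized mention vectors separately for each topic.

    mentions: list of (topic_id, vector) pairs, one per mention of the word.
    Assumes each mention's context was assigned one topic by a topic model
    and that all vectors have the same length.
    Returns a dict mapping topic_id -> mean vector (list of floats).
    """
    sums = {}
    counts = defaultdict(int)
    for topic, vec in mentions:
        if topic not in sums:
            sums[topic] = [0.0] * len(vec)
        sums[topic] = [s + v for s, v in zip(sums[topic], vec)]
        counts[topic] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(topic_vecs, query):
    """Combine topic-specific vectors into one word vector via attention.

    query is a hypothetical stand-in for parameters that would be learned
    from the task-specific supervision signal; here it is passed in by hand.
    Returns (combined_vector, {topic_id: weight}).
    """
    topics = sorted(topic_vecs)
    # Score each topic-specific vector by its dot product with the query.
    scores = [sum(q * x for q, x in zip(query, topic_vecs[t])) for t in topics]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the topic-specific vectors (a convex combination).
    combined = [
        sum(w * topic_vecs[t][i] for w, t in zip(weights, topics)) for i in range(dim)
    ]
    return combined, dict(zip(topics, weights))


if __name__ == "__main__":
    # Toy 2-d "mention vectors": two mentions in topic 0, one in topic 1.
    mentions = [(0, [1.0, 0.0]), (0, [3.0, 0.0]), (1, [0.0, 2.0])]
    vecs = topic_specific_vectors(mentions)
    print(vecs)  # {0: [2.0, 0.0], 1: [0.0, 2.0]}
    word_vec, weights = soft_select(vecs, query=[1.0, 0.0])
    print(word_vec, weights)
```

Because the weights come from a softmax, the resulting word vector is always a convex combination of its topic-specific vectors. In a trained setting the query (or a richer scoring function) would presumably be fit jointly with the downstream classifier rather than fixed by hand as it is here.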
Anthology ID: 2021.repl4nlp-1.19
Volume: Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Month: August
Year: 2021
Address: Online
Editors: Anna Rogers, Iacer Calixto, Ivan Vulić, Naomi Saphra, Nora Kassner, Oana-Maria Camburu, Trapit Bansal, Vered Shwartz
Venue: RepL4NLP
Publisher: Association for Computational Linguistics
Pages: 185–194
URL: https://aclanthology.org/2021.repl4nlp-1.19
DOI: 10.18653/v1/2021.repl4nlp-1.19
Cite (ACL): Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke, and Steven Schockaert. 2021. Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection. In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), pages 185–194, Online. Association for Computational Linguistics.
Cite (Informal): Deriving Word Vectors from Contextualized Language Models using Topic-Aware Mention Selection (Wang et al., RepL4NLP 2021)
PDF: https://aclanthology.org/2021.repl4nlp-1.19.pdf
Code: Activeyixiao/topic-specific-vector