GGP: Glossary Guided Post-processing for Word Embedding Learning

Ruosong Yang, Jiannong Cao, Zhiyuan Wen


Abstract
Word embedding learning is the task of mapping each word to a low-dimensional, continuous vector based on a large corpus. To enhance corpus-based word embedding models, researchers use domain knowledge to learn more distinguishable representations through joint optimization based or post-processing based models. However, joint optimization based models require long training times, and existing post-processing models mostly consider semantic knowledge, while the learned embeddings show less functional information. A glossary is a comprehensive linguistic resource, and previous work has usually used it to enhance word representations via joint optimization based methods. In this paper, we post-process pre-trained word embedding models by incorporating the glossary, capturing more topical and functional information. We propose GGP (Glossary Guided Post-processing word embedding), a model that consists of a global post-processing function to fine-tune each word vector and an auto-encoding model to learn sense representations; it further constrains each post-processed word representation and the composition of its sense representations to be similar. We evaluate our model by comparing it with two state-of-the-art models on six word topical/functional similarity datasets. The results show that it outperforms the competitors by an average of 4.1% across all datasets and outperforms GloVe by more than 7%.
Anthology ID:
2020.lrec-1.581
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4726–4730
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.581
Cite (ACL):
Ruosong Yang, Jiannong Cao, and Zhiyuan Wen. 2020. GGP: Glossary Guided Post-processing for Word Embedding Learning. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4726–4730, Marseille, France. European Language Resources Association.
Cite (Informal):
GGP: Glossary Guided Post-processing for Word Embedding Learning (Yang et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.581.pdf