HG2Vec: Improved Word Embeddings from Dictionary and Thesaurus Based Heterogeneous Graph

Qitong Wang, Mohammed J Zaki


Abstract
Learning word embeddings is an essential topic in natural language processing. Most existing works use a vast corpus as the primary training source, which demands substantial time and space for data pre-processing and model training. We propose a new model, HG2Vec, that learns word embeddings using only dictionaries and thesauri. Our model achieves state-of-the-art results on multiple word similarity and relatedness benchmarks, demonstrating that dictionaries and thesauri are effective resources for learning word embeddings. In addition, we introduce a new context-focused loss that models transitive relationships between word pairs and balances performance between similarity and relatedness benchmarks, yielding superior results.
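The abstract describes building a heterogeneous graph from dictionaries and thesauri. As a rough illustration only (the edge types, data, and construction below are hypothetical and not the paper's actual method), one could link each headword to the words in its dictionary definition and to its thesaurus synonyms, giving a graph with two edge types:

```python
# Hypothetical sketch: a heterogeneous word graph with two edge types,
# 'def' (headword -> definition token) and 'syn' (headword -> synonym).
# The toy data and edge scheme are assumptions, not the paper's design.

from collections import defaultdict

# Toy dictionary: headword -> definition tokens (hypothetical data)
dictionary = {
    "happy": ["feeling", "pleasure", "joy"],
    "joy": ["feeling", "great", "happiness"],
}
# Toy thesaurus: headword -> synonyms (hypothetical data)
thesaurus = {
    "happy": ["glad", "joyful"],
    "joy": ["delight"],
}

def build_hetero_graph(dictionary, thesaurus):
    """Return {(u, v): set_of_edge_types}: 'def' links a headword to
    each word in its definition; 'syn' links it to each synonym."""
    edges = defaultdict(set)
    for head, definition in dictionary.items():
        for token in definition:
            edges[(head, token)].add("def")
    for head, synonyms in thesaurus.items():
        for syn in synonyms:
            edges[(head, syn)].add("syn")
    return dict(edges)

graph = build_hetero_graph(dictionary, thesaurus)
print(graph[("happy", "glad")])  # {'syn'}
```

Node embeddings would then be trained over such a graph; the paper's context-focused loss and edge weighting are not reproduced here.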
Anthology ID:
2022.coling-1.279
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3154–3163
URL:
https://aclanthology.org/2022.coling-1.279
Cite (ACL):
Qitong Wang and Mohammed J Zaki. 2022. HG2Vec: Improved Word Embeddings from Dictionary and Thesaurus Based Heterogeneous Graph. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3154–3163, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
HG2Vec: Improved Word Embeddings from Dictionary and Thesaurus Based Heterogeneous Graph (Wang & Zaki, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.279.pdf
Data
CARD-660