Integrating Lexical Knowledge in Word Embeddings using Sprinkling and Retrofitting

Aakash Srinivasan, Harshavardhan Kamarthi, Devi Ganesan, Sutanu Chakraborti


Abstract
Neural-network-based word embeddings, such as Word2Vec and GloVe, are purely data driven in that they capture the distributional information about words from the training corpus. Prior work has attempted to improve these embeddings by incorporating semantic knowledge from lexical resources like WordNet. Some techniques, such as retrofitting, modify word embeddings in a post-processing stage, while others use a joint learning approach that modifies the objective function of the neural network. In this paper, we discuss two novel approaches for incorporating semantic knowledge into word embeddings. In the first approach, we take advantage of Levy et al.'s work, which showed that SVD-based methods applied to the co-occurrence matrix provide performance similar to that of neural-network-based embeddings. We propose a ‘sprinkling’ technique that adds semantic relations directly to the co-occurrence matrix before factorization. In the second approach, WordNet similarity scores are used to improve the retrofitting method. We evaluate the proposed methods on both intrinsic and extrinsic tasks and observe significant improvements over the baselines on many of the datasets.
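The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how relations are encoded or weighted, so the function names (`sprinkle`, `svd_embeddings`, `retrofit`), the one-column-per-relation encoding, the `weight` and `alpha` parameters, and the toy data are all assumptions made here for illustration. The retrofitting update follows the standard iterative averaging form, with neighbour weights standing in for similarity scores such as those derived from WordNet.

```python
import numpy as np


def sprinkle(cooc, relations, weight=1.0):
    """Append one extra column per related word pair (assumed encoding).

    cooc:      (n_words, n_contexts) co-occurrence matrix.
    relations: list of (i, j) word-index pairs taken from a lexical resource.
    """
    cooc = np.asarray(cooc, dtype=float)
    extra = np.zeros((cooc.shape[0], len(relations)))
    for col, (i, j) in enumerate(relations):
        # Both words of the pair "co-occur" with the same artificial context.
        extra[i, col] = weight
        extra[j, col] = weight
    return np.hstack([cooc, extra])


def svd_embeddings(matrix, dim):
    """Factorize the matrix and keep the top `dim` scaled left singular vectors."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]


def retrofit(vectors, neighbours, alpha=1.0, n_iters=10):
    """Pull each vector toward its weighted neighbours while staying near the original.

    neighbours: dict mapping word index i to a list of (j, beta_ij) pairs,
                where beta_ij is a similarity weight (hypothetical values here).
    """
    original = np.asarray(vectors, dtype=float)
    new = original.copy()
    for _ in range(n_iters):
        for i, nbrs in neighbours.items():
            if not nbrs:
                continue
            total = sum(beta for _, beta in nbrs)
            pulled = sum(beta * new[j] for j, beta in nbrs)
            new[i] = (alpha * original[i] + pulled) / (alpha + total)
    return new


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy data: words 0 and 1 are treated as synonyms but share no contexts.
toy_cooc = np.array([[2.0, 0.0, 1.0],
                     [0.0, 2.0, 0.0],
                     [0.0, 1.0, 2.0]])
plain = svd_embeddings(toy_cooc, dim=3)
sprinkled = svd_embeddings(sprinkle(toy_cooc, [(0, 1)]), dim=3)
print(cosine(plain[0], plain[1]), cosine(sprinkled[0], sprinkled[1]))
```

In this toy setting, the sprinkled column gives the two related words a shared context, so their embeddings become more similar after factorization. The retrofitting step can then be applied to any embedding matrix as post-processing.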
Anthology ID:
2019.icon-1.13
Volume:
Proceedings of the 16th International Conference on Natural Language Processing
Month:
December
Year:
2019
Address:
International Institute of Information Technology, Hyderabad, India
Editors:
Dipti Misra Sharma, Pushpak Bhattacharya
Venue:
ICON
Publisher:
NLP Association of India
Pages:
115–123
URL:
https://aclanthology.org/2019.icon-1.13
Cite (ACL):
Aakash Srinivasan, Harshavardhan Kamarthi, Devi Ganesan, and Sutanu Chakraborti. 2019. Integrating Lexical Knowledge in Word Embeddings using Sprinkling and Retrofitting. In Proceedings of the 16th International Conference on Natural Language Processing, pages 115–123, International Institute of Information Technology, Hyderabad, India. NLP Association of India.
Cite (Informal):
Integrating Lexical Knowledge in Word Embeddings using Sprinkling and Retrofitting (Srinivasan et al., ICON 2019)
PDF:
https://aclanthology.org/2019.icon-1.13.pdf