Emotion Enriched Retrofitted Word Embeddings

Sapan Shah, Sreedhar Reddy, Pushpak Bhattacharyya


Abstract
Word embeddings learned using the distributional hypothesis (e.g., GloVe, Word2vec) are good at encoding various lexical-semantic relations. However, they do not capture the emotional aspects of words. We present a novel retrofitting method for updating the vectors of emotion-bearing words such as fun, offence, and angry. The retrofitted embeddings achieve better inter-cluster and intra-cluster distances for words sharing the same emotion (e.g., the joy cluster containing words like fun and happiness, and the anger cluster containing words like offence and rage), as evaluated through different cluster quality metrics. On the downstream tasks of sentiment analysis and sarcasm detection, simple classification models such as SVM and Attention Net trained on our retrofitted embeddings perform better than the same models trained on the original pre-trained embeddings (about 1.5% improvement in F1-score), and also better than other benchmarks. Furthermore, the difference in performance is more pronounced in the limited-data setting.
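To make the notion of intra-cluster and inter-cluster distance concrete, the following is a minimal sketch of how such distances could be computed for emotion clusters. It is not the paper's evaluation code or its retrofitting method: the two-dimensional vectors, the function names, and the choice of mean pairwise cosine distance are all illustrative assumptions.

```python
import math
from itertools import combinations, product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_intra_cluster_distance(clusters, vectors):
    """Average distance over all word pairs that share a cluster."""
    dists = [
        cosine_distance(vectors[a], vectors[b])
        for words in clusters.values()
        for a, b in combinations(words, 2)
    ]
    return sum(dists) / len(dists)


def mean_inter_cluster_distance(clusters, vectors):
    """Average distance over all word pairs drawn from different clusters."""
    dists = [
        cosine_distance(vectors[a], vectors[b])
        for words_1, words_2 in combinations(clusters.values(), 2)
        for a, b in product(words_1, words_2)
    ]
    return sum(dists) / len(dists)


# Toy 2-d vectors (hypothetical values, not real embeddings).
vectors = {
    "fun": [0.9, 0.1],
    "happiness": [0.8, 0.2],
    "offence": [0.1, 0.9],
    "rage": [0.2, 0.8],
}
# Cluster membership follows the examples given in the abstract.
clusters = {"joy": ["fun", "happiness"], "anger": ["offence", "rage"]}

intra = mean_intra_cluster_distance(clusters, vectors)
inter = mean_inter_cluster_distance(clusters, vectors)
print(f"mean intra-cluster distance: {intra:.3f}")
print(f"mean inter-cluster distance: {inter:.3f}")
```

Under this toy setup, well-separated emotion clusters show a small intra-cluster distance and a large inter-cluster distance. Comparing these two numbers before and after retrofitting is one simple way to see whether same-emotion words have moved closer together.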
Anthology ID: 2022.coling-1.363
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 4136–4148
URL: https://aclanthology.org/2022.coling-1.363
Cite (ACL): Sapan Shah, Sreedhar Reddy, and Pushpak Bhattacharyya. 2022. Emotion Enriched Retrofitted Word Embeddings. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4136–4148, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Emotion Enriched Retrofitted Word Embeddings (Shah et al., COLING 2022)
PDF: https://aclanthology.org/2022.coling-1.363.pdf
Data: MUStARD++