Self-supervised Post-processing Method to Enrich Pretrained Word Vectors

Hwiyeol Jo


Abstract
Retrofitting techniques, which inject external resources into word representations, have compensated for the weakness of distributed representations in capturing semantic and relational knowledge between words. However, previous methods require additional external resources and depend strongly on the lexicon. To address these issues, we propose a simple extension of extrofitting, self-supervised extrofitting: extrofitting by the word vectors' own distribution. Our method improves vanilla embeddings on all word similarity tasks without any external resources. Moreover, it is effective in various languages, which implies that it will be useful for lexicon-scarce languages. On downstream tasks, we show its benefits in dialogue state tracking and text classification, reporting better and more generalized results than other word vector specialization methods.
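
The abstract only sketches the idea, so the snippet below is a minimal illustrative sketch of one plausible reading: pseudo-synonym groups are derived from the embeddings' own cosine-similarity structure (playing the role an external lexicon plays in extrofitting), followed by an extrofitting-style expand-and-project step using LDA. The function name, similarity threshold, and greedy grouping heuristic are assumptions for illustration, not the paper's exact procedure.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def self_supervised_extrofit(vectors, sim_threshold=0.75):
    """Enrich pretrained word vectors using their own distribution.

    vectors: (n_words, dim) float array of pretrained embeddings.
    sim_threshold: cosine similarity above which two words are grouped
        as pseudo-synonyms (an assumed hyperparameter).
    Returns an enriched (n_words, k) array with k <= dim.
    """
    n, dim = vectors.shape

    # 1. Cosine similarities computed from the embedding space itself;
    #    no external resource is consulted.
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T

    # 2. Greedily group highly similar words; these groups stand in for
    #    the synonym sets an external lexicon provides in extrofitting.
    labels = np.full(n, -1, dtype=int)
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
        unassigned = (sims[i] > sim_threshold) & (labels == -1)
        labels[unassigned] = labels[i]

    # 3. Extrofitting-style step: append one extra dimension holding the
    #    group-mean value, then project back with LDA so vectors sharing
    #    a pseudo-synonym group are pulled closer together.
    group_means = np.array([vectors[labels == g].mean() for g in labels])
    expanded = np.hstack([vectors, group_means[:, None]])

    n_components = min(dim, next_label - 1)
    lda = LinearDiscriminantAnalysis(n_components=n_components)
    return lda.fit_transform(expanded, labels)

Usage would be along the lines of enriched = self_supervised_extrofit(embedding_matrix), where embedding_matrix holds, e.g., GloVe vectors; evaluating the output on word similarity benchmarks would mirror the paper's experimental setup.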
Anthology ID:
2023.findings-emnlp.54
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
747–757
URL:
https://aclanthology.org/2023.findings-emnlp.54
DOI:
10.18653/v1/2023.findings-emnlp.54
Cite (ACL):
Hwiyeol Jo. 2023. Self-supervised Post-processing Method to Enrich Pretrained Word Vectors. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 747–757, Singapore. Association for Computational Linguistics.
Cite (Informal):
Self-supervised Post-processing Method to Enrich Pretrained Word Vectors (Jo, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.54.pdf