UnClE: Explicitly Leveraging Semantic Similarity to Reduce the Parameters of Word Embeddings

Zhi Li, Yuchen Zhai, Chengyu Wang, Minghui Qiu, Kailiang Li, Yin Zhang


Abstract
Natural language processing (NLP) models often require a massive number of parameters for word embeddings, which limits their deployment on mobile devices. Researchers have employed many approaches, e.g., adaptive inputs, to reduce the parameters of word embeddings. However, existing methods rarely pay attention to semantic information. In this paper, we propose a novel method called Unique and Class Embeddings (UnClE), which explicitly leverages semantic similarity with weight sharing to reduce the dimensionality of word embeddings. Inspired by the fact that words with similar semantics can share part of their weights, we divide the embedding of each word into two parts: a unique embedding and a class embedding. The former is a one-to-one mapping, like a traditional embedding, while the latter is a many-to-one mapping that learns the representation of class information. Our method is suitable for both word-level and sub-word-level models and can be used to reduce both input and output embeddings. Experimental results on the standard WMT 2014 English-German dataset show that our method is able to reduce the parameters of word embeddings by more than 11x, while retaining about 93% of the performance in BLEU. For the language modeling task, our model can reduce word embeddings by 6x or 11x on the PTB and WT2 datasets at the cost of some performance degradation.
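To make the unique/class split concrete, here is a minimal PyTorch sketch of the idea described in the abstract: each word keeps a small private vector (one-to-one), and semantically similar words share a single class vector (many-to-one). The names (UnClEEmbedding, word_to_class, unique_dim, class_dim), the fixed word-to-class lookup (e.g., from clustering pretrained embeddings), and the concatenation of the two parts are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn

class UnClEEmbedding(nn.Module):
    def __init__(self, vocab_size, num_classes, unique_dim, class_dim,
                 word_to_class):
        super().__init__()
        # One-to-one mapping: every word keeps a small private vector.
        self.unique = nn.Embedding(vocab_size, unique_dim)
        # Many-to-one mapping: semantically similar words share one class vector.
        self.cls = nn.Embedding(num_classes, class_dim)
        # Fixed lookup from word id to class id (shape: [vocab_size], LongTensor);
        # registered as a buffer since it is not trained.
        self.register_buffer("word_to_class", word_to_class)

    def forward(self, token_ids):
        u = self.unique(token_ids)                   # (..., unique_dim)
        c = self.cls(self.word_to_class[token_ids])  # (..., class_dim)
        return torch.cat([u, c], dim=-1)             # (..., unique_dim + class_dim)

Under this split, the embedding tables hold vocab_size x unique_dim + num_classes x class_dim parameters instead of vocab_size x (unique_dim + class_dim), so the savings grow as unique_dim shrinks and num_classes stays far below vocab_size.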
Anthology ID:
2021.findings-emnlp.156
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1815–1828
URL:
https://aclanthology.org/2021.findings-emnlp.156
DOI:
10.18653/v1/2021.findings-emnlp.156
Cite (ACL):
Zhi Li, Yuchen Zhai, Chengyu Wang, Minghui Qiu, Kailiang Li, and Yin Zhang. 2021. UnClE: Explicitly Leveraging Semantic Similarity to Reduce the Parameters of Word Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1815–1828, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
UnClE: Explicitly Leveraging Semantic Similarity to Reduce the Parameters of Word Embeddings (Li et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.156.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.156.mp4
Data
WikiText-2