Further Compressing Distilled Language Models via Frequency-aware Partial Sparse Coding of Embeddings

Kohki Tamura, Naoki Yoshinaga, Masato Neishi


Abstract
Although pre-trained language models (PLMs) are effective for natural language understanding (NLU) tasks, they demand huge computational resources, which prevents us from deploying them on edge devices. Researchers have therefore applied compression techniques for neural networks, such as pruning, quantization, and knowledge distillation, to PLMs. Although these generic techniques can reduce the number of internal parameters in the hidden layers of PLMs, the embedding layers tied to the tokenizer are hard to compress and occupy a non-negligible portion of the compressed model. In this study, aiming to further compress PLMs already reduced by these generic techniques, we exploit frequency-aware sparse coding to compress the embedding layers of PLMs fine-tuned to downstream tasks. To minimize the impact of the compression on accuracy, we retain the embeddings of common tokens as they are and use them to reconstruct the embeddings of rare tokens via locally linear mapping. Experimental results on the GLUE and JGLUE benchmarks for language understanding in English and Japanese confirmed that our method can further compress fine-tuned DistilBERT models while maintaining accuracy.
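The abstract sketches the core idea: keep the embeddings of frequent tokens intact and approximate each rare-token embedding as a weighted combination of nearby frequent-token embeddings. The snippet below is a minimal NumPy sketch of that idea, not the authors' implementation; the function names (compress_embeddings, reconstruct), the frequency-sorted vocabulary, and the nearest-neighbor least-squares weighting are assumptions introduced for illustration.

import numpy as np

def compress_embeddings(emb, n_common, n_neighbors=8):
    # emb: (vocab_size, dim) embedding matrix, rows assumed to be sorted by
    # token frequency (most frequent first) -- an assumption for this sketch.
    # Keep the n_common most frequent embeddings as-is; encode every rare
    # token as weights over its n_neighbors nearest common embeddings.
    common = emb[:n_common]                      # dense part, stored unchanged
    rare = emb[n_common:]                        # replaced by sparse codes

    codes = []                                   # (neighbor indices, weights) per rare token
    for x in rare:
        dists = np.linalg.norm(common - x, axis=1)
        idx = np.argsort(dists)[:n_neighbors]    # nearest common embeddings
        basis = common[idx]                      # (n_neighbors, dim)
        # locally linear mapping: least-squares weights so that w @ basis ~= x
        w, *_ = np.linalg.lstsq(basis.T, x, rcond=None)
        codes.append((idx, w))
    return common, codes

def reconstruct(common, codes):
    # Rebuild approximate rare-token embeddings from their codes.
    return np.stack([w @ common[idx] for idx, w in codes])

# Toy usage: 1,000 random "embeddings" of dimension 64; keep 200 dense rows.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64)).astype(np.float32)
common, codes = compress_embeddings(emb, n_common=200)
approx_rare = reconstruct(common, codes)
print(approx_rare.shape)                         # (800, 64)

Storing only a handful of indices and weights per rare token, rather than a full dense row, is what would yield the memory savings; the paper's frequency-aware sparse coding presumably selects the basis and weights more carefully than this nearest-neighbor stand-in.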
Anthology ID:
2024.conll-1.29
Volume:
Proceedings of the 28th Conference on Computational Natural Language Learning
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Libby Barak, Malihe Alikhani
Venue:
CoNLL
Publisher:
Association for Computational Linguistics
Pages:
388–399
URL:
https://aclanthology.org/2024.conll-1.29
Cite (ACL):
Kohki Tamura, Naoki Yoshinaga, and Masato Neishi. 2024. Further Compressing Distilled Language Models via Frequency-aware Partial Sparse Coding of Embeddings. In Proceedings of the 28th Conference on Computational Natural Language Learning, pages 388–399, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Further Compressing Distilled Language Models via Frequency-aware Partial Sparse Coding of Embeddings (Tamura et al., CoNLL 2024)
PDF:
https://aclanthology.org/2024.conll-1.29.pdf