Multi-Sense Embeddings for Language Models and Knowledge Distillation

Qitong Wang; Mohammed J Zaki; Georgios Kollias; Vasileios Kalantzis

doi:10.18653/v1/2025.findings-acl.691

Abstract

Transformer-based large language models (LLMs) rely on contextual embeddings, which yield a different (continuous) representation for the same token depending on its surrounding context. Nonetheless, words and tokens typically have a limited number of senses (or meanings). We propose multi-sense embeddings as a drop-in replacement for each token in order to capture the range of its uses in a language. To construct a sense embedding dictionary, we apply a clustering algorithm to embeddings generated by an LLM and treat the cluster centers as representative sense embeddings. In addition, we propose a novel knowledge distillation method that leverages the sense dictionary to train a smaller student model that mimics the senses of the much larger base LLM, offering significant space and inference-time savings while maintaining competitive performance. Through thorough experiments on various benchmarks, we demonstrate the effectiveness of our sense embeddings and knowledge distillation approach.
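
The sense-dictionary construction described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the number of senses per token, or the layer the embeddings are taken from, so everything below is an assumption made for illustration: plain k-means stands in for the clustering step, the function names (`kmeans`, `build_sense_dictionary`, `nearest_sense`) are hypothetical, and the 2-D vectors are made-up toy data rather than real LLM outputs.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering step).

    The first k points seed the centers, so the result is deterministic.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def build_sense_dictionary(contextual, k):
    """Map each token to its cluster centers, used here as sense embeddings.

    contextual: iterable of (token, embedding) pairs.
    """
    by_token = defaultdict(list)
    for token, vec in contextual:
        by_token[token].append(vec)
    return {tok: kmeans(vecs, min(k, len(vecs))) for tok, vecs in by_token.items()}


def nearest_sense(sense_dict, token, vec):
    """Index of the sense embedding closest to a new contextual embedding."""
    senses = sense_dict[token]
    return min(range(len(senses)), key=lambda j: math.dist(vec, senses[j]))


# Toy 2-D "contextual embeddings" for the token "bank" (made-up numbers).
samples = [
    ("bank", [0.0, 0.1]), ("bank", [5.0, 5.1]),
    ("bank", [0.2, 0.0]), ("bank", [5.2, 4.9]),
    ("bank", [0.1, 0.2]), ("bank", [4.9, 5.0]),
]
sense_dict = build_sense_dictionary(samples, k=2)
```

In this sketch, a new contextual embedding is replaced by whichever stored center is closest, e.g. `nearest_sense(sense_dict, "bank", [0.3, 0.3])` selects the first of the two toy senses. How the paper actually performs the replacement and the distillation is not specified in the abstract.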

Anthology ID: 2025.findings-acl.691
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13353–13369
URL: https://aclanthology.org/2025.findings-acl.691/
DOI: 10.18653/v1/2025.findings-acl.691
Cite (ACL): Qitong Wang, Mohammed J Zaki, Georgios Kollias, and Vasileios Kalantzis. 2025. Multi-Sense Embeddings for Language Models and Knowledge Distillation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 13353–13369, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Multi-Sense Embeddings for Language Models and Knowledge Distillation (Wang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.691.pdf
