On the Dimensionality of Sentence Embeddings

Hongwei Wang, Hongming Zhang, Dong Yu


Abstract
Learning sentence embeddings is a fundamental problem in natural language processing. While existing research primarily focuses on enhancing the quality of sentence embeddings, the dimensionality of sentence embeddings has received comparatively little attention. Here we present a comprehensive empirical analysis of the dimensionality of sentence embeddings. First, we demonstrate that the optimal dimension of sentence embeddings is usually smaller than the default value. Subsequently, to compress the dimension of sentence embeddings with minimal performance degradation, we identify two components contributing to the overall performance loss: the encoder’s performance loss and the pooler’s performance loss. Therefore, we propose a two-step training method for sentence representation learning models, wherein the encoder and the pooler are optimized separately to mitigate the overall performance loss in low-dimensional scenarios. Experimental results on seven STS tasks and seven sentence classification tasks demonstrate that our method significantly improves the performance of low-dimensional sentence embeddings.
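The abstract describes a two-step scheme: train the encoder first, then separately optimize a pooler that maps the encoder output to a lower dimension, so the cost of dimension reduction is isolated in the pooler. The paper's implementation details are not given on this page, so the following is a minimal sketch of that idea only, assuming a Hugging Face transformer encoder ("bert-base-uncased"), a linear pooler, and an example target dimension of 128; these names and values are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the two-step training idea from the abstract.
# Step 1: train/fine-tune the encoder at its native dimension.
# Step 2: freeze the encoder and fit only a low-dimensional pooler.
# All model names, dimensions, and the elided loops are illustrative.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class LowDimPooler(nn.Module):
    """Projects the encoder's [CLS] hidden state to a smaller embedding."""
    def __init__(self, hidden_size: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, out_dim)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        return self.proj(last_hidden_state[:, 0])  # take the [CLS] token

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
pooler = LowDimPooler(encoder.config.hidden_size, out_dim=128)  # 128 is an example target

# Step 1: standard sentence-embedding training of `encoder` at full dimension.
# ... (e.g., a contrastive objective; omitted here) ...

# Step 2: freeze the encoder so only the pooler absorbs the
# dimension-reduction loss, then optimize the pooler alone.
for p in encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(pooler.parameters(), lr=1e-4)
# ... (train `pooler` on the same objective, applied to 128-d outputs) ...

# Inference with the compressed embeddings:
batch = tokenizer(["a sentence", "another sentence"],
                  padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state
embeddings = pooler(hidden)  # shape: (batch, 128)
```

Freezing the encoder in the second step means the small projection is the only component fit against the low-dimensional objective, which reflects the separation of encoder loss and pooler loss that the abstract credits for the reduced overall degradation.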
Anthology ID: 2023.findings-emnlp.694
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 10344–10354
URL: https://aclanthology.org/2023.findings-emnlp.694
DOI: 10.18653/v1/2023.findings-emnlp.694
Cite (ACL): Hongwei Wang, Hongming Zhang, and Dong Yu. 2023. On the Dimensionality of Sentence Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10344–10354, Singapore. Association for Computational Linguistics.
Cite (Informal): On the Dimensionality of Sentence Embeddings (Wang et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.694.pdf