@inproceedings{tsukagoshi-sasano-2025-redundancy,
title = "Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings",
author = "Tsukagoshi, Hayato and
Sasano, Ryohei",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1330/",
doi = "10.18653/v1/2025.findings-acl.1330",
pages = "25915--25930",
ISBN = "979-8-89176-256-5",
abstract = "Prompt-based text embedding models, which generate task-specific embeddings upon receiving tailored prompts, have recently demonstrated remarkable performance. However, their resulting embeddings often have thousands of dimensions, leading to high storage costs and increased computational costs of embedding-based operations. In this paper, we investigate how post-hoc dimensionality reduction applied to the embeddings affects the performance of various tasks that leverage these embeddings, specifically classification, clustering, retrieval, and semantic textual similarity (STS) tasks. Our experiments show that even a naive dimensionality reduction, which keeps only the first 25{\%} of the dimensions of the embeddings, results in a very slight performance degradation, indicating that these embeddings are highly redundant. Notably, for classification and clustering, even when embeddings are reduced to less than 0.5{\%} of the original dimensionality the performance degradation is very small. To quantitatively analyze this redundancy, we perform an analysis based on the intrinsic dimensionality and isotropy of the embeddings. Our analysis reveals that embeddings for classification and clustering, which are considered to have very high dimensional redundancy, exhibit lower intrinsic dimensionality and less isotropy compared with those for retrieval and STS."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="tsukagoshi-sasano-2025-redundancy">
    <titleInfo>
      <title>Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Hayato</namePart>
      <namePart type="family">Tsukagoshi</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Ryohei</namePart>
      <namePart type="family">Sasano</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-07</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: ACL 2025</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Wanxiang</namePart>
        <namePart type="family">Che</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Joyce</namePart>
        <namePart type="family">Nabende</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Ekaterina</namePart>
        <namePart type="family">Shutova</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Mohammad</namePart>
        <namePart type="given">Taher</namePart>
        <namePart type="family">Pilehvar</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Vienna, Austria</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-256-5</identifier>
    </relatedItem>
    <abstract>Prompt-based text embedding models, which generate task-specific embeddings upon receiving tailored prompts, have recently demonstrated remarkable performance. However, their resulting embeddings often have thousands of dimensions, leading to high storage costs and increased computational costs of embedding-based operations. In this paper, we investigate how post-hoc dimensionality reduction applied to the embeddings affects the performance of various tasks that leverage these embeddings, specifically classification, clustering, retrieval, and semantic textual similarity (STS) tasks. Our experiments show that even a naive dimensionality reduction, which keeps only the first 25% of the dimensions of the embeddings, results in a very slight performance degradation, indicating that these embeddings are highly redundant. Notably, for classification and clustering, even when embeddings are reduced to less than 0.5% of the original dimensionality, the performance degradation is very small. To quantitatively analyze this redundancy, we perform an analysis based on the intrinsic dimensionality and isotropy of the embeddings. Our analysis reveals that embeddings for classification and clustering, which are considered to have very high dimensional redundancy, exhibit lower intrinsic dimensionality and less isotropy compared with those for retrieval and STS.</abstract>
<identifier type="citekey">tsukagoshi-sasano-2025-redundancy</identifier>
<identifier type="doi">10.18653/v1/2025.findings-acl.1330</identifier>
<location>
<url>https://aclanthology.org/2025.findings-acl.1330/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>25915</start>
<end>25930</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings
%A Tsukagoshi, Hayato
%A Sasano, Ryohei
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Findings of the Association for Computational Linguistics: ACL 2025
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-256-5
%F tsukagoshi-sasano-2025-redundancy
%X Prompt-based text embedding models, which generate task-specific embeddings upon receiving tailored prompts, have recently demonstrated remarkable performance. However, their resulting embeddings often have thousands of dimensions, leading to high storage costs and increased computational costs of embedding-based operations. In this paper, we investigate how post-hoc dimensionality reduction applied to the embeddings affects the performance of various tasks that leverage these embeddings, specifically classification, clustering, retrieval, and semantic textual similarity (STS) tasks. Our experiments show that even a naive dimensionality reduction, which keeps only the first 25% of the dimensions of the embeddings, results in a very slight performance degradation, indicating that these embeddings are highly redundant. Notably, for classification and clustering, even when embeddings are reduced to less than 0.5% of the original dimensionality, the performance degradation is very small. To quantitatively analyze this redundancy, we perform an analysis based on the intrinsic dimensionality and isotropy of the embeddings. Our analysis reveals that embeddings for classification and clustering, which are considered to have very high dimensional redundancy, exhibit lower intrinsic dimensionality and less isotropy compared with those for retrieval and STS.
%R 10.18653/v1/2025.findings-acl.1330
%U https://aclanthology.org/2025.findings-acl.1330/
%U https://doi.org/10.18653/v1/2025.findings-acl.1330
%P 25915-25930

Markdown (Informal)
[Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings](https://aclanthology.org/2025.findings-acl.1330/) (Tsukagoshi & Sasano, Findings 2025)

ACL
Hayato Tsukagoshi and Ryohei Sasano. 2025. [Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings](https://aclanthology.org/2025.findings-acl.1330/). In *Findings of the Association for Computational Linguistics: ACL 2025*, pages 25915–25930, Vienna, Austria. Association for Computational Linguistics.
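
The abstract above (repeated in each citation format) describes two concrete operations: a naive post-hoc dimensionality reduction that keeps only the first fraction of each embedding's dimensions, and an isotropy analysis of the resulting embedding space. The sketch below is a minimal, hypothetical illustration of those ideas, not the authors' released code: the function names, the 4096-dimensional random data, and the spectral-entropy isotropy proxy are all illustrative assumptions.

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Naive post-hoc dimensionality reduction: keep only the first
    `keep_ratio` fraction of dimensions, then L2-renormalize so that
    cosine similarity remains well defined."""
    k = max(1, int(embeddings.shape[1] * keep_ratio))
    reduced = embeddings[:, :k]
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.clip(norms, 1e-12, None)

def isotropy_proxy(embeddings: np.ndarray) -> float:
    """Crude isotropy proxy (an assumption, not necessarily the paper's
    measure): normalized entropy of the centered embeddings' squared
    singular-value spectrum. 1.0 means variance is spread evenly over
    all directions; lower values mean a few directions dominate."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s**2 / np.sum(s**2)
    return float(-np.sum(p * np.log(p + 1e-12)) / np.log(len(p)))

# Hypothetical usage with random data standing in for real embeddings
# (prompt-based models often emit thousands of dimensions, e.g. 4096).
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 4096))
print(truncate_embeddings(emb, 0.25).shape)    # (1000, 1024): first 25% of dims
print(truncate_embeddings(emb, 0.005).shape)   # (1000, 20): under 0.5% of dims
print(round(isotropy_proxy(emb), 3))           # close to 1.0 for Gaussian noise
```

Renormalizing after truncation matters because downstream retrieval and STS pipelines typically compare embeddings by cosine similarity, which assumes unit-norm vectors.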