Do We Really Need All Those Dimensions? An Intrinsic Evaluation Framework for Compressed Embeddings

Nathan Inkiriwang; Necva Bölücü; Garth Tarr; Maciej Rybinski

Abstract

High-dimensional text embeddings are foundational to modern NLP but costly to store and use. While embedding compression addresses these challenges, selecting the best compression method remains difficult. Existing evaluation methods for compressed embeddings are either expensive or too simplistic. We introduce a comprehensive intrinsic evaluation framework featuring a suite of task-agnostic metrics that together provide a reliable proxy for downstream performance. A key contribution is $\operatorname{EOS}_k$, a novel spectral fidelity measure specifically designed to be robust to embedding anisotropy. Through extensive experiments on diverse embeddings across four downstream tasks, we demonstrate that our intrinsic metrics reliably predict extrinsic performance and reveal how different embedding architectures depend on distinct geometric properties. Our framework provides a practical, efficient, and interpretable alternative to standard evaluations for compressed embeddings.

Anthology ID: 2025.findings-emnlp.717
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13305–13323
URL: https://aclanthology.org/2025.findings-emnlp.717/
Cite (ACL): Nathan Inkiriwang, Necva Bölücü, Garth Tarr, and Maciej Rybinski. 2025. Do We Really Need All Those Dimensions? An Intrinsic Evaluation Framework for Compressed Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 13305–13323, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Do We Really Need All Those Dimensions? An Intrinsic Evaluation Framework for Compressed Embeddings (Inkiriwang et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.717.pdf
Checklist: 2025.findings-emnlp.717.checklist.pdf
