More Than Efficiency: Embedding Compression Improves Domain Adaptation in Dense Retrieval

Chunsheng Zuo; Daniel Khashabi

Abstract

Dense retrievers powered by pretrained embeddings are widely used for document retrieval but struggle in specialized domains because the training and target domain distributions are mismatched. Domain adaptation typically requires costly annotation of query-document pairs followed by retraining. In this work, we revisit an overlooked alternative: applying PCA to domain embeddings to derive lower-dimensional representations that preserve domain-relevant features while discarding non-discriminative components. Although PCA is traditionally used for efficiency, we demonstrate that this simple embedding compression can also improve retrieval performance. Evaluated across 9 retrievers and 14 MTEB datasets, PCA applied solely to query embeddings improves NDCG@10 in 75.4% of model-dataset pairs, offering a simple and lightweight method for domain adaptation.
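The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays stand in for real retriever embeddings, the dimensions (64 reduced to 16), the helper names `fit_pca`, `project`, and `rank_documents`, and the use of cosine similarity for ranking are all assumptions made for illustration. The sketch reads "PCA applied solely to query embeddings" as fitting the projection on query embeddings only and then mapping both queries and documents into the reduced space; the paper may differ in these details.

```python
import numpy as np


def fit_pca(embeddings, n_components):
    """Fit PCA via SVD; return the mean and the top principal directions."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(embeddings, mean, components):
    """Center with the fitted mean and project onto the kept directions."""
    return (embeddings - mean) @ components.T


def rank_documents(query_vecs, doc_vecs, top_k):
    """Return indices of the top_k documents per query by cosine similarity."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = q @ d.T
    return np.argsort(-scores, axis=1)[:, :top_k]


# Synthetic stand-ins for embeddings from a pretrained retriever (assumption).
rng = np.random.default_rng(0)
query_emb = rng.normal(size=(200, 64))
doc_emb = rng.normal(size=(500, 64))

# Fit on target-domain query embeddings only; no labels or retraining needed.
mean, components = fit_pca(query_emb, n_components=16)

# Map both sides into the same reduced space before scoring.
q_low = project(query_emb, mean, components)
d_low = project(doc_emb, mean, components)
top10 = rank_documents(q_low, d_low, top_k=10)
```

In practice the number of retained components would be a tuning choice per model and dataset; the value 16 here is arbitrary.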

Anthology ID: 2026.surgellm-1.24
Volume: Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Vivek Gupta, Kaize Ding, Harsha Kokel, Yue Zhao, Amit Agarwal, Yu Wang, Michael Glass, Yu Zhang, Kavitha Srinivas, Xiusi Chen, Oktie Hassanzadeh, Qi Zhu, Shuaichen Chang, Yuan Luo
Venues: SURGeLLM | WS
Publisher: Association for Computational Linguistics
Pages: 361–377
URL: https://aclanthology.org/2026.surgellm-1.24/
Cite (ACL): Chunsheng Zuo and Daniel Khashabi. 2026. More Than Efficiency: Embedding Compression Improves Domain Adaptation in Dense Retrieval. In Proceedings of the First Workshop on Structured Understanding, Retrieval, and Generation in the LLM Era (SURGeLLM 2026), pages 361–377, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): More Than Efficiency: Embedding Compression Improves Domain Adaptation in Dense Retrieval (Zuo & Khashabi, SURGeLLM 2026)
PDF: https://aclanthology.org/2026.surgellm-1.24.pdf
