Zhengxin Zeng
2025
Alleviating Performance Degradation Caused by Out-of-Distribution Issues in Embedding-Based Retrieval
Haotong Bao | Jianjin Zhang | Qi Chen | Weihao Han | Zhengxin Zeng | Ruiheng Chang | Mingzheng Li | Hao Sun | Weiwei Deng | Feng Sun | Qi Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
In Embedding-Based Retrieval (EBR), Approximate Nearest Neighbor (ANN) algorithms are widely adopted for efficient large-scale search. However, recent studies reveal a query out-of-distribution (OOD) issue, in which query and base embeddings follow mismatched distributions, significantly degrading ANN performance. In this work, we empirically verify the generality of this phenomenon and provide a quantitative analysis. To mitigate the distributional gap, we introduce a distribution regularizer into the encoder training objective, encouraging alignment between query and base embeddings. Extensive experiments across multiple datasets, encoders, and ANN indices show that our method consistently improves retrieval performance.
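The abstract does not say which form the distribution regularizer takes, so the sketch below is only an illustration of the general idea, not the authors' method. It assumes a simple first-moment penalty (the squared distance between the mean query embedding and the mean base embedding) added to an existing retrieval loss. The function names `mean_gap_penalty` and `total_loss` and the `weight` parameter are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def mean_gap_penalty(queries, bases):
    """Squared Euclidean distance between the query mean and the base mean.

    Assumption: a stand-in for a distribution regularizer. It is zero when
    the two batches share the same mean and grows as the means drift apart.
    """
    mq = mean_vector(queries)
    mb = mean_vector(bases)
    return sum((a - b) ** 2 for a, b in zip(mq, mb))


def total_loss(retrieval_loss, queries, bases, weight=0.1):
    """Retrieval loss plus the weighted penalty (weight is hypothetical)."""
    return retrieval_loss + weight * mean_gap_penalty(queries, bases)


if __name__ == "__main__":
    query_batch = [[1.0, 0.0], [1.0, 0.0]]
    base_batch = [[0.0, 0.0], [0.0, 0.0]]
    print(total_loss(0.5, query_batch, base_batch))
```

Matching only the means is the weakest kind of alignment; a practical regularizer would likely also compare higher-order statistics, which this sketch leaves out.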