Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings

Liyan Xu; Zhenlin Su; Mo Yu; Jiangnan Li; Fandong Meng; Jie Zhou

doi:10.18653/v1/2025.findings-emnlp.1051

Abstract

This work stems from an observed limitation of text encoders: embeddings may fail to recognize fine-grained entities or events within the encoded semantics, causing retrieval to fail even in simple cases. To examine this behavior, we first introduce a new evaluation dataset, CapRetrieval, in which passages are image captions and queries are phrases targeting entity or event concepts in diverse forms. Zero-shot evaluation suggests that encoders often struggle with such fine-grained matching, regardless of training sources or model size. To improve this, we finetune encoders with our proposed data generation strategies, enabling a small 0.1B encoder to outperform the state-of-the-art 7B model. In this process, we further uncover the granularity dilemma: the challenge for embeddings to capture fine-grained salience while remaining aligned with the overall semantics. Our dataset, code, and models are publicly released at https://github.com/lxucs/CapRetrieval.
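For readers less familiar with dense retrieval, the sketch below shows the basic scoring step the abstract refers to: a query and each passage (here, a caption) are embedded as vectors, and passages are ranked by cosine similarity to the query. This is a minimal illustration, not the paper's code. The vectors are hypothetical toy values rather than outputs of any real encoder, and the helper names `cosine` and `rank_passages` are introduced only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return passage ids sorted by descending cosine similarity to the query."""
    scores = {pid: cosine(query_vec, vec) for pid, vec in passage_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings (assumption: not produced by any real encoder).
passages = {
    "caption_a": [0.9, 0.1, 0.2],
    "caption_b": [0.2, 0.9, 0.1],
}
query = [0.3, 0.8, 0.1]

# Ranking is decided purely by vector geometry: here caption_b scores higher.
print(rank_passages(query, passages))
```

Because the ranking depends only on how close whole-passage vectors are, a caption that mentions the queried entity as a small detail can still be outscored if its embedding is dominated by other scene content. If `caption_a` were the ground-truth match in this toy setup, the ranking above would be the kind of retrieval failure on fine-grained concepts that the abstract describes.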

Anthology ID:: 2025.findings-emnlp.1051
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2025
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 19295–19305
URL:: https://aclanthology.org/2025.findings-emnlp.1051/
DOI:: 10.18653/v1/2025.findings-emnlp.1051
Cite (ACL):: Liyan Xu, Zhenlin Su, Mo Yu, Jiangnan Li, Fandong Meng, and Jie Zhou. 2025. Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19295–19305, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings (Xu et al., Findings 2025)
PDF:: https://aclanthology.org/2025.findings-emnlp.1051.pdf
Checklist:: 2025.findings-emnlp.1051.checklist.pdf
