Let Retrievers Think Before Action: Thought-Augmented Embedding for Dense Retrieval

Ruiran Yan; Wen Xiong; Ze Liu; Chaozhuo Li; Hao Liao; Defu Lian; Zheng Liu

Abstract

Large language models (LLMs) have demonstrated that explicitly performing step-by-step thinking before producing final outputs can substantially improve performance on complex tasks, as exemplified by recent reasoning-oriented models such as OpenAI O1 and DeepSeek R1. Inspired by these advancements, we propose O1 Embedder, a novel approach that aims to endow retrieval models with similar capabilities in order to address challenges such as multi-task retrieval, zero-shot retrieval, and tasks requiring intensive reasoning over complex relationships. O1 Embedder generates preliminary thoughts for input queries before document retrieval. To realize this objective, we address two fundamental challenges in integrating thinking mechanisms into dense retrieval. First, retrieval tasks lack explicit supervision for intermediate thinking processes, making it difficult to define thoughts that are truly useful for retrieval. We address this challenge with a data synthesis framework that follows an “Exploration-Refinement” process, ensuring alignment with retrieval utility. Second, effectively integrating thought generation with representation learning requires a unified modeling framework that can jointly support generation and embedding within a single model. O1 Embedder addresses this challenge by jointly optimizing thought generation and dense retrieval in an end-to-end manner, enhancing retrieval accuracy while reducing complexity through a single deployable model. Extensive evaluations across diverse datasets demonstrate significant performance improvements, highlighting the effectiveness and generalization capability of O1 Embedder.
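The idea of generating a thought before retrieving can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration: `generate_thought` is a hypothetical canned lookup standing in for the model's thought generation, `embed` is a bag-of-words stand-in for a dense encoder, and concatenating the query with its thought is just one simple way to combine them. The abstract does not specify how O1 Embedder combines the two, so this should not be read as the authors' method.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: L2-normalized bag-of-words counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalized sparse vectors (i.e. cosine similarity)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def generate_thought(query):
    """Hypothetical placeholder for thought generation: a canned lookup."""
    canned = {
        "why is the sky blue": "rayleigh scattering of sunlight by air molecules",
    }
    return canned.get(query.lower(), "")


def retrieve(query, docs, use_thought=True):
    """Rank docs by similarity to the query, optionally augmented by a thought."""
    thought = generate_thought(query) if use_thought else ""
    # Assumption: combine query and thought by plain concatenation.
    q_vec = embed(f"{query} {thought}".strip())
    return sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)


if __name__ == "__main__":
    docs = [
        "rayleigh scattering explains how air molecules scatter sunlight",
        "the sky is a popular name for a blue paint",
    ]
    print(retrieve("why is the sky blue", docs, use_thought=False)[0])
    print(retrieve("why is the sky blue", docs, use_thought=True)[0])
```

In this toy setting the raw query shares words only with the distractor document, while the thought supplies the vocabulary that connects the query to the relevant one.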

Anthology ID: 2026.findings-acl.1603
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 32032–32052
URL: https://aclanthology.org/2026.findings-acl.1603/
Cite (ACL): Ruiran Yan, Wen Xiong, Ze Liu, Chaozhuo Li, Hao Liao, Defu Lian, and Zheng Liu. 2026. Let Retrievers Think Before Action: Thought-Augmented Embedding for Dense Retrieval. In Findings of the Association for Computational Linguistics: ACL 2026, pages 32032–32052, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Let Retrievers Think Before Action: Thought-Augmented Embedding for Dense Retrieval (Yan et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1603.pdf
Checklist: 2026.findings-acl.1603.checklist.pdf
