CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval

Ang Li; Yiquan Wu; Yinghao Hu; Lizhi Qing; Shihang Wang; Chengyuan Liu; Tao Wu; Adam Jatowt; Ming Cai; Fei Wu; Kun Kuang

Abstract

Information retrieval in specialized domains (e.g., legal and medical) faces challenges in aligning user queries, often expressed in colloquial language, with highly structured, terminology-rich documents. This discrepancy creates a distribution gap in the text representation. Recent methods use large language models (LLMs) to enhance queries by generating intermediary elements (e.g., keywords, pseudo-documents) before performing retrieval. However, by treating LLMs and retrievers separately, these approaches risk producing unreliable or irrelevant intermediaries, which can significantly degrade retrieval performance. To address this issue, we propose CoEvo, an alternating optimization framework that facilitates the coevolution of LLMs and retrieval models. CoEvo operates through two key steps. The L-step directs the LLM in generating intermediaries by leveraging an archive of historical examples known to enhance retrieval. The R-step trains the retriever using contrastive learning on the intermediaries produced by the LLM. Finally, we evaluate and flexibly leverage content generated by the LLM to amplify the effectiveness of coevolution. Experimental results demonstrate significant improvements in retrieval performance across both legal and medical domains.
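The alternation between the two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function `coevolve`, its parameters, the word-overlap scorer, and the glossary-based generator are all hypothetical stand-ins chosen so the loop runs without an LLM or a trainable retriever. The final step the abstract mentions (evaluating and flexibly leveraging generated content) is only approximated here by keeping intermediaries that raise the retrieval score.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical archive type: (query, intermediary) pairs that improved retrieval.
Archive = List[Tuple[str, str]]


def coevolve(
    queries: List[str],
    gold_docs: Dict[str, str],
    generate: Callable[[str, Archive], str],
    score: Callable[[str, str], float],
    train_retriever: Callable[[List[Tuple[str, str]]], None],
    rounds: int = 2,
) -> Archive:
    """Alternate an L-step and an R-step for a fixed number of rounds (assumed schedule)."""
    archive: Archive = []
    for _ in range(rounds):
        # L-step: generate an intermediary per query, conditioned on the archive.
        training_pairs: List[Tuple[str, str]] = []
        for query in queries:
            intermediary = generate(query, archive)
            enhanced = f"{query} {intermediary}"
            gold = gold_docs[query]
            # Keep only intermediaries that raise the score of the gold document.
            helped = score(enhanced, gold) > score(query, gold)
            if helped and (query, intermediary) not in archive:
                archive.append((query, intermediary))
            training_pairs.append((enhanced, gold))
        # R-step: update the retriever on (enhanced query, gold document) pairs.
        # A real retriever would apply a contrastive loss here; this is a callback.
        train_retriever(training_pairs)
    return archive


def overlap_score(text: str, doc: str) -> float:
    """Toy retrieval score: fraction of document words that appear in the text."""
    text_words, doc_words = set(text.lower().split()), set(doc.lower().split())
    return len(text_words & doc_words) / len(doc_words) if doc_words else 0.0


def toy_generate(query: str, archive: Archive) -> str:
    """Toy 'LLM': maps a colloquial phrase to domain terminology via a fixed glossary."""
    glossary = {"heart attack": "myocardial infarction"}
    return " ".join(term for phrase, term in glossary.items() if phrase in query.lower())


seen_batches: List[List[Tuple[str, str]]] = []
queries = ["what to do after a heart attack"]
gold_docs = {queries[0]: "management of acute myocardial infarction"}
archive = coevolve(queries, gold_docs, toy_generate, overlap_score, seen_batches.append)
```

In this toy run the colloquial query shares no words with the gold document, while the enhanced query shares two of its five words, so the intermediary enters the archive once and the retriever callback receives one batch per round.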

Anthology ID: 2025.emnlp-main.757
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 14991–15010
URL: https://aclanthology.org/2025.emnlp-main.757/
Cite (ACL): Ang Li, Yiquan Wu, Yinghao Hu, Lizhi Qing, Shihang Wang, Chengyuan Liu, Tao Wu, Adam Jatowt, Ming Cai, Fei Wu, and Kun Kuang. 2025. CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 14991–15010, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval (Li et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.757.pdf
Checklist: 2025.emnlp-main.757.checklist.pdf
