AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels

Lei Li; Xiangxu Zhang; Xiao Zhou; Zheng Liu

Abstract

Medical information retrieval (MIR) is vital for accessing knowledge from electronic health records, scientific literature, and medical databases, supporting applications such as medical education, patient queries, and clinical diagnosis. However, effective zero-shot dense retrieval in the medical domain remains difficult due to the scarcity of relevance-labeled data. To address this challenge, we propose **S**elf-**L**earning **Hy**pothetical **D**ocument **E**mbeddings (**SL-HyDE**), a framework that leverages large language models (LLMs) to generate hypothetical documents conditioned on a query. These documents encapsulate essential medical context, guiding dense retrievers toward the most relevant results. SL-HyDE further employs a self-learning mechanism that iteratively improves pseudo-document generation and retrieval using unlabeled corpora, eliminating the need for labeled data. In addition, we introduce the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation suite reflecting real-world medical scenarios, comprising five tasks and ten datasets. By benchmarking ten models on CMIRB, we provide a rigorous standard for evaluating MIR systems. Experimental results demonstrate that SL-HyDE significantly outperforms HyDE in retrieval accuracy, while exhibiting strong generalization and scalability across diverse LLM and retriever configurations. Our code and data are publicly available at: https://github.com/ll0ruc/AutoMIR.
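The HyDE-style retrieval step described in the abstract (generate a hypothetical document for the query, then search with that document's embedding instead of the query's) can be illustrated with a small sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a dense retriever, `generate_hypothetical_document` is a hypothetical stub that returns a canned answer in place of a real LLM call, and the corpus and query are invented for illustration. The self-learning mechanism of SL-HyDE is not covered here.

```python
import math
from collections import Counter

# Toy corpus and query, invented for illustration only.
CORPUS = [
    "diabetes causes high blood glucose and excessive thirst",
    "a sprained ankle needs rest ice and elevation",
    "I feel tired all the time after work",
]
QUERY = "why do I feel thirsty all the time"


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(text, corpus, top_k=1):
    """Return the top_k corpus documents most similar to `text`."""
    vec = embed(text)
    ranked = sorted(corpus, key=lambda doc: cosine(vec, embed(doc)), reverse=True)
    return ranked[:top_k]


def generate_hypothetical_document(query):
    """Hypothetical stub for the LLM call: returns a canned pseudo-document."""
    canned = {
        QUERY: "excessive thirst can be a symptom of diabetes "
               "caused by high blood glucose",
    }
    # Fall back to the raw query when no canned answer exists.
    return canned.get(query, query)


def hyde_retrieve(query, corpus, top_k=1):
    """Search with the embedding of a generated pseudo-document."""
    pseudo_doc = generate_hypothetical_document(query)
    return rank(pseudo_doc, corpus, top_k)


if __name__ == "__main__":
    print("query only:", rank(QUERY, CORPUS))
    print("HyDE-style:", hyde_retrieve(QUERY, CORPUS))
```

In this toy setup the raw query shares surface words with the unrelated "tired" document, while the pseudo-document carries the medical vocabulary that matches the diabetes passage. That is the intuition behind letting the hypothetical document guide the retriever.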

Anthology ID: 2025.findings-emnlp.1305
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 24028–24047
URL: https://aclanthology.org/2025.findings-emnlp.1305/
Cite (ACL): Lei Li, Xiangxu Zhang, Xiao Zhou, and Zheng Liu. 2025. AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24028–24047, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels (Li et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.1305.pdf
Checklist: 2025.findings-emnlp.1305.checklist.pdf
