ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Deli Zhao, Wenbing Huang, Tingyang Xu, Qifeng Bai, Yu Rong


Abstract
Reasoning-based large language models have excelled in mathematics and programming, yet their potential in knowledge-intensive medical question answering remains underexplored and insufficiently validated in clinical contexts.To bridge this gap, we introduce ReasonMed, the largest medical reasoning dataset to date, comprising 370k high-quality examples distilled from 1.75 million initial reasoning paths generated by complementary LLMs and curated through a cost-efficient easy-medium-difficult (EMD) pipeline.ReasonMed is built through a multi-agent generation, verification, and refinement process, in which an Error Refiner improves reasoning paths by correcting error-prone steps identified by a verifier.Using ReasonMed, we investigate effective strategies for training medical reasoning models and find that integrating detailed CoT reasoning with concise answer summaries yields the most robust fine-tuning results.Models trained on ReasonMed set a new benchmark: ReasonMed-7B surpasses the prior best sub-10B models by 4.17% and even exceeds LLaMA3.1-70B on PubMedQA by 4.60%. When scaled to ReasonMed-14B, it remains highly competitive, underscoring consistent scaling potential.The codes and datasets are available at https://github.com/YuSun-Work/ReasonMed.
Anthology ID:
2025.emnlp-main.1344
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
26457–26478
Language:
URL:
https://aclanthology.org/2025.emnlp-main.1344/
DOI:
Bibkey:
Cite (ACL):
Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Deli Zhao, Wenbing Huang, Tingyang Xu, Qifeng Bai, and Yu Rong. 2025. ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26457–26478, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning (Sun et al., EMNLP 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.emnlp-main.1344.pdf
Checklist:
 2025.emnlp-main.1344.checklist.pdf