PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

James Burgess; Jan N. Hansen; Duo Peng; Yuhui Zhang; Alejandro Lozano; Min Woo Sun; Emma Lundberg; Serena Yeung-Levy

Abstract

Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their relevance to technical AI systems in science, engineering, and medicine. In this work we propose training agents to search and reason over scientific papers: this tests technical question-answering and is directly relevant to real scientists, and the capabilities will be crucial to future AI Scientist systems. Concretely, we release a search corpus of 16 million biomedical paper abstracts and construct a challenging factoid QA dataset called PaperSearchQA with 60k samples answerable from the corpus, along with benchmarks. We train search agents in this environment to outperform non-RL retrieval baselines; we also perform further quantitative analysis and observe interesting agent behaviors such as planning, reasoning, and self-verification. Our corpus, datasets, and benchmarks are usable with the popular Search-R1 codebase for RLVR training; they are available on Hugging Face. Finally, our data creation methods are scalable and easily extendable to other scientific domains.

Anthology ID: 2026.eacl-long.88
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 1979–1997
URL: https://aclanthology.org/2026.eacl-long.88/
Cite (ACL): James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, and Serena Yeung-Levy. 2026. PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1979–1997, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR (Burgess et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-long.88.pdf
Checklist: 2026.eacl-long.88.checklist.pdf
