Adam G. Dunn

2025

Strategies for Efficient Retrieval-augmented Generation in Clinical Domains with RAPTOR: A Benchmarking Study
Xumou Zhang | Qixuan Hu | Jinman Kim | Adam G. Dunn
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

The Recursive Abstractive Processing for Tree-Organized Retrieval (RAPTOR) framework uses a hierarchical, tree-structured datastore to integrate local and global context, which lets language models handle long documents efficiently. This design is especially useful when cloud-based language models are unavailable or undesirable, for instance with offline confidential patient records or under stringent data-privacy requirements. We benchmarked RAPTOR on the QuALITY dataset and on a novel Clinical Trial question-answering dataset (CTQA) drawn from over 500 000 registry entries. Experiments varied question complexity (simple vs. complex), four language models, four embedding models, and three chunking strategies, and included GPT-4o as a cloud-based baseline. Results show that, with optimal settings, RAPTOR combined with smaller local models outperforms GPT-4o on complex CTQA questions, although this gain does not extend to QuALITY. These outcomes highlight RAPTOR’s promise as a practical, locally implementable solution for long-context understanding.
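To make the idea of a tree-structured datastore concrete, here is a minimal sketch, not the benchmarked implementation. RAPTOR itself clusters chunk embeddings and summarizes each cluster with a language model. This toy version makes several assumptions. Consecutive chunks are grouped in fixed-size batches instead of being clustered. A hypothetical `summarize` function keeps the first few words of each child in place of a language model. A bag-of-words cosine similarity stands in for an embedding model. Retrieval scores nodes at every level of the tree (a "collapsed tree" search), so a query can match either a fine-grained leaf chunk or a higher-level summary.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int  # 0 = original chunk (leaf); higher = more abstract summary
    children: list = field(default_factory=list)


def summarize(texts):
    # Hypothetical stand-in for a language-model summarizer:
    # keeps the first 8 words of each child text.
    return " ".join(" ".join(t.split()[:8]) for t in texts)


def build_tree(chunks, group_size=2):
    """Recursively summarize groups of nodes until a single root remains.

    Assumption: consecutive nodes are grouped in fixed-size batches,
    instead of RAPTOR's embedding-based clustering.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    layer = [Node(c, 0) for c in chunks]
    all_nodes = list(layer)
    level = 0
    while len(layer) > 1:
        level += 1
        parents = []
        for i in range(0, len(layer), group_size):
            group = layer[i:i + group_size]
            parents.append(Node(summarize([n.text for n in group]), level, group))
        all_nodes.extend(parents)
        layer = parents
    return all_nodes


def embed(text):
    # Toy bag-of-words "embedding" (assumption; a real system uses an embedding model).
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(nodes, query, k=2):
    # Collapsed-tree style: score every node at every level, keep the top k.
    q = embed(query)
    return sorted(nodes, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]
```

For example, four short trial-registry chunks produce a tree of seven nodes (four leaves, two summaries, one root). A narrow query such as "blood pressure outcome" is best answered by a leaf, whereas a broad query may instead match a summary node. Swapping in real clustering, summarization, and embedding models changes the quality of the tree but not this overall build-then-search structure.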

2021


The rapid growth in published clinical trials makes it difficult to maintain up-to-date systematic reviews, which require finding all relevant trials. As a result, policy and practice decisions are based on out-of-date, incomplete, and biased subsets of the available clinical evidence. Extracting and then normalising Population, Intervention, Comparator, and Outcome (PICO) information from clinical trial articles may be an effective way to automatically assign trials to systematic reviews and avoid searching and screening, the two most time-consuming systematic review processes. We propose and test a novel approach to PICO span detection. Unlike previous approaches, our method detects spans without annotated span data, using only crowdsourced sentence-level annotations. Experiments on two datasets show that our PICO span detection achieves much higher recall than fully supervised methods, while PICO sentence detection is at least as good as human annotation. By removing the reliance on expert annotations for span detection, this work could support a human-machine pipeline that turns low-quality, crowdsourced, sentence-level PICO annotations into structured information for quickly assigning trials to relevant systematic reviews.

Co-authors

Yifang Sun 1

Wei Wang 1

Xumou Zhang 1

Venues

Findings 1
RANLP 1
