Jenna Sparks

2024

Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review. We define hierarchical organizations as tree structures where nodes refer to topical categories and every node is linked to the studies assigned to that category. Our naive LLM-based pipeline for hierarchy generation from a set of studies produces promising yet imperfect hierarchies, motivating us to collect CHIME, an expert-curated dataset for this task focused on biomedicine. Given the challenging and time-consuming nature of building hierarchies from scratch, we use a human-in-the-loop process in which experts correct errors (in both links between categories and study assignments) in LLM-generated hierarchies. CHIME contains 2,174 LLM-generated hierarchies covering 472 topics, and expert-corrected hierarchies for a subset of 100 topics. Expert corrections allow us to quantify LLM performance, and we find that while LLMs are quite good at generating and organizing categories, their assignment of studies to categories could be improved. We attempt to train a corrector model with human feedback, which improves study assignment by 12.6 F1 points. We release our dataset and models to encourage research on developing better assistive tools for literature review.
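The hierarchy definition above (a tree whose nodes are topical categories, each linked to its assigned studies) and the idea of scoring study assignment with F1 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CategoryNode` class, the helper names, the example categories and study IDs, and the choice to compute F1 over (category, study) pairs are hypothetical and are not taken from CHIME or its released evaluation code. The sketch also assumes category names are unique within a hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """One topical category in a hierarchy (hypothetical structure)."""
    name: str
    study_ids: set[str] = field(default_factory=set)
    children: list["CategoryNode"] = field(default_factory=list)


def assignments(root: CategoryNode) -> set[tuple[str, str]]:
    """Flatten a hierarchy into a set of (category name, study id) pairs."""
    pairs: set[tuple[str, str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        pairs.update((node.name, sid) for sid in node.study_ids)
        stack.extend(node.children)
    return pairs


def assignment_f1(predicted: CategoryNode, gold: CategoryNode) -> float:
    """F1 over (category, study) pairs; an assumed metric, not the paper's."""
    pred_pairs, gold_pairs = assignments(predicted), assignments(gold)
    true_pos = len(pred_pairs & gold_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: an "expert-corrected" hierarchy and an "LLM" one
# that places study s2 under the wrong category.
gold = CategoryNode("Treatment", children=[
    CategoryNode("Drug therapy", {"s1", "s2"}),
    CategoryNode("Surgery", {"s3"}),
])
predicted = CategoryNode("Treatment", children=[
    CategoryNode("Drug therapy", {"s1"}),
    CategoryNode("Surgery", {"s2", "s3"}),
])

score = assignment_f1(predicted, gold)
```

In this toy case two of the three predicted pairs match the expert-corrected hierarchy, so precision and recall are both 2/3 and the F1 score is 2/3.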

To appropriately interpret and use scientific claims for sensemaking and decision-making, it is critical to contextualize them, not just with textual evidence that the claim was in fact asserted, but also with key supporting empirical evidence, such as a figure that describes a key result, and methodological details, such as the methods of data collection. Retrieving this contextual information when encountering claims in isolation, away from their source papers, is difficult and time-consuming for humans. Scholarly document processing models could help to contextualize scientific claims, but there is a lack of datasets designed for this task. Thus, we contribute a dataset of 585 scientific claims with gold annotations for supporting figures and tables, and gold text snippets of methodological details, that ground the key results behind each claim, and we run the Context24 shared task to encourage model development for this task. This report describes our dataset construction process, summarizes results from the shared task conducted at the 4th Workshop on Scholarly Document Processing (SDP), and discusses future research directions in this space. To support further research, we also publicly release the dataset on HuggingFace.
