Sravanthi Parasa


Extracting a Knowledge Base of Mechanisms from COVID-19 Papers
Tom Hope | Aida Amini | David Wadden | Madeleine van Zuylen | Sravanthi Parasa | Eric Horvitz | Daniel Weld | Roy Schwartz | Hannaneh Hajishirzi
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The COVID-19 pandemic has spawned a diverse body of scientific literature that is challenging to navigate, stimulating interest in automated tools to help find useful knowledge. We pursue the construction of a knowledge base (KB) of mechanisms—a fundamental concept across the sciences, which encompasses activities, functions and causal relations, ranging from cellular processes to economic impacts. We extract this information from the natural language of scientific papers by developing a broad, unified schema that strikes a balance between relevance and breadth. We annotate a dataset of mechanisms with our schema and train a model to extract mechanism relations from papers. Our experiments demonstrate the utility of our KB in supporting interdisciplinary scientific search over COVID-19 literature, outperforming the prominent PubMed search in a study with clinical experts. Our search engine, dataset and code are publicly available.


MedICaT: A Dataset of Medical Images, Captions, and Textual References
Sanjay Subramanian | Lucy Lu Wang | Ben Bogin | Sachin Mehta | Madeleine van Zuylen | Sravanthi Parasa | Sameer Singh | Matt Gardner | Hannaneh Hajishirzi
Findings of the Association for Computational Linguistics: EMNLP 2020

Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their content. Previous work studying figures in scientific papers focused on classifying figure content rather than understanding how images relate to the text. To address challenges in figure retrieval and figure-to-text alignment, we introduce MedICaT, a dataset of medical images in context. MedICaT consists of 217K images from 131K open access biomedical papers, and includes captions, inline references for 74% of figures, and manually annotated subfigures and subcaptions for a subset of figures. Using MedICaT, we introduce the task of subfigure to subcaption alignment in compound figures and demonstrate the utility of inline references in image-text matching. Our data and code can be accessed at