Maya Varma


Toward Expanding the Scope of Radiology Report Summarization to Multiple Anatomies and Modalities
Zhihong Chen | Maya Varma | Xiang Wan | Curtis Langlotz | Jean-Benoit Delbrouck
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces two essential limitations. First, many prior studies conduct experiments on private datasets, preventing reproduction of results and fair comparisons across different systems and solutions. Second, most prior approaches are evaluated solely on chest X-rays. To address these limitations, we propose a dataset (MIMIC-RRS) involving three new modalities and seven new anatomies based on the MIMIC-III and MIMIC-CXR datasets. We then conduct extensive experiments to evaluate the performance of models both within and across modality-anatomy pairs in MIMIC-RRS. In addition, we evaluate their clinical efficacy via RadGraph, a factual correctness metric.
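The abstract mentions evaluating clinical efficacy with RadGraph, a factual correctness metric. As a rough illustration only, the sketch below computes an F1 score over sets of (text, label) entity tuples, assuming the entities have already been extracted. It does not run the RadGraph model: the entity tuples are written by hand, and the `set_f1` helper and its empty-set convention are hypothetical. RadGraph-based scores can also take relations between entities into account, which this sketch omits.

```python
from typing import Iterable, Set, Tuple

# (entity text, RadGraph-style label); labels such as "OBS-DP" are used for illustration.
Entity = Tuple[str, str]


def set_f1(predicted: Iterable[Entity], reference: Iterable[Entity]) -> float:
    """F1 over exact-match entity sets (hypothetical helper, not the RadGraph package)."""
    pred: Set[Entity] = set(predicted)
    ref: Set[Entity] = set(reference)
    if not pred and not ref:
        return 1.0  # assumed convention: two entity-free summaries agree
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0  # also covers the case where exactly one side is empty
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hand-written entities; a real pipeline would extract these with a RadGraph model.
reference = {("pleural", "ANAT-DP"), ("effusion", "OBS-DP"), ("pneumothorax", "OBS-DA")}
generated = {("effusion", "OBS-DP"), ("pneumothorax", "OBS-DA"), ("edema", "OBS-DP")}
print(round(set_f1(generated, reference), 3))  # 2 of 3 entities match on each side
```

Exact set matching keeps the example short; it treats a summary that omits a finding and one that adds a spurious finding symmetrically, through recall and precision respectively.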

Overview of the RadSum23 Shared Task on Multi-modal and Multi-anatomical Radiology Report Summarization
Jean-Benoit Delbrouck | Maya Varma | Pierre Chambon | Curtis Langlotz
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Radiology report summarization is a growing area of research. Given the Findings and/or Background sections of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. Recent efforts have released systems that achieve promising performance as measured by widely used summarization metrics such as BLEU and ROUGE. However, the research area of radiology report summarization currently faces two important limitations. First, most of the results are reported on private datasets, which prevents reproducing results and fairly comparing different systems and solutions. Second, to the best of our knowledge, most research is carried out on chest X-rays. To address these two limitations, we propose a radiology report summarization (RadSum) challenge on (i) a new dataset of eleven different modality-anatomy pairs based on the MIMIC-III database, and (ii) a multimodal report summarization dataset based on MIMIC-CXR, enhanced with a new test set from Stanford Hospital. In total, we received 112 submissions across 11 teams.
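The overview notes that systems are commonly scored with summarization metrics such as BLEU and ROUGE. For readers unfamiliar with ROUGE, here is a minimal sketch of ROUGE-L (longest-common-subsequence F1) between a generated and a reference Impression, assuming simple lowercased whitespace tokenization. Official scorers apply their own tokenization and optional stemming, so scores from this sketch will not match them exactly; the function names are illustrative.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via a rolling-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 with naive lowercase whitespace tokenization (an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(round(rouge_l_f1("no acute process", "no acute cardiopulmonary process"), 3))
```

Here the LCS is all three candidate tokens, so precision is 1.0 and recall is 0.75. Because ROUGE-L rewards in-order token overlap, a summary can score well while still stating a clinically wrong finding, which is one motivation for factual-correctness metrics in this line of work.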


ViLMedic: a framework for research at the intersection of vision and language in medical AI
Jean-Benoit Delbrouck | Khaled Saab | Maya Varma | Sabri Eyuboglu | Pierre Chambon | Jared Dunnmon | Juan Zambrano | Akshay Chaudhari | Curtis Langlotz
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

There is a growing need to model interactions between data modalities (e.g., vision, language), both to improve AI predictions on existing tasks and to enable new applications. In the recent field of multimodal medical AI, integrating multiple modalities has gained widespread popularity, as multimodal models have been shown to improve performance and robustness, require fewer training samples, and add complementary information. To improve technical reproducibility and transparency for multimodal medical tasks, as well as to speed up progress across medical AI, we present ViLMedic, a Vision-and-Language medical library. As of 2022, the library contains a dozen reference implementations replicating the state-of-the-art results for problems that range from medical visual question answering and radiology report generation to multimodal representation learning on widely adopted medical datasets. In addition, ViLMedic hosts a model zoo with more than twenty pretrained models for the above tasks, designed to be extensible for researchers yet simple for practitioners. Ultimately, we hope our reproducible pipelines can enable clinical translation and create real impact. The library is available at


Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text
Maya Varma | Laurel Orr | Sen Wu | Megan Leszczynski | Xiao Ling | Christopher Ré
Findings of the Association for Computational Linguistics: EMNLP 2021

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.


Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning
Rachel Gardner | Maya Varma | Clare Zhu | Ranjay Krishna
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data. In this work, we seek to enable the collection of high-quality question-answer datasets from social media by proposing a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility. Given a machine- or user-generated question and a crowdsourced response from a social media user, we determine whether the question and response are valid; if so, we identify the answer within the free-form response. We design BERT-based models to perform the QA plausibility task, and we evaluate the ability of our models to generate a clean, usable question-answer dataset. Our highest-performing approach consists of a single-task model that determines the plausibility of the question, followed by a multi-task model that evaluates the plausibility of the response and extracts answers (Question Plausibility AUROC=0.75, Response Plausibility AUROC=0.78, Answer Extraction F1=0.665).
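The highest-performing approach in this abstract is a two-stage cascade: a question-plausibility model gates a second model that both scores the response and extracts an answer. A minimal sketch of that control flow follows, assuming each model is supplied as a callable returning a probability (the second also returning a character span for the answer). The toy keyword-based models, the `qa_plausibility` function, and the 0.5 threshold are hypothetical stand-ins for the BERT-based models, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical model interfaces: probabilities in [0, 1], answer span as (start, end) chars.
QuestionModel = Callable[[str], float]
ResponseModel = Callable[[str, str], Tuple[float, Tuple[int, int]]]


@dataclass
class QAResult:
    question_plausible: bool
    response_plausible: bool
    answer: Optional[str]


def qa_plausibility(question: str, response: str, question_model: QuestionModel,
                    response_model: ResponseModel, threshold: float = 0.5) -> QAResult:
    # Stage 1: single-task question plausibility; stop early if the question is invalid.
    if question_model(question) < threshold:
        return QAResult(False, False, None)
    # Stage 2: one multi-task call scores the response and predicts an answer span.
    response_prob, (start, end) = response_model(question, response)
    if response_prob < threshold:
        return QAResult(True, False, None)
    return QAResult(True, True, response[start:end])


# Toy stand-ins for trained classifiers (hypothetical, for illustration only).
def toy_question_model(question: str) -> float:
    return 0.9 if question.strip().endswith("?") else 0.1


def toy_response_model(question: str, response: str) -> Tuple[float, Tuple[int, int]]:
    start = response.find("Paris")
    if start == -1:
        return 0.2, (0, 0)
    return 0.8, (start, start + len("Paris"))


result = qa_plausibility("Where is the Louvre?", "It's in Paris, I think",
                         toy_question_model, toy_response_model)
print(result)
```

Gating on the question first means implausible questions never reach the more expensive second model, and sharing one model for response plausibility and answer extraction mirrors the multi-task design the abstract describes.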