Kunal Patel


2023

Hospital Discharge Summarization Data Provenance
Paul Landes | Aaron Chaise | Kunal Patel | Sean Huang | Barbara Di Eugenio
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Summarization of medical notes has been studied for decades, with hospital discharge summaries garnering recent interest in the research community. While methods for summarizing these notes have been the focus, there has been little work in understanding the feasibility of this task. We believe this effort is warranted given the notes’ length and complexity, and that they are often riddled with poorly formatted structured data and redundancy in copied and pasted text. In this work, we investigate the feasibility of the summarization task by finding the origin, or data provenance, of the discharge summary’s source text. To motivate an understanding of the data challenges of the summarization task, we present DSProv, a new dataset of 51 hospital admissions annotated by clinical informatics physicians. The dataset is analyzed for semantics and the extent of copied text from human-authored electronic health record (EHR) notes. We also present a novel unsupervised method of matching notes used in discharge summaries, and release our annotation dataset and source code to the community.

2022

A New Public Corpus for Clinical Section Identification: MedSecId
Paul Landes | Kunal Patel | Sean S. Huang | Adam Webb | Barbara Di Eugenio | Cornelia Caragea
Proceedings of the 29th International Conference on Computational Linguistics

The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are helpful to the reader when searching for information and contextualizing specific topics. The goal of this work is to segment the sections of clinical medical domain documentation. The primary contribution of this work is MedSecId, a publicly available set of 2,002 fully annotated medical notes from MIMIC-III. We include several baselines, source code, a pretrained model, and an analysis of the data showing a relationship between medical concepts across sections using principal component analysis.