Kunal Patel
2023
Hospital Discharge Summarization Data Provenance
Paul Landes | Aaron Chaise | Kunal Patel | Sean Huang | Barbara Di Eugenio
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Summarization of medical notes has been studied for decades, with hospital discharge summaries garnering recent interest in the research community. While methods for summarizing these notes have been the focus, there has been little work on understanding the feasibility of this task. We believe this effort is warranted given the notes’ length and complexity, and that they are often riddled with poorly formatted structured data and redundancy in copied-and-pasted text. In this work, we investigate the feasibility of the summarization task by finding the origin, or data provenance, of the discharge summary’s source text. As motivation for understanding the data challenges of the summarization task, we present DSProv, a new dataset of 51 hospital admissions annotated by clinical informatics physicians. The dataset is analyzed for semantics and the extent of copied text from human-authored electronic health record (EHR) notes. We also present a novel unsupervised method of matching notes used in discharge summaries, and release our annotation dataset and source code to the community.
2022
A New Public Corpus for Clinical Section Identification: MedSecId
Paul Landes | Kunal Patel | Sean S. Huang | Adam Webb | Barbara Di Eugenio | Cornelia Caragea
Proceedings of the 29th International Conference on Computational Linguistics
The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are helpful to the reader when searching for information and contextualizing specific topics. The goal of this work is to segment the sections of clinical medical domain documentation. The primary contribution of this work is MedSecId, a publicly available set of 2,002 fully annotated medical notes from MIMIC-III. We include several baselines, source code, a pretrained model, and an analysis of the data showing a relationship between medical concepts across sections using principal component analysis.
Co-authors
- Barbara Di Eugenio 2
- Paul Landes 2
- Cornelia Caragea 1
- Aaron Chaise 1
- Sean S. Huang 1