Hospital Discharge Summarization Data Provenance

Paul Landes, Aaron Chaise, Kunal Patel, Sean Huang, Barbara Di Eugenio


Abstract
Summarization of medical notes has been studied for decades, with hospital discharge summaries garnering recent interest in the research community. While methods for summarizing these notes have been the focus, there has been little work on understanding the feasibility of this task. We believe this effort is warranted given the notes’ length and complexity, and that they are often riddled with poorly formatted structured data and redundancy in copied and pasted text. In this work, we investigate the feasibility of the summarization task by finding the origin, or data provenance, of the discharge summary’s source text. To motivate an understanding of the data challenges of the summarization task, we present DSProv, a new dataset of 51 hospital admissions annotated by clinical informatics physicians. The dataset is analyzed for semantics and the extent of copied text from human-authored electronic health record (EHR) notes. We also present a novel unsupervised method of matching notes used in discharge summaries, and release our annotation dataset and source code to the community.
Anthology ID:
2023.bionlp-1.41
Original:
2023.bionlp-1.41v1
Version 2:
2023.bionlp-1.41v2
Volume:
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kevin Cohen
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
439–448
URL:
https://aclanthology.org/2023.bionlp-1.41
DOI:
10.18653/v1/2023.bionlp-1.41
Cite (ACL):
Paul Landes, Aaron Chaise, Kunal Patel, Sean Huang, and Barbara Di Eugenio. 2023. Hospital Discharge Summarization Data Provenance. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 439–448, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Hospital Discharge Summarization Data Provenance (Landes et al., BioNLP 2023)
PDF:
https://aclanthology.org/2023.bionlp-1.41.pdf