Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models

Oren Melamud, Chaitanya Shivade


Abstract
Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but have been shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing, using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy-preservation properties and their utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.
Anthology ID:
W19-1905
Volume:
Proceedings of the 2nd Clinical Natural Language Processing Workshop
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota, USA
Editors:
Anna Rumshisky, Kirk Roberts, Steven Bethard, Tristan Naumann
Venue:
ClinicalNLP
Publisher:
Association for Computational Linguistics
Pages:
35–45
URL:
https://aclanthology.org/W19-1905
DOI:
10.18653/v1/W19-1905
Cite (ACL):
Oren Melamud and Chaitanya Shivade. 2019. Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, pages 35–45, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
Cite (Informal):
Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models (Melamud & Shivade, ClinicalNLP 2019)
PDF:
https://aclanthology.org/W19-1905.pdf
Code
orenmel/synth-clinical-notes
Data
MIMIC-III, WikiText-103, WikiText-2