Generation and Evaluation of Synthetic Endoscopy Free-Text Reports with Differential Privacy

Agathe Zecevic, Xinyue Zhang, Sebastian Zeki, Angus Roberts


Abstract
The development of NLP models in the healthcare sector faces significant challenges due to the limited availability of patient data, mainly driven by privacy concerns. This study proposes the generation of synthetic free-text medical reports, specifically in the gastroenterology domain, to address the scarcity of specialised datasets while preserving patient privacy. We fine-tune BioGPT on over 90 000 endoscopy reports and integrate Differential Privacy (DP) into the training process. The resulting model is used to generate 10 000 differentially private synthetic reports. The generated synthetic data is evaluated along multiple dimensions: similarity to real datasets, language quality, and utility in both supervised and semi-supervised NLP tasks. Results suggest that while DP integration impacts text quality, it offers a promising balance between data utility and privacy, improving performance on a real-world downstream task. Our study underscores the potential of synthetic data to facilitate model development in the healthcare domain without compromising patient privacy.
Anthology ID:
2024.bionlp-1.2
Volume:
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
14–24
URL:
https://aclanthology.org/2024.bionlp-1.2
DOI:
10.18653/v1/2024.bionlp-1.2
Cite (ACL):
Agathe Zecevic, Xinyue Zhang, Sebastian Zeki, and Angus Roberts. 2024. Generation and Evaluation of Synthetic Endoscopy Free-Text Reports with Differential Privacy. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 14–24, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Generation and Evaluation of Synthetic Endoscopy Free-Text Reports with Differential Privacy (Zecevic et al., BioNLP-WS 2024)
PDF:
https://aclanthology.org/2024.bionlp-1.2.pdf