Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French

Nicolas Hiebel, Olivier Ferret, Karen Fort, Aurélie Névéol


Abstract
In sensitive domains, the sharing of corpora is restricted due to confidentiality, copyright or trade secrets. Automatic text generation can help alleviate these issues by producing synthetic texts that mimic the linguistic properties of real documents while preserving confidentiality. In this study, we assess the usability of a synthetic corpus as a substitute training corpus for clinical information extraction. Our goal is to automatically produce a clinical case corpus annotated with clinical entities and to evaluate it for a named entity recognition (NER) task. We use two auto-regressive neural models partially or fully trained on generic French texts and fine-tuned on clinical cases to produce a corpus of synthetic clinical cases. We study variants of the generation process: (i) fine-tuning on annotated vs. plain text (in the latter case, annotations are obtained a posteriori) and (ii) selection of generated texts based on model parameters and filtering criteria. We then train NER models with the resulting synthetic text and evaluate them on a gold standard clinical corpus. Our experiments suggest that synthetic text is useful for clinical NER.
Anthology ID:
2023.eacl-main.170
Original:
2023.eacl-main.170v1
Version 2:
2023.eacl-main.170v2
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2320–2338
URL:
https://aclanthology.org/2023.eacl-main.170
DOI:
10.18653/v1/2023.eacl-main.170
Cite (ACL):
Nicolas Hiebel, Olivier Ferret, Karen Fort, and Aurélie Névéol. 2023. Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2320–2338, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French (Hiebel et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.170.pdf
Video:
https://aclanthology.org/2023.eacl-main.170.mp4