A Semi-supervised Approach for De-identification of Swedish Clinical Text

Hanna Berg, Hercules Dalianis


Abstract
An abundance of electronic health records (EHRs) is produced every day within healthcare. The records contain valuable information for research and the future improvement of healthcare. Multiple efforts have been made to protect the integrity of patients while making electronic health records usable for research by removing personally identifiable information from patient records. Supervised machine learning approaches for de-identification of EHRs need annotated data for training, and such annotations are costly in time and human resources. Annotating clinical text is even more costly, as the process must be carried out in a protected environment with a limited number of annotators who must have signed confidentiality agreements. This paper therefore proposes a semi-supervised method for automatically creating high-quality training data. The study shows that the method can improve recall from 84.75% to 89.20% at a smaller cost in precision, which drops from 95.73% to 94.20%. Recall is arguably more important than precision for de-identification.
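The trade-off quoted in the abstract can be checked with a little arithmetic. The sketch below uses only the four percentages given above; the F1 scores it computes are derived here for illustration and are not figures reported in the abstract, and the variable and function names are hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Percentages quoted in the abstract.
baseline = {"precision": 95.73, "recall": 84.75}
semi_supervised = {"precision": 94.20, "recall": 89.20}

# Change in each metric, in percentage points.
recall_gain = semi_supervised["recall"] - baseline["recall"]
precision_loss = baseline["precision"] - semi_supervised["precision"]

# Derived F1 scores (an assumption of this sketch, not reported above).
f1_baseline = f1(**baseline)
f1_semi = f1(**semi_supervised)

print(f"recall gain:    {recall_gain:.2f} points")
print(f"precision loss: {precision_loss:.2f} points")
print(f"F1 baseline:    {f1_baseline:.2f}")
print(f"F1 semi-sup.:   {f1_semi:.2f}")
```

The recall gain is roughly three times the precision loss, which is consistent with the abstract's claim that precision is not sacrificed to the same extent.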
Anthology ID:
2020.lrec-1.547
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4444–4450
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.547
Cite (ACL):
Hanna Berg and Hercules Dalianis. 2020. A Semi-supervised Approach for De-identification of Swedish Clinical Text. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4444–4450, Marseille, France. European Language Resources Association.
Cite (Informal):
A Semi-supervised Approach for De-identification of Swedish Clinical Text (Berg & Dalianis, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.547.pdf