HB Deid - HB De-identification tool demonstrator

Hanna Berg, Hercules Dalianis


Abstract
This paper describes a freely available web-based demonstrator called HB Deid. HB Deid identifies protected health information (PHI) in Swedish text and removes it, masks it, or replaces it with surrogates or pseudonyms. PHI consists of named entities such as personal names, locations, ages, phone numbers, and dates. To find PHI, HB Deid uses a CRF model trained on non-sensitive annotated Swedish text, followed by a rule-based post-processing step. The final step obscures each PHI instance in one of three ways: masking it, showing only its class name, or replacing it using a rule-based pseudonymisation system.
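The final obscuring step can be illustrated with a minimal sketch. This is not the HB Deid implementation: the function name `obscure`, the span format, the class labels, and the surrogate table are all assumptions made for illustration, and the PHI spans are supplied by hand rather than produced by a CRF model.

```python
from typing import List, Tuple

# (start offset, end offset, PHI class); a hypothetical format, not HB Deid's own.
Span = Tuple[int, int, str]

# Hypothetical surrogate values; the real pseudonymisation rules are not given here.
SURROGATES = {
    "First_Name": "Anna",
    "Location": "Uppsala",
}


def obscure(text: str, spans: List[Span], mode: str = "mask") -> str:
    """Obscure already-detected PHI spans in one of three ways.

    mode="mask"      replaces each character of the span with "X"
    mode="class"     shows only the class name in brackets
    mode="pseudonym" substitutes a surrogate looked up by class
    """
    if mode not in ("mask", "class", "pseudonym"):
        raise ValueError(f"unknown mode: {mode}")

    pieces = []
    cursor = 0
    # Assumes spans do not overlap; process them left to right.
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        if mode == "mask":
            pieces.append("X" * (end - start))
        elif mode == "class":
            pieces.append(f"[{label}]")
        else:
            # Fall back to the class name when no surrogate is defined.
            pieces.append(SURROGATES.get(label, f"[{label}]"))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Eva bor i Lund."
spans = [(0, 3, "First_Name"), (10, 14, "Location")]

masked = obscure(text, spans, "mask")
classes = obscure(text, spans, "class")
pseudonymised = obscure(text, spans, "pseudonym")
```

Here masking keeps the text length unchanged, while the class and pseudonym modes keep the sentence readable; which trade-off suits a given use is left to the user of such a tool.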
Anthology ID:
2021.nodalida-main.54
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
31 May–2 June
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
467–471
URL:
https://aclanthology.org/2021.nodalida-main.54
Cite (ACL):
Hanna Berg and Hercules Dalianis. 2021. HB Deid - HB De-identification tool demonstrator. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 467–471, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
HB Deid - HB De-identification tool demonstrator (Berg & Dalianis, NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.54.pdf