Danish Clinical Named Entity Recognition and Relation Extraction

Martin Laursen, Jannik Pedersen, Rasmus Hansen, Thiusius Rajeeth Savarimuthu, Pernille Vinholt


Abstract
Electronic health records contain important information about patients’ medical history, but much of it is stored as unstructured narrative text. This paper presents the first Danish clinical named entity recognition and relation extraction dataset, covering six types of clinical events, six types of attributes, and three types of relations. The dataset comprises 11,607 paragraphs from Danish electronic health records, containing 54,631 clinical events, 41,954 attributes, and 14,604 relations. We detail the methodology used to develop the annotation scheme and train a transformer-based architecture on the dataset, which achieves macro F1 scores of 60.05%, 44.85%, and 70.64% for clinical events, attributes, and relations, respectively.
Anthology ID:
2023.nodalida-1.65
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
655–666
URL:
https://aclanthology.org/2023.nodalida-1.65
Cite (ACL):
Martin Laursen, Jannik Pedersen, Rasmus Hansen, Thiusius Rajeeth Savarimuthu, and Pernille Vinholt. 2023. Danish Clinical Named Entity Recognition and Relation Extraction. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 655–666, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Danish Clinical Named Entity Recognition and Relation Extraction (Laursen et al., NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.65.pdf