Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types

Siting Liang; Mareike Hartmann; Daniel Sonntag

doi:10.18653/v1/2023.clinicalnlp-1.31

Abstract

Information extraction from clinical text has the potential to facilitate clinical research and personalized clinical care, but annotating large amounts of data for each set of target tasks is prohibitive. We present a German medical Named Entity Recognition (NER) system capable of cross-domain knowledge transfer. The system builds on a pre-trained German language model and a token-level binary classifier, employing semantic types sourced from the Unified Medical Language System (UMLS) as entity labels to identify corresponding entity spans within the input text. To enhance the system's performance and robustness, we pre-train it on a medical literature corpus that incorporates UMLS semantic term annotations. We evaluate the system's effectiveness on two annotated German datasets obtained from different clinics in zero- and few-shot settings. The results show that our approach outperforms task-specific Conditional Random Fields (CRF) classifiers in terms of accuracy. Our work contributes to developing robust and transparent German medical NER models that can support the extraction of information from various clinical texts.
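The abstract describes a token-level binary classifier that takes a UMLS semantic type as the entity label and marks the tokens belonging to matching spans. The following is a minimal Python sketch of that label-conditioned tagging and span decoding, under stated assumptions: the pre-trained German language model is replaced by a hypothetical `score_fn` (here a toy lexicon lookup), the classifier is queried once per semantic type, and a token is tagged as inside an entity when its probability reaches a hypothetical threshold of 0.5. All function names are illustrative and are not the authors' code.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices; end is exclusive

# Hypothetical scorer type: (semantic_type, tokens) -> one probability per token.
# In the described system this role is played by a pre-trained German LM plus
# a binary classification head; the interface here is an assumption.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def spans_to_binary_labels(num_tokens: int, spans: List[Span]) -> List[int]:
    """Build 0/1 targets for one semantic type: 1 marks tokens inside a span."""
    labels = [0] * num_tokens
    for start, end in spans:
        for i in range(start, end):
            labels[i] = 1
    return labels


def binary_labels_to_spans(labels: List[int]) -> List[Span]:
    """Decode maximal runs of 1s back into (start, end) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new run begins
        elif label == 0 and start is not None:
            spans.append((start, i))  # the current run ends before token i
            start = None
    if start is not None:
        spans.append((start, len(labels)))  # run extends to the last token
    return spans


def predict_entities(
    tokens: Sequence[str],
    semantic_types: Sequence[str],
    score_fn: ScoreFn,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Dict[str, List[Span]]:
    """Query the binary classifier once per semantic type and collect spans."""
    result: Dict[str, List[Span]] = {}
    for sem_type in semantic_types:
        probs = score_fn(sem_type, tokens)
        labels = [1 if p >= threshold else 0 for p in probs]
        result[sem_type] = binary_labels_to_spans(labels)
    return result


# Toy stand-in for the language model: a tiny lexicon per UMLS semantic type.
TOY_LEXICON = {
    "Disease or Syndrome": {"Diabetes", "mellitus"},
    "Pharmacologic Substance": {"Metformin"},
}


def toy_score_fn(sem_type: str, tokens: Sequence[str]) -> List[float]:
    """Return a high score for lexicon hits and a low score otherwise."""
    vocab = TOY_LEXICON.get(sem_type, set())
    return [0.9 if tok in vocab else 0.1 for tok in tokens]


if __name__ == "__main__":
    tokens = ["Patient", "mit", "Diabetes", "mellitus", "erhält", "Metformin", "."]
    print(predict_entities(tokens, list(TOY_LEXICON), toy_score_fn))
    # {'Disease or Syndrome': [(2, 4)], 'Pharmacologic Substance': [(5, 6)]}
```

Because the label set is supplied as input rather than fixed in the output layer, the same classifier can in principle be queried with semantic types it was not fine-tuned on, which is what makes this kind of setup attractive for zero- and few-shot transfer. One consequence of a purely binary inside/outside scheme, as sketched here, is that two adjacent entities of the same semantic type decode into a single merged span; the paper's exact input format and decoding may differ.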

Anthology ID: 2023.clinicalnlp-1.31
Volume: Proceedings of the 5th Clinical Natural Language Processing Workshop
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Anna Rumshisky
Venue: ClinicalNLP
Publisher: Association for Computational Linguistics
Pages: 259–271
URL: https://aclanthology.org/2023.clinicalnlp-1.31/
DOI: 10.18653/v1/2023.clinicalnlp-1.31
Cite (ACL): Siting Liang, Mareike Hartmann, and Daniel Sonntag. 2023. Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types. In Proceedings of the 5th Clinical Natural Language Processing Workshop, pages 259–271, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Cross-domain German Medical Named Entity Recognition using a Pre-Trained Language Model and Unified Medical Semantic Types (Liang et al., ClinicalNLP 2023)
PDF: https://aclanthology.org/2023.clinicalnlp-1.31.pdf