MeDa-BERT: A medical Danish pretrained transformer model

Jannik Pedersen, Martin Laursen, Pernille Vinholt, Thiusius Rajeeth Savarimuthu


Abstract
This paper introduces a medical Danish BERT-based language model (MeDa-BERT) and medical Danish word embeddings. Both were pretrained on a new 133M-token medical Danish corpus compiled from medical books and internet text. The models outperformed general-domain models on medical Danish classification tasks. The medical word embeddings and MeDa-BERT are publicly available.
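Since the abstract states that MeDa-BERT is publicly available, the following is a minimal sketch of loading it for masked-token prediction with the Hugging Face transformers library. The model identifier used here is an assumption for illustration; it is not confirmed by the paper, so check the authors' release for the actual checkpoint name.

```python
# Minimal sketch: loading a pretrained Danish medical BERT with transformers.
# The model identifier below is hypothetical; substitute the published one.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

MODEL_ID = "example/MeDa-BERT"  # hypothetical identifier, not from the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Fill a masked token in a Danish medical sentence
# ("The patient was treated with [MASK].").
fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for pred in fill("Patienten blev behandlet med [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```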
Anthology ID: 2023.nodalida-1.31
Volume: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month: May
Year: 2023
Address: Tórshavn, Faroe Islands
Editors: Tanel Alumäe, Mark Fishel
Venue: NoDaLiDa
Publisher: University of Tartu Library
Pages: 301–307
URL: https://aclanthology.org/2023.nodalida-1.31
Cite (ACL): Jannik Pedersen, Martin Laursen, Pernille Vinholt, and Thiusius Rajeeth Savarimuthu. 2023. MeDa-BERT: A medical Danish pretrained transformer model. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 301–307, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal): MeDa-BERT: A medical Danish pretrained transformer model (Pedersen et al., NoDaLiDa 2023)
PDF: https://aclanthology.org/2023.nodalida-1.31.pdf