Benchmark for Evaluation of Danish Clinical Word Embeddings

Martin Sundahl Laursen, Jannik Skyttegaard Pedersen, Pernille Just Vinholt, Rasmus Søgaard Hansen, Thiusius Rajeeth Savarimuthu


Abstract
In natural language processing, benchmarks are used to track progress and identify useful models. Currently, no benchmark for Danish clinical word embeddings exists. This paper describes the development of a Danish benchmark for clinical word embeddings. The clinical benchmark consists of ten datasets: eight intrinsic and two extrinsic. Moreover, we evaluate word embeddings trained on text from the clinical domain, general practitioner domain and general domain on the established benchmark. All the intrinsic tasks of the benchmark are publicly available.
Anthology ID:
2023.nejlt-1.4
Volume:
Northern European Journal of Language Technology, Volume 9
Year:
2023
Address:
Linköping, Sweden
Editor:
Leon Derczynski
Venue:
NEJLT
Publisher:
Linköping University Electronic Press
URL:
https://aclanthology.org/2023.nejlt-1.4
DOI:
https://doi.org/10.3384/nejlt.2000-1533.2023.4132
Cite (ACL):
Martin Sundahl Laursen, Jannik Skyttegaard Pedersen, Pernille Just Vinholt, Rasmus Søgaard Hansen, and Thiusius Rajeeth Savarimuthu. 2023. Benchmark for Evaluation of Danish Clinical Word Embeddings. In Northern European Journal of Language Technology, Volume 9, Linköping, Sweden. Linköping University Electronic Press.
Cite (Informal):
Benchmark for Evaluation of Danish Clinical Word Embeddings (Laursen et al., NEJLT 2023)
PDF:
https://aclanthology.org/2023.nejlt-1.4.pdf