MedQA-SWE - a Clinical Question & Answer Dataset for Swedish

Niclas Hertzberg, Anna Lokrantz


Abstract
Considering the rapid improvement of large generative language models, it is important to measure their ability to encode clinical domain knowledge in order to help determine their potential utility in a clinical setting. To this end, we present MedQA-SWE – a novel multiple-choice clinical question and answer (Q&A) dataset in Swedish consisting of 3,180 questions. The dataset was created from a series of exams aimed at evaluating doctors’ clinical understanding and decision-making, and it is the first open-source clinical Q&A dataset in Swedish. The exams – originally in PDF format – were parsed, and each question was manually checked and curated in order to limit errors in the dataset. We provide dataset statistics along with benchmark accuracy scores of seven large generative language models on a representative sample of questions in a zero-shot setting, with some models showing impressive performance given the difficulty of the exam the dataset is based on.
Anthology ID:
2024.lrec-main.975
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
11178–11186
URL:
https://aclanthology.org/2024.lrec-main.975
Cite (ACL):
Niclas Hertzberg and Anna Lokrantz. 2024. MedQA-SWE - a Clinical Question & Answer Dataset for Swedish. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11178–11186, Torino, Italia. ELRA and ICCL.
Cite (Informal):
MedQA-SWE - a Clinical Question & Answer Dataset for Swedish (Hertzberg & Lokrantz, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.975.pdf