The USMLE® Step 2 Clinical Skills Patient Note Corpus

Victoria Yaneva, Janet Mee, Le Ha, Polina Harik, Michael Jodoin, Alex Mechaber


Abstract
This paper presents a corpus of 43,985 clinical patient notes (PNs) written by 35,156 examinees during the high-stakes USMLE® Step 2 Clinical Skills examination. In this exam, examinees interact with standardized patients: people trained to portray simulated scenarios called clinical cases. For each encounter, an examinee writes a PN, which is then scored by physician raters using a rubric of clinical concepts that should be expressed in the PN. The corpus contains PNs from 10 clinical cases, along with the clinical concepts from the case rubrics. A subset of 2,840 PNs was annotated by 10 physician experts, who mapped all 143 concepts from the case rubrics (e.g., shortness of breath) to 34,660 PN phrases (e.g., dyspnea, difficulty breathing). The corpus is available via a data sharing agreement with NBME and can be requested at https://www.nbme.org/services/data-sharing.
Anthology ID: 2022.naacl-main.208
Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: July
Year: 2022
Address: Seattle, United States
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 2880–2886
URL: https://aclanthology.org/2022.naacl-main.208
DOI: 10.18653/v1/2022.naacl-main.208
Cite (ACL): Victoria Yaneva, Janet Mee, Le Ha, Polina Harik, Michael Jodoin, and Alex Mechaber. 2022. The USMLE® Step 2 Clinical Skills Patient Note Corpus. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2880–2886, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): The USMLE® Step 2 Clinical Skills Patient Note Corpus (Yaneva et al., NAACL 2022)
PDF: https://aclanthology.org/2022.naacl-main.208.pdf