emrKBQA: A Clinical Knowledge-Base Question Answering Dataset
Preethi Raghavan | Jennifer J Liang | Diwakar Mahajan | Rachita Chandra | Peter Szolovits
Proceedings of the 20th Workshop on Biomedical Language Processing
We present emrKBQA, a dataset for answering physician questions from a structured patient record. It consists of questions, logical forms, and answers. The questions and logical forms are derived from real-world physician questions; they are then slot-filled and answered using patients in the MIMIC-III knowledge base (KB) through a semi-automated process. This community-shared release contains over 940,000 question, logical form, and answer triplets, covering 389 question types with about 7.5 paraphrases per question type. We perform experiments to validate the quality of the dataset and establish benchmarks for learning to map questions to logical forms, which are then used to answer questions on this dataset.
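The slot-filling step can be pictured with a minimal sketch that pairs one question template with one logical-form template, fills a shared slot from a patient record, and reads the answer from that record. Everything here is an assumption made for illustration. The template strings, the `MedicationEvent(...).dosage` logical-form syntax, the toy patient record, and the `generate_triplets` helper are hypothetical and are not the templates, logical-form grammar, or MIMIC-III schema used in emrKBQA. The manual review implied by "semi-automated" is also omitted.

```python
# Hypothetical sketch of template slot-filling for (question, logical form,
# answer) triplets. Template syntax, slot names, and record values are
# invented for illustration; they are NOT the emrKBQA format.
from string import Template

# A (question template, logical form template) pair sharing one slot.
QUESTION_TEMPLATE = Template("What is the dosage of $medication?")
LOGICAL_FORM_TEMPLATE = Template("MedicationEvent($medication).dosage")

# Toy stand-in for a structured patient record (not real MIMIC-III data).
PATIENT_RECORD = {
    "medications": [
        {"name": "metoprolol", "dosage": "25 mg"},
        {"name": "lisinopril", "dosage": "10 mg"},
    ]
}


def generate_triplets(record):
    """Fill the slot with each medication in the record and pair the
    resulting question and logical form with the answer found there."""
    triplets = []
    for med in record["medications"]:
        slots = {"medication": med["name"]}
        question = QUESTION_TEMPLATE.substitute(slots)
        logical_form = LOGICAL_FORM_TEMPLATE.substitute(slots)
        answer = med["dosage"]  # answer is read directly from the record
        triplets.append((question, logical_form, answer))
    return triplets


if __name__ == "__main__":
    for triplet in generate_triplets(PATIENT_RECORD):
        print(triplet)
```

Keeping the question and logical form as separate templates over the same slot means each filled question comes with an aligned logical form, which is the kind of paired supervision a question-to-logical-form model would train on.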