Rachita Chandra
2021
emrKBQA: A Clinical Knowledge-Base Question Answering Dataset
Preethi Raghavan | Jennifer J Liang | Diwakar Mahajan | Rachita Chandra | Peter Szolovits
Proceedings of the 20th Workshop on Biomedical Language Processing
We present emrKBQA, a dataset for answering physician questions from a structured patient record. It consists of questions, logical forms, and answers. The questions and logical forms are generated from real-world physician questions, then slot-filled and answered using patient data in the MIMIC-III KB through a semi-automated process. This community-shared release consists of over 940,000 question, logical form, and answer triplets, with 389 types of questions and ~7.5 paraphrases per question type. We perform experiments to validate the quality of the dataset and set benchmarks for question-to-logical-form learning, which helps answer questions on this dataset.