emrKBQA: A Clinical Knowledge-Base Question Answering Dataset

Preethi Raghavan, Jennifer J Liang, Diwakar Mahajan, Rachita Chandra, Peter Szolovits


Abstract
We present emrKBQA, a dataset for answering physician questions from a structured patient record. It consists of questions, logical forms, and answers. The questions and logical forms are generated from real-world physician questions, then slot-filled and answered from patient records in the MIMIC-III KB through a semi-automated process. This community-shared release consists of over 940,000 question, logical form, and answer triplets, with 389 question types and ~7.5 paraphrases per question type. We perform experiments to validate the quality of the dataset and set benchmarks for question-to-logical-form learning, which helps answer questions on this dataset.
Anthology ID:
2021.bionlp-1.7
Volume:
Proceedings of the 20th Workshop on Biomedical Language Processing
Month:
June
Year:
2021
Address:
Online
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
64–73
URL:
https://aclanthology.org/2021.bionlp-1.7
DOI:
10.18653/v1/2021.bionlp-1.7
Cite (ACL):
Preethi Raghavan, Jennifer J Liang, Diwakar Mahajan, Rachita Chandra, and Peter Szolovits. 2021. emrKBQA: A Clinical Knowledge-Base Question Answering Dataset. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 64–73, Online. Association for Computational Linguistics.
Cite (Informal):
emrKBQA: A Clinical Knowledge-Base Question Answering Dataset (Raghavan et al., BioNLP 2021)
PDF:
https://aclanthology.org/2021.bionlp-1.7.pdf
Code:
emrqa/emrkbqa
Data:
emrQA