RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports

Sarvesh Soni, Meghana Gudala, Atieh Pajouhi, Kirk Roberts


Abstract
We present a radiology question answering dataset, RadQA, with 3074 questions posed against radiology reports and annotated with their corresponding answer spans (resulting in a total of 6148 question-answer evidence pairs) by physicians. The questions are manually created from the clinical referral sections of the reports, which capture the actual information needs of ordering physicians, eliminate bias from seeing the answer context, and organically yield unanswerable questions. The answer spans are marked within the Findings and Impressions sections of a report. The dataset aims to satisfy complex clinical requirements by including complete (yet concise) answer phrases, which are not just entities and can span multiple lines. We conduct a thorough analysis of the proposed dataset by examining the broad categories of annotation disagreement (providing insights into the errors made by humans) and the reasoning required to answer a question (uncovering the heavy dependence on medical knowledge). The best advanced transformer language model achieves an F1 score of 63.55 on the test set, whereas the best human performance is 90.31 (with an average of 84.52). This demonstrates the challenging nature of RadQA and leaves ample scope for future methods research.
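Since the results above are reported as F1 scores over answer spans, the sketch below illustrates how SQuAD-style token-overlap F1 between a predicted and a gold answer is typically computed. It is an illustrative re-implementation under that assumption, not the authors' evaluation script (the official SQuAD script additionally normalizes articles and punctuation).

```python
# Minimal sketch of SQuAD-style token-overlap F1 (illustrative, not the official script).
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Compute token-level F1 between a predicted and a gold answer span."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Convention for unanswerable questions: full credit only if both are empty.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Example with a partially overlapping span (hypothetical report text).
print(round(token_f1("small right pleural effusion", "right pleural effusion"), 2))  # 0.86
```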
Anthology ID:
2022.lrec-1.672
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6250–6259
URL:
https://aclanthology.org/2022.lrec-1.672
Cite (ACL):
Sarvesh Soni, Meghana Gudala, Atieh Pajouhi, and Kirk Roberts. 2022. RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6250–6259, Marseille, France. European Language Resources Association.
Cite (Informal):
RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports (Soni et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.672.pdf
Data
RadQA, MIMIC-III, SQuAD, emrQA