RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports
Sarvesh Soni | Meghana Gudala | Atieh Pajouhi | Kirk Roberts
Proceedings of the Thirteenth Language Resources and Evaluation Conference
We present a radiology question answering dataset, RadQA, with 3074 questions posed against radiology reports and annotated by physicians with their corresponding answer spans (resulting in a total of 6148 question-answer evidence pairs). The questions are manually created from the clinical referral section of the reports, so they take into account the actual information needs of ordering physicians and avoid the bias that comes from seeing the answer context (which also organically creates unanswerable questions). The answer spans are marked within the Findings and Impressions sections of a report. The dataset aims to satisfy complex clinical requirements by including complete (yet concise) answer phrases (which are not just entities) that can span multiple lines. We conduct a thorough analysis of the proposed dataset by examining the broad categories of disagreement in annotation (providing insights on the errors made by humans) and the reasoning requirements to answer a question (uncovering the heavy dependence on medical knowledge for answering the questions). The advanced transformer language models achieve a best F1 score of 63.55 on the test set; however, the best human performance is 90.31 (with an average of 84.52). This demonstrates the challenging nature of RadQA, which leaves ample scope for future research on methods.
Extracting Adherence Information from Electronic Health Records
Jordan Sanders | Meghana Gudala | Kathleen Hamilton | Nishtha Prasad | Jordan Stovall | Eduardo Blanco | Jane E Hamilton | Kirk Roberts
Proceedings of the 28th International Conference on Computational Linguistics
Patient adherence is a critical factor in health outcomes. We present a framework to extract adherence information from electronic health records, including both sentence-level information indicating general adherence (full, partial, none, etc.) and span-level information providing additional details such as adherence type (medication or nonmedication), reasons, and outcomes. We annotate and make publicly available a new corpus of 3,000 de-identified sentences, and discuss the language physicians use to document adherence information. We also explore models based on state-of-the-art transformers to automate both tasks.
Extraction of Lactation Frames from Drug Labels and LactMed
Heath Goodrum | Meghana Gudala | Ankita Misra | Kirk Roberts
Proceedings of the 18th BioNLP Workshop and Shared Task
This paper describes a natural language processing (NLP) approach to extracting lactation-specific drug information from two sources: FDA-mandated drug labels and the NLM Drugs and Lactation Database (LactMed). A frame semantic approach is utilized, and the paper describes the selected frames, their annotation on a set of 900 sections from drug labels and LactMed articles, and the NLP system to extract such frame instances automatically. The ultimate goal of the project is to use such a system to identify discrepancies in lactation-related drug information between these resources.
- Kirk Roberts 3
- Heath Goodrum 1
- Ankita Misra 1
- Jordan Sanders 1
- Kathleen Hamilton 1