Paragraph Retrieval for Enhanced Question Answering in Clinical Documents

Vojtech Lanz; Pavel Pecina

doi:10.18653/v1/2024.bionlp-1.48

Abstract

Healthcare professionals often manually extract information from large clinical documents to address patient-related questions. Natural Language Processing (NLP) techniques, particularly Question Answering (QA) models, are a promising direction for making this process more efficient. However, document-level QA over large documents is often impractical or even infeasible, both for model training and for inference. In this work, we address document-level QA over clinical reports with a two-step approach: first, the entire report is split into segments, and for a given question the most relevant segment is predicted by an NLP model; second, a QA model is applied to the question with the retrieved segment as context. We investigate the effectiveness of heading-based and naive paragraph segmentation approaches for various paragraph lengths on two subsets of the emrQA dataset. Our experiments reveal that the average paragraph length used as a segmentation parameter has no significant effect on the performance of the whole document-level QA process. In other words, experiments that segment reports into shorter paragraphs perform similarly to those that use entire unsegmented reports. Surprisingly, naive uniform segmentation is sufficient, even though it does not rely on prior knowledge of the clinical document’s characteristics.
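
The first step of the two-step approach described above (uniform segmentation followed by segment retrieval) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `uniform_segments`, `overlap_score`, and `retrieve` are hypothetical, and a simple word-overlap count stands in for the NLP retrieval model, whose details the abstract does not give. The second step, running a QA model on the retrieved segment, is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def uniform_segments(report, target_len):
    """Naive uniform segmentation: consecutive chunks of target_len words.

    The last chunk may be shorter. No headings or other knowledge of the
    document structure are used.
    """
    words = report.split()
    return [
        " ".join(words[i:i + target_len])
        for i in range(0, len(words), target_len)
    ]


def overlap_score(question, segment):
    """Stand-in relevance score (an assumption, not the paper's model):
    the number of question tokens that also occur in the segment."""
    q_counts = Counter(tokenize(question))
    s_counts = Counter(tokenize(segment))
    return sum(min(q_counts[t], s_counts[t]) for t in q_counts)


def retrieve(question, segments):
    """Return the segment with the highest score for the question."""
    return max(segments, key=lambda seg: overlap_score(question, seg))
```

In a full pipeline, the segment returned by `retrieve` would be passed, together with the question, to a QA model as its context; `target_len` plays the role of the paragraph-length parameter varied in the experiments.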

Anthology ID: 2024.bionlp-1.48
Volume: Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
SIG: SIGBIOMED
Publisher: Association for Computational Linguistics
Pages: 580–590
URL: https://aclanthology.org/2024.bionlp-1.48/
DOI: 10.18653/v1/2024.bionlp-1.48
Cite (ACL): Vojtech Lanz and Pavel Pecina. 2024. Paragraph Retrieval for Enhanced Question Answering in Clinical Documents. In Proceedings of the 23rd Workshop on Biomedical Natural Language Processing, pages 580–590, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Paragraph Retrieval for Enhanced Question Answering in Clinical Documents (Lanz & Pecina, BioNLP 2024)
PDF: https://aclanthology.org/2024.bionlp-1.48.pdf
