ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling

Sangryul Kim; Donghee Han; Sehyun Kim

doi:10.18653/v1/2024.clinicalnlp-1.65

Abstract

Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. By fine-tuning a model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL based on the distribution of log probabilities, while grammatical and schema errors are mitigated by executing queries on the actual database. We experimentally verified that our method can filter unanswerable questions, that it remains applicable even when the parameters of the model are not accessible, and that it can be used effectively in practice.
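
The two filtering stages named in the abstract (a log-probability threshold, then execution against the database) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the choice of averaging the `k` lowest token log probabilities, the threshold value of -0.5, and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3
from typing import Optional, Sequence


def is_confident(token_logprobs: Sequence[float],
                 threshold: float = -0.5, k: int = 3) -> bool:
    """Hypothetical confidence gate: the mean of the k lowest token
    log probabilities must be at or above the threshold."""
    if not token_logprobs:
        return False
    lowest = sorted(token_logprobs)[:k]
    return sum(lowest) / len(lowest) >= threshold


def executes_cleanly(conn: sqlite3.Connection, sql: str) -> bool:
    """Run the query and report whether the database accepted it.
    Syntax and schema errors surface as sqlite3.Error subclasses."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def gate(conn: sqlite3.Connection, sql: str,
         token_logprobs: Sequence[float],
         threshold: float = -0.5, k: int = 3) -> Optional[str]:
    """Return the SQL if it passes both checks, otherwise None
    (meaning the question is treated as unanswerable)."""
    if not is_confident(token_logprobs, threshold, k):
        return None
    if not executes_cleanly(conn, sql):
        return None
    return sql


if __name__ == "__main__":
    # Toy schema standing in for an EHR database (assumed for the demo).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER, age INTEGER)")
    print(gate(conn, "SELECT COUNT(*) FROM patients", [-0.01, -0.02, -0.05]))
    print(gate(conn, "SELECT * FROM missing_table", [-0.01, -0.02, -0.05]))
```

Because the gate only needs per-token log probabilities and a database connection, a sketch like this works with models whose parameters are not accessible, provided the serving API exposes token log probabilities.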

Anthology ID: 2024.clinicalnlp-1.65
Volume: Proceedings of the 6th Clinical Natural Language Processing Workshop
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Tristan Naumann, Asma Ben Abacha, Steven Bethard, Kirk Roberts, Danielle Bitterman
Venues: ClinicalNLP | WS
Publisher: Association for Computational Linguistics
Pages: 687–696
URL: https://aclanthology.org/2024.clinicalnlp-1.65
DOI: 10.18653/v1/2024.clinicalnlp-1.65
Cite (ACL): Sangryul Kim, Donghee Han, and Sehyun Kim. 2024. ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling. In Proceedings of the 6th Clinical Natural Language Processing Workshop, pages 687–696, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling (Kim et al., ClinicalNLP-WS 2024)
PDF: https://aclanthology.org/2024.clinicalnlp-1.65.pdf