Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing

Xiaojing Yu, Tianlong Chen, Zhengjie Yu, Huiyu Li, Yang Yang, Xiaoqian Jiang, Anxiao Jiang


Abstract
Clinical trials often require that patients meet eligibility criteria (e.g., have specific conditions) to ensure the safety and the effectiveness of studies. However, retrieving eligible patients for a trial from the electronic health record (EHR) database remains a challenging task for clinicians since it requires not only medical knowledge about eligibility criteria, but also an adequate understanding of structured query language (SQL). In this paper, we introduce a new dataset that includes the first-of-its-kind eligibility-criteria corpus and the corresponding queries for criteria-to-sql (Criteria2SQL), a task translating the eligibility criteria to executable SQL queries. Compared to existing datasets, the queries in the dataset here are derived from the eligibility criteria of clinical trials and include Order-sensitive, Counting-based, and Boolean-type cases which are not seen before. In addition to the dataset, we propose a novel neural semantic parser as a strong baseline model. Extensive experiments show that the proposed parser outperforms existing state-of-the-art general-purpose text-to-sql models while highlighting the challenges presented by the new dataset. The uniqueness and the diversity of the dataset leave a lot of research opportunities for future improvement.
Anthology ID:
2020.lrec-1.714
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association
Note:
Pages:
5829–5837
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.714
DOI:
Bibkey:
Cite (ACL):
Xiaojing Yu, Tianlong Chen, Zhengjie Yu, Huiyu Li, Yang Yang, Xiaoqian Jiang, and Anxiao Jiang. 2020. Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5829–5837, Marseille, France. European Language Resources Association.
Cite (Informal):
Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing (Yu et al., LREC 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.lrec-1.714.pdf