Xiaojing Yu


2021

Expanding, Retrieving and Infilling: Diversifying Cross-Domain Question Generation with Flexible Templates
Xiaojing Yu | Anxiao Jiang
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Sequence-to-sequence models have recently shown promising results in generating high-quality questions. However, these models are also known to have major drawbacks, such as a lack of diversity and poor sentence structure. In this paper, we focus on question generation over SQL databases and propose a novel framework based on expanding, retrieving, and infilling that first incorporates flexible templates with a neural-based model to generate diverse expressions of questions with sentence structure guidance. Furthermore, a new activation/deactivation mechanism is proposed for template-based sequence-to-sequence generation, which learns to discriminate between template patterns and content patterns and thus further improves generation quality. We conduct experiments on two large-scale cross-domain datasets. The experiments show the superiority of our question generation method in producing more diverse questions while maintaining high quality and consistency under both automatic evaluation and human evaluation.
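
The general idea of template-guided question generation over SQL can be illustrated with a small sketch. This is not the paper's neural framework: the templates, the slot names, the single supported query shape, and the helper functions `slots_from_sql` and `generate_questions` are all hypothetical, chosen only to show how one query can yield several differently structured questions once content is infilled into templates.

```python
import re

# Hypothetical hand-written templates; the paper's templates are not shown here.
TEMPLATES = [
    "What is the {column} of the {table} whose {cond_column} is {value}?",
    "Show the {column} for every {table} with {cond_column} equal to {value}.",
    "Which {column} belongs to the {table} where {cond_column} is {value}?",
]

# Only one simple query shape is supported in this toy example.
QUERY_PATTERN = r"SELECT (\w+) FROM (\w+) WHERE (\w+) = '([^']*)'"


def slots_from_sql(sql):
    """Extract content slots from a single-condition SELECT query."""
    match = re.fullmatch(QUERY_PATTERN, sql.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError("unsupported query shape")
    column, table, cond_column, value = match.groups()
    return {
        "column": column,
        "table": table,
        "cond_column": cond_column,
        "value": value,
    }


def generate_questions(sql):
    """Infill every template with the content extracted from the query."""
    slots = slots_from_sql(sql)
    return [template.format(**slots) for template in TEMPLATES]


if __name__ == "__main__":
    for question in generate_questions(
        "SELECT name FROM singer WHERE country = 'France'"
    ):
        print(question)
```

In this sketch the templates carry the sentence structure and the query supplies the content, which is the separation the abstract describes; a learned model would replace the fixed template list and the string formatting.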

2020

Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing
Xiaojing Yu | Tianlong Chen | Zhengjie Yu | Huiyu Li | Yang Yang | Xiaoqian Jiang | Anxiao Jiang
Proceedings of the Twelfth Language Resources and Evaluation Conference

Clinical trials often require that patients meet eligibility criteria (e.g., have specific conditions) to ensure the safety and the effectiveness of studies. However, retrieving eligible patients for a trial from the electronic health record (EHR) database remains a challenging task for clinicians, since it requires not only medical knowledge about eligibility criteria but also an adequate understanding of structured query language (SQL). In this paper, we introduce a new dataset that includes the first-of-its-kind eligibility-criteria corpus and the corresponding queries for criteria-to-SQL (Criteria2SQL), the task of translating eligibility criteria into executable SQL queries. Compared to existing datasets, the queries in this dataset are derived from the eligibility criteria of clinical trials and include Order-sensitive, Counting-based, and Boolean-type cases that have not been seen before. In addition to the dataset, we propose a novel neural semantic parser as a strong baseline model. Extensive experiments show that the proposed parser outperforms existing state-of-the-art general-purpose text-to-SQL models while highlighting the challenges presented by the new dataset. The uniqueness and the diversity of the dataset leave many research opportunities for future improvement.
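
To make the Criteria2SQL task concrete, the following sketch pairs one made-up eligibility criterion with an executable query over a toy in-memory database. The schema (`patients`, `diagnoses`), the rows, the criterion text, and the query are all assumptions invented for illustration; none of them are taken from the dataset. The example is meant to resemble what a counting-based case might look like.

```python
import sqlite3

# Hypothetical criterion and hand-written target query (not from the dataset).
CRITERION = "Adults (18 or older) with at least two recorded diabetes diagnoses"
QUERY = """
    SELECT p.patient_id
    FROM patients AS p
    JOIN diagnoses AS d ON d.patient_id = p.patient_id
    WHERE p.age >= 18 AND d.condition = 'diabetes'
    GROUP BY p.patient_id
    HAVING COUNT(*) >= 2
    ORDER BY p.patient_id
"""


def build_toy_ehr():
    """Create a tiny in-memory stand-in for an EHR database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER);
        CREATE TABLE diagnoses (patient_id INTEGER, condition TEXT);
        INSERT INTO patients VALUES (1, 64), (2, 17), (3, 45);
        INSERT INTO diagnoses VALUES
            (1, 'diabetes'), (1, 'diabetes'),
            (2, 'diabetes'), (2, 'diabetes'),
            (3, 'diabetes'), (3, 'asthma');
        """
    )
    return conn


def eligible_patients(conn):
    """Run the target query and return the matching patient ids."""
    return [row[0] for row in conn.execute(QUERY)]


if __name__ == "__main__":
    print(CRITERION)
    print(eligible_patients(build_toy_ehr()))
```

Here only patient 1 satisfies both the age condition and the count condition; a semantic parser for this task would have to produce the `HAVING COUNT(*) >= 2` clause from the phrase "at least two".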