Most attempts on Text-to-SQL task using encoder-decoder approach show a big problem of dramatic decline in performance for new databases. For the popular Spider dataset, despite models achieving 70% accuracy on its development or test sets, the same models show a huge decline below 20% accuracy for unseen databases. The root causes for this problem are complex and they cannot be easily fixed by adding more manually created training. In this paper we address the problem and propose a solution that is a hybrid system using automated training-data augmentation technique. Our system consists of a rule-based and a deep learning components that interact to understand crucial information in a given query and produce correct SQL as a result. It achieves double-digit percentage improvement for databases that are not part of the Spider corpus.
Business Process Management (BPM) is the discipline which is responsible for management of discovering, analyzing, redesigning, monitoring, and controlling business processes. One of the most crucial tasks of BPM is discovering and modelling business processes from text documents. In this paper, we present our system that resolves an end-to-end problem consisting of 1) recognizing conditional sentences from technical documents, 2) finding boundaries to extract conditional and resultant clauses from each conditional sentence, and 3) categorizing resultant clause as Action or Consequence which later helps to generate new steps in our business process model automatically. We created a new dataset and three models to solve this problem. Our best model achieved very promising results of 83.82, 87.84, and 85.75 for Precision, Recall, and F1, respectively, for extracting Condition, Action, and Consequence clauses using Exact Match metric.
Motion recognition is one of the basic cognitive capabilities of many life forms, however, detecting and understanding motion in text is not a trivial task. In addition, identifying motion entities in natural language is not only challenging but also beneficial for a better natural language understanding. In this paper, we present a Motion Entity Tagging (MET) model to identify entities in motion in a text using the Literal-Motion-in-Text (LiMiT) dataset for training and evaluating the model. Then we propose a new method to split clauses and phrases from complex and long motion sentences to improve the performance of our MET model. We also present results showing that motion features, in particular, entity in motion benefits the Named-Entity Recognition (NER) task. Finally, we present an analysis for the special co-occurrence relation between the person category in NER and animate entities in motion, which significantly improves the classification performance for the person category in NER.
Motion recognition is one of the basic cognitive capabilities of many life forms, yet identifying motion of physical entities in natural language have not been explored extensively and empirically. We present the Literal-Motion-in-Text (LiMiT) dataset, a large human-annotated collection of English text sentences describing physical occurrence of motion, with annotated physical entities in motion. We describe the annotation process for the dataset, analyze its scale and diversity, and report results of several baseline models. We also present future research directions and applications of the LiMiT dataset and share it publicly as a new resource for the research community.