Vadim Sheinin


2022

Addressing Limitations of Encoder-Decoder Based Approach to Text-to-SQL
Octavian Popescu | Irene Manotas | Ngoc Phuoc An Vo | Hangu Yeo | Elahe Khorashani | Vadim Sheinin
Proceedings of the 29th International Conference on Computational Linguistics

Most attempts at the Text-to-SQL task that use an encoder-decoder approach suffer from a dramatic decline in performance on new databases. For the popular Spider dataset, although models achieve 70% accuracy on its development or test sets, the same models drop below 20% accuracy on unseen databases. The root causes of this problem are complex, and they cannot easily be fixed by adding more manually created training data. In this paper, we address the problem and propose a solution: a hybrid system that uses an automated training-data augmentation technique. Our system consists of a rule-based component and a deep learning component that interact to understand crucial information in a given query and produce correct SQL as a result. It achieves a double-digit percentage improvement on databases that are not part of the Spider corpus.

2021

Recognizing and Splitting Conditional Sentences for Automation of Business Processes Management
Ngoc Phuoc An Vo | Irene Manotas | Octavian Popescu | Algimantas Černiauskas | Vadim Sheinin
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Business Process Management (BPM) is the discipline responsible for discovering, analyzing, redesigning, monitoring, and controlling business processes. One of the most crucial tasks of BPM is discovering and modelling business processes from text documents. In this paper, we present our system that solves an end-to-end problem consisting of 1) recognizing conditional sentences in technical documents, 2) finding boundaries to extract the conditional and resultant clauses from each conditional sentence, and 3) categorizing each resultant clause as an Action or a Consequence, which later helps to automatically generate new steps in our business process model. We created a new dataset and three models to solve this problem. Our best model achieved very promising results of 83.82, 87.84, and 85.75 for Precision, Recall, and F1, respectively, for extracting Condition, Action, and Consequence clauses using the Exact Match metric.

2020

Identifying Motion Entities in Natural Language and A Case Study for Named Entity Recognition
Ngoc Phuoc An Vo | Irene Manotas | Vadim Sheinin | Octavian Popescu
Proceedings of the 28th International Conference on Computational Linguistics

Motion recognition is one of the basic cognitive capabilities of many life forms; however, detecting and understanding motion in text is not a trivial task. In addition, identifying motion entities in natural language is not only challenging but also beneficial for better natural language understanding. In this paper, we present a Motion Entity Tagging (MET) model that identifies entities in motion in text, using the Literal-Motion-in-Text (LiMiT) dataset for training and evaluation. We then propose a new method to split clauses and phrases out of long, complex motion sentences to improve the performance of our MET model. We also present results showing that motion features, in particular entities in motion, benefit the Named-Entity Recognition (NER) task. Finally, we present an analysis of the special co-occurrence relation between the person category in NER and animate entities in motion, which significantly improves classification performance for the person category in NER.

LiMiT: The Literal Motion in Text Dataset
Irene Manotas | Ngoc Phuoc An Vo | Vadim Sheinin
Findings of the Association for Computational Linguistics: EMNLP 2020

Motion recognition is one of the basic cognitive capabilities of many life forms, yet identifying the motion of physical entities in natural language has not been explored extensively or empirically. We present the Literal-Motion-in-Text (LiMiT) dataset, a large human-annotated collection of English sentences describing the physical occurrence of motion, with the physical entities in motion annotated. We describe the annotation process for the dataset, analyze its scale and diversity, and report results for several baseline models. We also present future research directions and applications of the LiMiT dataset and share it publicly as a new resource for the research community.

2018

Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model
Kun Xu | Lingfei Wu | Zhiguo Wang | Mo Yu | Liwei Chen | Vadim Sheinin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Existing neural semantic parsers mainly use a sequence encoder, i.e., a sequential LSTM, to extract word order features while neglecting other valuable syntactic information such as dependency or constituent trees. In this paper, we first propose using a syntactic graph to represent three types of syntactic information, i.e., word order, dependency, and constituency features; we then employ a graph-to-sequence model to encode the syntactic graph and decode a logical form. Experimental results on benchmark datasets show that our model is comparable to the state of the art on Jobs640, ATIS, and Geo880. Experimental results on adversarial examples demonstrate that encoding more syntactic information also improves the robustness of the model.

SQL-to-Text Generation with Graph-to-Sequence Model
Kun Xu | Lingfei Wu | Zhiguo Wang | Yansong Feng | Vadim Sheinin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Previous work approaches the SQL-to-text generation task using vanilla Seq2Seq models, which may not fully capture the inherent graph-structured information in SQL queries. In this paper, we propose a graph-to-sequence model that encodes global structural information into node embeddings. This model can effectively learn the correlation between the SQL query pattern and its interpretation. Experimental results on the WikiSQL dataset and the Stackoverflow dataset show that our model outperforms the Seq2Seq and Tree2Seq baselines, achieving state-of-the-art performance.

A Large Resource of Patterns for Verbal Paraphrases
Octavian Popescu | Ngoc Phuoc An Vo | Vadim Sheinin
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

QUEST: A Natural Language Interface to Relational Databases
Vadim Sheinin | Elahe Khorashani | Hangu Yeo | Kun Xu | Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)