Ngoc Phuoc An Vo


2022

Addressing Limitations of Encoder-Decoder Based Approach to Text-to-SQL
Octavian Popescu | Irene Manotas | Ngoc Phuoc An Vo | Hangu Yeo | Elahe Khorashani | Vadim Sheinin
Proceedings of the 29th International Conference on Computational Linguistics

Most attempts at the Text-to-SQL task using an encoder-decoder approach suffer a dramatic decline in performance on new databases. For the popular Spider dataset, models that achieve 70% accuracy on its development or test sets fall below 20% accuracy on unseen databases. The root causes of this problem are complex, and they cannot be easily fixed by adding more manually created training data. In this paper we address the problem and propose a solution: a hybrid system using an automated training-data augmentation technique. Our system consists of rule-based and deep learning components that interact to understand crucial information in a given query and produce correct SQL as a result. It achieves a double-digit percentage improvement for databases that are not part of the Spider corpus.

2021

Recognizing and Splitting Conditional Sentences for Automation of Business Processes Management
Ngoc Phuoc An Vo | Irene Manotas | Octavian Popescu | Algimantas Černiauskas | Vadim Sheinin
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Business Process Management (BPM) is the discipline responsible for discovering, analyzing, redesigning, monitoring, and controlling business processes. One of the most crucial tasks of BPM is discovering and modelling business processes from text documents. In this paper, we present our system that resolves an end-to-end problem consisting of 1) recognizing conditional sentences in technical documents, 2) finding boundaries to extract conditional and resultant clauses from each conditional sentence, and 3) categorizing each resultant clause as Action or Consequence, which later helps to generate new steps in our business process model automatically. We created a new dataset and three models to solve this problem. Our best model achieved promising results of 83.82, 87.84, and 85.75 for Precision, Recall, and F1, respectively, for extracting Condition, Action, and Consequence clauses using the Exact Match metric.

2020

Identifying Motion Entities in Natural Language and A Case Study for Named Entity Recognition
Ngoc Phuoc An Vo | Irene Manotas | Vadim Sheinin | Octavian Popescu
Proceedings of the 28th International Conference on Computational Linguistics

Motion recognition is one of the basic cognitive capabilities of many life forms; however, detecting and understanding motion in text is not a trivial task. In addition, identifying motion entities in natural language is not only challenging but also beneficial for better natural language understanding. In this paper, we present a Motion Entity Tagging (MET) model to identify entities in motion in a text, using the Literal-Motion-in-Text (LiMiT) dataset for training and evaluating the model. We then propose a new method to split clauses and phrases from long, complex motion sentences to improve the performance of our MET model. We also present results showing that motion features, in particular entities in motion, benefit the Named-Entity Recognition (NER) task. Finally, we present an analysis of the special co-occurrence relation between the person category in NER and animate entities in motion, which significantly improves the classification performance for the person category in NER.

LiMiT: The Literal Motion in Text Dataset
Irene Manotas | Ngoc Phuoc An Vo | Vadim Sheinin
Findings of the Association for Computational Linguistics: EMNLP 2020

Motion recognition is one of the basic cognitive capabilities of many life forms, yet identifying the motion of physical entities in natural language has not been explored extensively and empirically. We present the Literal-Motion-in-Text (LiMiT) dataset, a large human-annotated collection of English sentences describing the physical occurrence of motion, with annotated physical entities in motion. We describe the annotation process for the dataset, analyze its scale and diversity, and report results of several baseline models. We also present future research directions and applications of the LiMiT dataset and share it publicly as a new resource for the research community.

2018

A Large Resource of Patterns for Verbal Paraphrases
Octavian Popescu | Ngoc Phuoc An Vo | Vadim Sheinin
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

QUEST: A Natural Language Interface to Relational Databases
Vadim Sheinin | Elahe Khorashani | Hangu Yeo | Kun Xu | Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Corpora for Learning the Mutual Relationship between Semantic Relatedness and Textual Entailment
Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present the creation of a corpus annotated with both semantic relatedness (SR) scores and textual entailment (TE) judgments. In building this corpus we aimed to discover the relationship, if any, between these two tasks, so that each can be resolved with the help of insights gained from the other. We considered corpora already annotated with TE judgments and proceeded to annotate them manually with SR scores. The RTE 1-4 corpora used in the PASCAL competition fit our need. The annotators worked independently of one another and did not have access to the TE judgments during annotation. The intuition that the two annotations are correlated received major support from this experiment, and this finding led to a system that uses this information to revise the initial estimates of SR scores. As semantic relatedness is one of the most general and difficult tasks in natural language processing, we expect that future systems will combine different sources of information in order to solve it. Our work suggests that textual entailment plays a quantifiable role in addressing it.

DISCO: A System Leveraging Semantic Search in Document Review
Ngoc Phuoc An Vo | Fabien Guillot | Caroline Privault
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

This paper presents Disco, a prototype for supporting knowledge workers in exploring, reviewing and sorting collections of textual data. The goal is to facilitate, accelerate and improve the discovery of information. To this end, it combines Semantic Relatedness techniques with a review workflow developed in a tangible environment. Disco uses a semantic model that is leveraged on-line in the course of search sessions and accessed through natural hand gestures in a simple and intuitive way.

2015

Paraphrase Identification and Semantic Similarity in Twitter with Simple Features
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the third International Workshop on Natural Language Processing for Social Media

Learning the Impact of Machine Translation Evaluation Metrics for Semantic Textual Similarity
Simone Magnolini | Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the International Conference Recent Advances in Natural Language Processing

Learning the Impact and Behavior of Syntactic Structure: A Case Study in Semantic Textual Similarity
Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the International Conference Recent Advances in Natural Language Processing

FBK-HLT: An Effective System for Paraphrase Identification and Semantic Similarity in Twitter
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

FBK-HLT: A New Framework for Semantic Textual Similarity
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

FBK-HLT: An Application of Semantic Textual Similarity for Answer Selection in Community Question Answering
Ngoc Phuoc An Vo | Simone Magnolini | Octavian Popescu
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

A Preliminary Evaluation of the Impact of Syntactic Structure in Semantic Textual Similarity and Semantic Relatedness Tasks
Ngoc Phuoc An Vo | Octavian Popescu
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

2014

Fast and Accurate Misspelling Correction in Large Corpora
Octavian Popescu | Ngoc Phuoc An Vo
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

FBK-TR: Applying SVM with Multiple Linguistic Features for Cross-Level Semantic Similarity
Ngoc Phuoc An Vo | Tommaso Caselli | Octavian Popescu
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

FBK-TR: SVM for Semantic Relatedeness and Corpus Patterns for RTE
Ngoc Phuoc An Vo | Octavian Popescu | Tommaso Caselli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)