Vikas Yadav


pdf bib
If You Want to Go Far Go Together: Unsupervised Joint Candidate Evidence Retrieval for Multi-hop Question Answering
Vikas Yadav | Steven Bethard | Mihai Surdeanu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Multi-hop reasoning requires aggregation and inference from multiple facts. To retrieve such facts, we propose a simple approach that retrieves and reranks set of evidence facts jointly. Our approach first generates unsupervised clusters of sentences as candidate evidence by accounting links between sentences and coverage with the given query. Then, a RoBERTa-based reranker is trained to bring the most representative evidence cluster to the top. We specifically emphasize on the importance of retrieving evidence jointly by showing several comparative analyses to other methods that retrieve and rerank evidence sentences individually. First, we introduce several attention- and embedding-based analyses, which indicate that jointly retrieving and reranking approaches can learn compositional knowledge required for multi-hop reasoning. Second, our experiments show that jointly retrieving candidate evidence leads to substantially higher evidence retrieval performance when fed to the same supervised reranker. In particular, our joint retrieval and then reranking approach achieves new state-of-the-art evidence retrieval performance on two multi-hop question answering (QA) datasets: 30.5 Recall@2 on QASC, and 67.6% F1 on MultiRC. When the evidence text from our joint retrieval approach is fed to a RoBERTa-based answer selection classifier, we achieve new state-of-the-art QA performance on MultiRC and second best result on QASC.


pdf bib
Multi-class Hierarchical Question Classification for Multiple Choice Science Exams
Dongfang Xu | Peter Jansen | Jaycie Martin | Zhengnan Xie | Vikas Yadav | Harish Tayyar Madabushi | Oyvind Tafjord | Peter Clark
Proceedings of the 12th Language Resources and Evaluation Conference

Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately. However, developing strong QC algorithms has been hindered by the limited size and complexity of annotated data available. To address this, we present the largest challenge dataset for QC, containing 7,787 science exam questions paired with detailed classification labels from a fine-grained hierarchical taxonomy of 406 problem domains. We then show that a BERT-based model trained on this dataset achieves a large (+0.12 MAP) gain compared with previous methods, while also achieving state-of-the-art performance on benchmark open-domain and biomedical QC datasets. Finally, we show that using this model’s predictions of question topic significantly improves the accuracy of a question answering system by +1.7% P@1, with substantial future gains possible as QC performance improves.

pdf bib
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Vikas Yadav | Steven Bethard | Mihai Surdeanu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Evidence retrieval is a critical stage of question answering (QA), necessary not only to improve performance, but also to explain the decisions of the QA method. We introduce a simple, fast, and unsupervised iterative evidence retrieval method, which relies on three ideas: (a) an unsupervised alignment approach to soft-align questions and answers with justification sentences using only GloVe embeddings, (b) an iterative process that reformulates queries focusing on terms that are not covered by existing justifications, which (c) stops when the terms in the given question and candidate answers are covered by the retrieved justifications. Despite its simplicity, our approach outperforms all the previous methods (including supervised methods) on the evidence selection task on two datasets: MultiRC and QASC. When these evidence sentences are fed into a RoBERTa answer classification component, we achieve state-of-the-art QA performance on these two datasets.


pdf bib
Quick and (not so) Dirty: Unsupervised Selection of Justification Sentences for Multi-hop Question Answering
Vikas Yadav | Steven Bethard | Mihai Surdeanu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose an unsupervised strategy for the selection of justification sentences for multi-hop question answering (QA) that (a) maximizes the relevance of the selected sentences, (b) minimizes the overlap between the selected facts, and (c) maximizes the coverage of both question and answer. This unsupervised sentence selection can be coupled with any supervised QA model. We show that the sentences selected by our method improve the performance of a state-of-the-art supervised QA model on two multi-hop QA datasets: AI2’s Reasoning Challenge (ARC) and Multi-Sentence Reading Comprehension (MultiRC). We obtain new state-of-the-art performance on both datasets among systems that do not use external resources for training the QA system: 56.82% F1 on ARC (41.24% on Challenge and 64.49% on Easy) and 26.1% EM0 on MultiRC. Our justification sentences have higher quality than the justifications selected by a strong information retrieval baseline, e.g., by 5.4% F1 in MultiRC. We also show that our unsupervised selection of justification sentences is more stable across domains than a state-of-the-art supervised sentence selection method.

pdf bib
Alignment over Heterogeneous Embeddings for Question Answering
Vikas Yadav | Steven Bethard | Mihai Surdeanu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a simple, fast, and mostly-unsupervised approach for non-factoid question answering (QA) called Alignment over Heterogeneous Embeddings (AHE). AHE simply aligns each word in the question and candidate answer with the most similar word in the retrieved supporting paragraph, and weighs each alignment score with the inverse document frequency of the corresponding question/answer term. AHE’s similarity function operates over embeddings that model the underlying text at different levels of abstraction: character (FLAIR), word (BERT and GloVe), and sentence (InferSent), where the latter is the only supervised component in the proposed approach. Despite its simplicity and lack of supervision, AHE obtains a new state-of-the-art performance on the “Easy” partition of the AI2 Reasoning Challenge (ARC) dataset (64.6% accuracy), top-two performance on the “Challenge” partition of ARC (34.1%), and top-three performance on the WikiQA dataset (74.08% MRR), outperforming many other complex, supervised approaches. Our error analysis indicates that alignments over character, word, and sentence embeddings capture substantially different semantic information. We exploit this with a simple meta-classifier that learns how much to trust the predictions over each representation, which further improves the performance of unsupervised AHE.

pdf bib
Eidos, INDRA, & Delphi: From Free Text to Executable Causal Models
Rebecca Sharp | Adarsh Pyarelal | Benjamin Gyori | Keith Alcock | Egoitz Laparra | Marco A. Valenzuela-Escárcega | Ajay Nagesh | Vikas Yadav | John Bachman | Zheng Tang | Heather Lent | Fan Luo | Mithun Paul | Steven Bethard | Kobus Barnard | Clayton Morrison | Mihai Surdeanu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

Building causal models of complicated phenomena such as food insecurity is currently a slow and labor-intensive manual process. In this paper, we introduce an approach that builds executable probabilistic models from raw, free text. The proposed approach is implemented through three systems: Eidos, INDRA, and Delphi. Eidos is an open-domain machine reading system designed to extract causal relations from natural language. It is rule-based, allowing for rapid domain transfer, customizability, and interpretability. INDRA aggregates multiple sources of causal information and performs assembly to create a coherent knowledge base and assess its reliability. This assembled knowledge serves as the starting point for modeling. Delphi is a modeling framework that assembles quantified causal fragments and their contexts into executable probabilistic models that respect the semantics of the original text, and can be used to support decision making.

pdf bib
University of Arizona at SemEval-2019 Task 12: Deep-Affix Named Entity Recognition of Geolocation Entities
Vikas Yadav | Egoitz Laparra | Ti-Tai Wang | Mihai Surdeanu | Steven Bethard
Proceedings of the 13th International Workshop on Semantic Evaluation

We present the Named Entity Recognition (NER) and disambiguation model used by the University of Arizona team (UArizona) for the SemEval 2019 task 12. We achieved fourth place on tasks 1 and 3. We implemented a deep-affix based LSTM-CRF NER model for task 1, which utilizes only character, word, pre- fix and suffix information for the identification of geolocation entities. Despite using just the training data provided by task organizers and not using any lexicon features, we achieved 78.85% strict micro F-score on task 1. We used the unsupervised population heuristics for task 3 and achieved 52.99% strict micro-F1 score in this task.


pdf bib
A Survey on Recent Advances in Named Entity Recognition from Deep Learning models
Vikas Yadav | Steven Bethard
Proceedings of the 27th International Conference on Computational Linguistics

Named Entity Recognition (NER) is a key component in NLP systems for question answering, information retrieval, relation extraction, etc. NER systems have been studied and developed widely for decades, but accurate systems using deep neural networks (NN) have only been introduced in the last few years. We present a comprehensive survey of deep neural network architectures for NER, and contrast them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms. Our results highlight the improvements achieved by neural networks, and show how incorporating some of the lessons learned from past work on feature-based NER systems can yield further improvements.

pdf bib
Deep Affix Features Improve Neural Named Entity Recognizers
Vikas Yadav | Rebecca Sharp | Steven Bethard
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

We propose a practical model for named entity recognition (NER) that combines word and character-level information with a specific learned representation of the prefixes and suffixes of the word. We apply this approach to multilingual and multi-domain NER and show that it achieves state of the art results on the CoNLL 2002 Spanish and Dutch and CoNLL 2003 German NER datasets, consistently achieving 1.5-2.3 percent over the state of the art without relying on any dictionary features. Additionally, we show improvement on SemEval 2013 task 9.1 DrugNER, achieving state of the art results on the MedLine dataset and the second best results overall (-1.3% from state of the art). We also establish a new benchmark on the I2B2 2010 Clinical NER dataset with 84.70 F-score.