Andrew McCallum


2021

pdf bib
Knowledge Informed Semantic Parsing for Conversational Question Answering
Raghuveer Thirukovalluru | Mukund Sridhar | Dung Thai | Shruti Chanumolu | Nicholas Monath | Sankaranarayanan Ananthakrishnan | Andrew McCallum
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

Smart assistants are tasked to answer various questions regarding world knowledge. These questions range from retrieval of simple facts to retrieval of complex, multi-hops question followed by various operators (i.e., filter, argmax). Semantic parsing has emerged as the state-of-the-art for answering these kinds of questions by forming queries to extract information from knowledge bases (KBs). Specially, neural semantic parsers (NSPs) effectively translate natural questions to logical forms, which execute on KB and give desirable answers. Yet, NSPs suffer from non-executable logical forms for some instances in the generated logical forms might be missing due to the incompleteness of KBs. Intuitively, knowing the KB structure informs NSP with changes of the global logical forms structures with respect to changes in KB instances. In this work, we propose a novel knowledge-informed decoder variant of NSP. We consider the conversational question answering settings, where a natural language query, its context and its final answers are available at training. Experimental results show that our method outperformed strong baselines by 1.8 F1 points overall across 10 types of questions of the CSQA dataset. Especially for the “Logical Reasoning” category, our model improves by 7 F1 points. Furthermore, our results are achieved with 90.3% fewer parameters, allowing faster training for large-scale datasets.

pdf bib
Simultaneously Self-Attending to Text and Entities for Knowledge-Informed Text Representations
Dung Thai | Raghuveer Thirukovalluru | Trapit Bansal | Andrew McCallum
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

Pre-trained language models have emerged as highly successful methods for learning good text representations. However, the amount of structured knowledge retained in such models, and how (if at all) it can be extracted, remains an open question. In this work, we aim at directly learning text representations which leverage structured knowledge about entities mentioned in the text. This can be particularly beneficial for downstream tasks which are knowledge-intensive. Our approach utilizes self-attention between words in the text and knowledge graph (KG) entities mentioned in the text. While existing methods require entity-linked data for pre-training, we train using a mention-span masking objective and a candidate ranking objective – which doesn’t require any entity-links and only assumes access to an alias table for retrieving candidates, enabling large-scale pre-training. We show that the proposed model learns knowledge-informed text representations that yield improvements on the downstream tasks over existing methods.

pdf bib
Box-To-Box Transformations for Modeling Joint Hierarchies
Shib Sankar Dasgupta | Xiang Lorraine Li | Michael Boratko | Dongxu Zhang | Andrew McCallum
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

Learning representations of entities and relations in structured knowledge bases is an active area of research, with much emphasis placed on choosing the appropriate geometry to capture the hierarchical structures exploited in, for example, isa or haspart relations. Box embeddings (Vilnis et al., 2018; Li et al., 2019; Dasgupta et al., 2020), which represent concepts as n-dimensional hyperrectangles, are capable of embedding hierarchies when training on a subset of the transitive closure. In Patel et al., (2020), the authors demonstrate that only the transitive reduction is required and further extend box embeddings to capture joint hierarchies by augmenting the graph with new nodes. While it is possible to represent joint hierarchies with this method, the parameters for each hierarchy are decoupled, making generalization between hierarchies infeasible. In this work, we introduce a learned box-to-box transformation that respects the structure of each hierarchy. We demonstrate that this not only improves the capability of modeling cross-hierarchy compositional edges but is also capable of generalizing from a subset of the transitive reduction.

pdf bib
Multi-facet Universal Schema
Rohan Paul | Haw-Shiuan Chang | Andrew McCallum
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Universal schema (USchema) assumes that two sentence patterns that share the same entity pairs are similar to each other. This assumption is widely adopted for solving various types of relation extraction (RE) tasks. Nevertheless, each sentence pattern could contain multiple facets, and not every facet is similar to all the facets of another sentence pattern co-occurring with the same entity pair. To address the violation of the USchema assumption, we propose multi-facet universal schema that uses a neural model to represent each sentence pattern as multiple facet embeddings and encourage one of these facet embeddings to be close to that of another sentence pattern if they co-occur with the same entity pair. In our experiments, we demonstrate that multi-facet embeddings significantly outperform their single-facet embedding counterpart, compositional universal schema (CUSchema) (Verga et al., 2016), in distantly supervised relation extraction tasks. Moreover, we can also use multiple embeddings to detect the entailment relation between two sentence patterns when no manual label is available.

pdf bib
Changing the Mind of Transformers for Topically-Controllable Language Generation
Haw-Shiuan Chang | Jiaming Yuan | Mohit Iyyer | Andrew McCallum
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Large Transformer-based language models can aid human authors by suggesting plausible continuations of text written so far. However, current interactive writing assistants do not allow authors to guide text generation in desired topical directions. To address this limitation, we design a framework that displays multiple candidate upcoming topics, of which a user can select a subset to guide the generation. Our framework consists of two components: (1) a method that produces a set of candidate topics by predicting the centers of word clusters in the possible continuations, and (2) a text generation model whose output adheres to the chosen topics. The training of both components is self-supervised, using only unlabeled text. Our experiments demonstrate that our topic options are better than those of standard clustering approaches, and our framework often generates fluent sentences related to the chosen topics, as judged by automated metrics and crowdsourced workers.

pdf bib
Probabilistic Box Embeddings for Uncertain Knowledge Graph Reasoning
Xuelu Chen | Michael Boratko | Muhao Chen | Shib Sankar Dasgupta | Xiang Lorraine Li | Andrew McCallum
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Knowledge bases often consist of facts which are harvested from a variety of sources, many of which are noisy and some of which conflict, resulting in a level of uncertainty for each triple. Knowledge bases are also often incomplete, prompting the use of embedding methods to generalize from known facts, however, existing embedding methods only model triple-level uncertainty, and reasoning results lack global consistency. To address these shortcomings, we propose BEUrRE, a novel uncertain knowledge graph embedding method with calibrated probabilistic semantics. BEUrRE models each entity as a box (i.e. axis-aligned hyperrectangle) and relations between two entities as affine transforms on the head and tail entity boxes. The geometry of the boxes allows for efficient calculation of intersections and volumes, endowing the model with calibrated probabilistic semantics and facilitating the incorporation of relational constraints. Extensive experiments on two benchmark datasets show that BEUrRE consistently outperforms baselines on confidence prediction and fact ranking due to its probabilistic calibration and ability to capture high-order dependencies among facts.

pdf bib
Clustering-based Inference for Biomedical Entity Linking
Rico Angell | Nicholas Monath | Sunil Mohan | Nishant Yadav | Andrew McCallum
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Due to large number of entities in biomedical knowledge bases, only a small fraction of entities have corresponding labelled training data. This necessitates entity linking models which are able to link mentions of unseen entities using learned representations of entities. Previous approaches link each mention independently, ignoring the relationships within and across documents between the entity mentions. These relations can be very useful for linking mentions in biomedical text where linking decisions are often difficult due mentions having a generic or a highly specialized form. In this paper, we introduce a model in which linking decisions can be made not merely by linking to a knowledge base entity but also by grouping multiple mentions together via clustering and jointly making linking predictions. In experiments on the largest publicly available biomedical dataset, we improve the best independent prediction for entity linking by 3.0 points of accuracy, and our clustering-based inference model further improves entity linking by 2.3 points.

pdf bib
Modeling Fine-Grained Entity Types with Box Embeddings
Yasumasa Onoe | Michael Boratko | Andrew McCallum | Greg Durrett
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Neural entity typing models typically represent fine-grained entity types as vectors in a high-dimensional space, but such spaces are not well-suited to modeling these types’ complex interdependencies. We study the ability of box embeddings, which embed concepts as d-dimensional hyperrectangles, to capture hierarchies of types even when these relationships are not defined explicitly in the ontology. Our model represents both types and entity mentions as boxes. Each mention and its context are fed into a BERT-based model to embed that mention in our box space; essentially, this model leverages typological clues present in the surface text to hypothesize a type representation for the mention. Box containment can then be used to derive both the posterior probability of a mention exhibiting a given type and the conditional probability relations between types themselves. We compare our approach with a vector-based typing model and observe state-of-the-art performance on several entity typing benchmarks. In addition to competitive typing performance, our box-based model shows better performance in prediction consistency (predicting a supertype and a subtype together) and confidence (i.e., calibration), demonstrating that the box-based model captures the latent type hierarchies better than the vector-based model does.

pdf bib
Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
Sumanta Bhattacharyya | Amirmohammad Rooshenas | Subhajit Naskar | Simeng Sun | Mohit Iyyer | Andrew McCallum
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and resulted in alternative training algorithms (Ranzato et al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However, MLE training remains the de facto approach for autoregressive NMT because of its computational efficiency and stability. Despite this mismatch between the training objective and task measure, we notice that the samples drawn from an MLE-based trained NMT support the desired distribution – there are samples with much higher BLEU score comparing to the beam decoding output. To benefit from this observation, we train an energy-based model to mimic the behavior of the task measure (i.e., the energy-based model assigns lower energy to samples with higher BLEU score), which is resulted in a re-ranking algorithm based on the samples drawn from NMT: energy-based re-ranking (EBR). We use both marginal energy models (over target sentence) and joint energy models (over both source and target sentences). Our EBR with the joint energy model consistently improves the performance of the Transformer-based NMT: +3.7 BLEU points on IWSLT’14 German-English, +3.37 BELU points on Sinhala-English, +1.4 BLEU points on WMT’16 English-German tasks.

pdf bib
Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference
Robert L Logan IV | Andrew McCallum | Sameer Singh | Dan Bikel
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Streaming cross document entity coreference (CDC) systems disambiguate mentions of named entities in a scalable manner via incremental clustering. Unlike other approaches for named entity disambiguation (e.g., entity linking), streaming CDC allows for the disambiguation of entities that are unknown at inference time. Thus, it is well-suited for processing streams of data where new entities are frequently introduced. Despite these benefits, this task is currently difficult to study, as existing approaches are either evaluated on datasets that are no longer available, or omit other crucial details needed to ensure fair comparison. In this work, we address this issue by compiling a large benchmark adapted from existing free datasets, and performing a comprehensive evaluation of a number of novel and existing baseline models. We investigate: how to best encode mentions, which clustering algorithms are most effective for grouping mentions, how models transfer to different domains, and how bounding the number of mentions tracked during inference impacts performance. Our results show that the relative performance of neural and feature-based mention encoders varies across different domains, and in most cases the best performance is achieved using a combination of both approaches. We also find that performance is minimally impacted by limiting the number of tracked mentions.

pdf bib
MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network
Nicholas FitzGerald | Dan Bikel | Jan Botha | Daniel Gillick | Tom Kwiatkowski | Andrew McCallum
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We present an instance-based nearest neighbor approach to entity linking. In contrast to most prior entity retrieval systems which represent each entity with a single vector, we build a contextualized mention-encoder that learns to place similar mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as “class prototypes” as inference involves retrieving from the full set of labeled entity mentions in the training set and applying the nearest mention neighbor’s entity label. Our model is trained on a large multilingual corpus of mention pairs derived from Wikipedia hyperlinks, and performs nearest neighbor inference on an index of 700 million mentions. It is simpler to train, gives more interpretable predictions, and outperforms all other systems on two multilingual entity linking benchmarks.

pdf bib
Long Document Summarization in a Low Resource Setting using Pretrained Language Models
Ahsaas Bajaj | Pavitra Dangati | Kalpesh Krishna | Pradhiksha Ashok Kumar | Rheeya Uppaal | Bradford Windsor | Eliot Brenner | Dominic Dotterrer | Rajarshi Das | Andrew McCallum
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

Abstractive summarization is the task of compressing a long document into a coherent short document while retaining salient information. Modern abstractive summarization methods are based on deep neural networks which often require large training datasets. Since collecting summarization datasets is an expensive and time-consuming task, practical industrial settings are usually low-resource. In this paper, we study a challenging low-resource setting of summarizing long legal briefs with an average source document length of 4268 words and only 120 available (document, summary) pairs. To account for data scarcity, we used a modern pre-trained abstractive summarizer BART, which only achieves 17.9 ROUGE-L as it struggles with long documents. We thus attempt to compress these long documents by identifying salient sentences in the source which best ground the summary, using a novel algorithm based on GPT-2 language model perplexity scores, that operates within the low resource regime. On feeding the compressed documents to BART, we observe a 6.0 ROUGE-L improvement. Our method also beats several competitive salience detection baselines. Furthermore, the identified salient sentences tend to agree with independent human labeling by domain experts.

pdf bib
Scaling Within Document Coreference to Long Texts
Raghuveer Thirukovalluru | Nicholas Monath | Kumar Shridhar | Manzil Zaheer | Mrinmaya Sachan | Andrew McCallum
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks
Trapit Bansal | Rishikesh Jha | Andrew McCallum
Proceedings of the 28th International Conference on Computational Linguistics

Pre-trained transformer models have shown enormous success in improving performance on several downstream tasks. However, fine-tuning on a new task still requires large amounts of task-specific labeled data to achieve good performance. We consider this problem of learning to generalize to new tasks, with a few examples, as a meta-learning problem. While meta-learning has shown tremendous progress in recent years, its application is still limited to simulated problems or problems with limited diversity across tasks. We develop a novel method, LEOPARD, which enables optimization-based meta-learning across tasks with a different number of classes, and evaluate different methods on generalization to diverse NLP classification tasks. LEOPARD is trained with the state-of-the-art transformer architecture and shows better generalization to tasks not seen at all during training, with as few as 4 examples per label. Across 17 NLP tasks, including diverse domains of entity typing, natural language inference, sentiment analysis, and several other text classification tasks, we show that LEOPARD learns better initial parameters for few-shot learning than self-supervised pre-training or multi-task training, outperforming many strong baselines, for example, yielding 14.6% average relative gain in accuracy on unseen tasks with only 4 examples per label.

pdf bib
An Instance Level Approach for Shallow Semantic Parsing in Scientific Procedural Text
Daivik Swarup | Ahsaas Bajaj | Sheshera Mysore | Tim O’Gorman | Rajarshi Das | Andrew McCallum
Findings of the Association for Computational Linguistics: EMNLP 2020

In specific domains, such as procedural scientific text, human labeled data for shallow semantic parsing is especially limited and expensive to create. Fortunately, such specific domains often use rather formulaic writing, such that the different ways of expressing relations in a small number of grammatically similar labeled sentences may provide high coverage of semantic structures in the corpus, through an appropriately rich similarity metric. In light of this opportunity, this paper explores an instance-based approach to the relation prediction sub-task within shallow semantic parsing, in which semantic labels from structurally similar sentences in the training set are copied to test sentences. Candidate similar sentences are retrieved using SciBERT embeddings. For labels where it is possible to copy from a similar sentence we employ an instance level copy network, when this is not possible, a globally shared parametric model is employed. Experiments show our approach outperforms both baseline and prior methods by 0.75 to 3 F1 absolute in the Wet Lab Protocol Corpus and 1 F1 absolute in the Materials Science Procedural Text Corpus.

pdf bib
Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion
Rajarshi Das | Ameya Godbole | Nicholas Monath | Manzil Zaheer | Andrew McCallum
Findings of the Association for Computational Linguistics: EMNLP 2020

A case-based reasoning (CBR) system solves a new problem by retrieving ‘cases’ that are similar to the given problem. If such a system can achieve high accuracy, it is appealing owing to its simplicity, interpretability, and scalability. In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs). Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB. Our probabilistic model estimates the likelihood that a path is effective at answering a query about the given entity. The parameters of our model can be efficiently computed using simple path statistics and require no iterative optimization. Our model is non-parametric, growing dynamically as new entities and relations are added to the KB. On several benchmark datasets our approach significantly outperforms other rule learning approaches and performs comparably to state-of-the-art embedding-based approaches. Furthermore, we demonstrate the effectiveness of our model in an “open-world” setting where new entities arrive in an online fashion, significantly outperforming state-of-the-art approaches and nearly matching the best offline method.

pdf bib
Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks
Trapit Bansal | Rishikesh Jha | Tsendsuren Munkhdalai | Andrew McCallum
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Self-supervised pre-training of transformer models has revolutionized NLP applications. Such pre-training with language modeling objectives provides a useful initial point for parameters that generalize well to new tasks with fine-tuning. However, fine-tuning is still data inefficient — when there are few labeled examples, accuracy can be low. Data efficiency can be improved by optimizing pre-training directly for future fine-tuning with few examples; this can be treated as a meta-learning problem. However, standard meta-learning techniques require many training tasks in order to generalize; unfortunately, finding a diverse set of such supervised tasks is usually difficult. This paper proposes a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text. This is achieved using a cloze-style objective, but creating separate multi-class classification tasks by gathering tokens-to-be blanked from among only a handful of vocabulary terms. This yields as many unique meta-training tasks as the number of subsets of vocabulary terms. We meta-train a transformer model on this distribution of tasks using a recent meta-learning framework. On 17 NLP tasks, we show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning. Furthermore, we show how the self-supervised tasks can be combined with supervised tasks for meta-learning, providing substantial accuracy gains over previous supervised meta-learning.

pdf bib
ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning
Michael Boratko | Xiang Li | Tim O’Gorman | Rajarshi Das | Dan Le | Andrew McCallum
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Given questions regarding some prototypical situation — such as Name something that people usually do before they leave the house for work? — a human can easily answer them via acquired experiences. There can be multiple right answers for such questions, with some more common for a situation than others. This paper introduces a new question answering dataset for training and evaluating common sense reasoning capabilities of artificial intelligence systems in such prototypical situations. The training set is gathered from an existing set of questions played in a long-running international trivia game show – Family Feud. The hidden evaluation set is created by gathering answers for each question from 100 crowd-workers. We also propose a generative evaluation task where a model has to output a ranked list of answers, ideally covering all prototypical answers for a question. After presenting multiple competitive baseline models, we find that human performance still exceeds model scores on all evaluation metrics with a meaningful gap, supporting the challenging nature of the task.

pdf bib
Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders
Andrew Drozdov | Subendhu Rongali | Yi-Pei Chen | Tim O’Gorman | Mohit Iyyer | Andrew McCallum
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The deep inside-outside recursive autoencoder (DIORA; Drozdov et al. 2019) is a self-supervised neural model that learns to induce syntactic tree structures for input sentences *without access to labeled training data*. In this paper, we discover that while DIORA exhaustively encodes all possible binary trees of a sentence with a soft dynamic program, its vector averaging approach is locally greedy and cannot recover from errors when computing the highest scoring parse tree in bottom-up chart parsing. To fix this issue, we introduce S-DIORA, an improved variant of DIORA that encodes a single tree rather than a softly-weighted mixture of trees by employing a hard argmax operation and a beam at each cell in the chart. Our experiments show that through *fine-tuning* a pre-trained DIORA with our new algorithm, we improve the state of the art in *unsupervised* constituency parsing on the English WSJ Penn Treebank by 2.2-6% F1, depending on the data used for fine-tuning.

2019

pdf bib
Unsupervised Labeled Parsing with Deep Inside-Outside Recursive Autoencoders
Andrew Drozdov | Patrick Verga | Yi-Pei Chen | Mohit Iyyer | Andrew McCallum
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Understanding text often requires identifying meaningful constituent spans such as noun phrases and verb phrases. In this work, we show that we can effectively recover these types of labels using the learned phrase vectors from deep inside-outside recursive autoencoders (DIORA). Specifically, we cluster span representations to induce span labels. Additionally, we improve the model’s labeling accuracy by integrating latent code learning into the training procedure. We evaluate this approach empirically through unsupervised labeled constituency parsing. Our method outperforms ELMo and BERT on two versions of the Wall Street Journal (WSJ) dataset and is competitive to prior work that requires additional human annotations, improving over a previous state-of-the-art system that depends on ground-truth part-of-speech tags by 5 absolute F1 points (19% relative error reduction).

pdf bib
Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference
Rajarshi Das | Ameya Godbole | Manzil Zaheer | Shehzaad Dhuliawala | Andrew McCallum
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

This paper describes our submission to the shared task on “Multi-hop Inference Explanation Regeneration” in TextGraphs workshop at EMNLP 2019 (Jansen and Ustalov, 2019). Our system identifies chains of facts relevant to explain an answer to an elementary science examination question. To counter the problem of ‘spurious chains’ leading to ‘semantic drifts’, we train a ranker that uses contextualized representation of facts to score its relevance for explaining an answer to a question. Our system was ranked first w.r.t the mean average precision (MAP) metric outperforming the second best system by 14.95 points.

pdf bib
Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering
Rajarshi Das | Ameya Godbole | Dilip Kavarthapu | Zhiyu Gong | Abhishek Singhal | Mo Yu | Xiaoxiao Guo | Tian Gao | Hamed Zamani | Manzil Zaheer | Andrew McCallum
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Multi-hop question answering (QA) requires an information retrieval (IR) system that can find multiple supporting evidence needed to answer the question, making the retrieval process very challenging. This paper introduces an IR technique that uses information of entities present in the initially retrieved evidence to learn to ‘hop’ to other relevant evidence. In a setting, with more than 5 million Wikipedia paragraphs, our approach leads to significant boost in retrieval performance. The retrieved evidence also increased the performance of an existing QA model (without any training) on the benchmark by 10.59 F1.

pdf bib
OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference
Dongxu Zhang | Subhabrata Mukherjee | Colin Lockard | Luna Dong | Andrew McCallum
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall in two extremes: either they perform instance-level inference relying on embedding for (subject, object) pairs, thus cannot handle pairs absent in any existing triples; or they perform predicate-level mapping and completely ignore background evidence from individual entities, thus cannot achieve satisfying quality. We propose OpenKI to handle sparsity of OpenIE extractions by performing instance-level inference: for each entity, we encode the rich information in its neighborhood in both KB and OpenIE extractions, and leverage this information in relation inference by exploring different methods of aggregation and attention. In order to handle unseen entities, our model is designed without creating entity-specific parameters. Extensive experiments show that this method not only significantly improves state-of-the-art for conventional OpenIE extractions like ReVerb, but also boosts the performance on OpenIE from semi-structured data, where new entity pairs are abundant and data are fairly sparse.

pdf bib
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders
Andrew Drozdov | Patrick Verga | Mohit Yadav | Mohit Iyyer | Andrew McCallum
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce the deep inside-outside recursive autoencoder (DIORA), a fully-unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence. During training we use dynamic programming to consider all possible binary trees over the sentence, and for inference we use the CKY algorithm to extract the highest scoring parse. DIORA outperforms previously reported results for unsupervised binary constituency parsing on the benchmark WSJ dataset.

pdf bib
Roll Call Vote Prediction with Knowledge Augmented Models
Pallavi Patil | Kriti Myer | Ronak Zala | Arpit Singh | Sheshera Mysore | Andrew McCallum | Adrian Benton | Amanda Stent
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

The official voting records of United States congresspeople are preserved as roll call votes. Prediction of voting behavior of politicians for whom no voting record exists, such as individuals running for office, is important for forecasting key political decisions. Prior work has relied on past votes cast to predict future votes, and thus fails to predict voting patterns for politicians without voting records. We address this by augmenting a prior state of the art model with multiple sources of external knowledge so as to enable prediction on unseen politicians. The sources of knowledge we use are news text and Freebase, a manually curated knowledge base. We propose augmentations based on unigram features for news text, and a knowledge base embedding method followed by a neural network composition for relations from Freebase. Empirical evaluation of these approaches indicate that the proposed models outperform the prior system for politicians with complete historical voting records by 1.0% point of accuracy (8.7% error reduction) and for politicians without voting records by 33.4% points of accuracy (66.7% error reduction). We also show that the knowledge base augmented approach outperforms the news text augmented approach by 4.2% points of accuracy.

pdf bib
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications
Vivi Nastase | Benjamin Roth | Laura Dietz | Andrew McCallum
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

pdf bib
The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures
Sheshera Mysore | Zachary Jensen | Edward Kim | Kevin Huang | Haw-Shiuan Chang | Emma Strubell | Jeffrey Flanigan | Andrew McCallum | Elsa Olivetti
Proceedings of the 13th Linguistic Annotation Workshop

Materials science literature contains millions of materials synthesis procedures described in unstructured natural language text. Large-scale analysis of these synthesis procedures would facilitate deeper scientific understanding of materials synthesis and enable automated synthesis planning. Such analysis requires extracting structured representations of synthesis procedures from the raw text as a first step. To facilitate the training and evaluation of synthesis extraction models, we introduce a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences. The nodes in this graph are synthesis operations and their typed arguments, and labeled edges specify relations between the nodes. We describe this new resource in detail and highlight some specific challenges to annotating scientific text with shallow semantic structure. We make the corpus available to the community to promote further research and development of scientific information extraction systems.

pdf bib
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell | Ananya Ganesh | Andrew McCallum
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

pdf bib
A2N: Attending to Neighbors for Knowledge Graph Inference
Trapit Bansal | Da-Cheng Juan | Sujith Ravi | Andrew McCallum
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

State-of-the-art models for knowledge graph completion aim at learning a fixed embedding representation of entities in a multi-relational graph which can generalize to infer unseen entity relationships at test time. This can be sub-optimal as it requires memorizing and generalizing to all possible entity relationships using these fixed representations. We thus propose a novel attention-based method to learn query-dependent representation of entities which adaptively combines the relevant graph neighborhood of an entity leading to more accurate KG completion. The proposed method is evaluated on two benchmark datasets for knowledge graph completion, and experimental results show that the proposed model performs competitively or better than existing state-of-the-art, including recent methods for explicit multi-hop reasoning. Qualitative probing offers insight into how the model can reason about facts involving multiple hops in the knowledge graph, through the use of neighborhood attention.

pdf bib
Optimal Transport-based Alignment of Learned Character Representations for String Similarity
Derek Tam | Nicholas Monath | Ari Kobren | Aaron Traylor | Rajarshi Das | Andrew McCallum
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE–a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport) and scores the alignment with a convolutional neural network. We evaluate STANCE’s ability to detect whether two strings can refer to the same entity–a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE’s ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in Bˆ3 F1 over the previous state-of-the-art approach.

2018

pdf bib
Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection
Haw-Shiuan Chang | Ziyun Wang | Luke Vilnis | Andrew McCallum
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Modeling hypernymy, such as poodle is-a dog, is an important generalization aid to many NLP tasks, such as entailment, relation extraction, and question answering. Supervised learning from labeled hypernym sources, such as WordNet, limits the coverage of these models, which can be addressed by learning hypernyms from unlabeled text. Existing unsupervised methods either do not scale to large vocabularies or yield unacceptably poor accuracy. This paper introduces distributional inclusion vector embedding (DIVE), a simple-to-implement unsupervised method of hypernym discovery via per-word non-negative vector embeddings which preserve the inclusion property of word contexts. In experimental evaluations more comprehensive than any previous literature of which we are aware—evaluating on 11 datasets using multiple existing as well as newly proposed scoring functions—we find that our method provides up to double the precision of previous unsupervised methods, and the highest average performance, using a much more compact word representation, and yielding many new state-of-the-art results.

pdf bib
Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction
Patrick Verga | Emma Strubell | Andrew McCallum
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Most work in relation extraction forms a prediction by looking at a short span of text within a single sentence containing a single entity pair mention. This approach often does not consider interactions across mentions, requires redundant computation for each mention pair, and ignores relationships expressed across sentence boundaries. These problems are exacerbated by the document- (rather than sentence-) level annotation common in biological text. In response, we propose a model which simultaneously predicts relationships between all mention pairs in a document. We form pairwise predictions over entire paper abstracts using an efficient self-attention encoder. All-pairs mention scores allow us to perform multi-instance learning by aggregating over mentions to form entity pair representations. We further adapt to settings without mention-level annotation by jointly training to predict named entities and adding a corpus of weakly labeled data. In experiments on two Biocreative benchmark datasets, we achieve state of the art performance on the Biocreative V Chemical Disease Relation dataset for models without external KB resources. We also introduce a new dataset an order of magnitude larger than existing human-annotated biological information extraction datasets and more accurate than distantly supervised alternatives.

pdf bib
Training Structured Prediction Energy Networks with Indirect Supervision
Amirmohammad Rooshenas | Aishwarya Kamath | Andrew McCallum
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

This paper introduces rank-based training of structured prediction energy networks (SPENs). Our method samples from output structures using gradient descent and minimizes the ranking violation of the sampled structures with respect to a scalar scoring function defined with domain knowledge. We have successfully trained SPEN for citation field extraction without any labeled data instances, where the only source of supervision is a simple human-written scoring function. Such scoring functions are often easy to provide; the SPEN then furnishes an efficient structured prediction inference procedure.

pdf bib
Embedded-State Latent Conditional Random Fields for Sequence Labeling
Dung Thai | Sree Harsha Ramesh | Shikhar Murty | Luke Vilnis | Andrew McCallum
Proceedings of the 22nd Conference on Computational Natural Language Learning

Complex textual information extraction tasks are often posed as sequence labeling or shallow parsing, where fields are extracted using local labels made consistent through probabilistic inference in a graphical model with constrained transitions. Recently, it has become common to locally parametrize these models using rich features extracted by recurrent neural networks (such as LSTM), while enforcing consistent outputs through a simple linear-chain model, representing Markovian dependencies between successive labels. However, the simple graphical model structure belies the often complex non-local constraints between output labels. For example, many fields, such as a first name, can only occur a fixed number of times, or in the presence of other fields. While RNNs have provided increasingly powerful context-aware local features for sequence tagging, they have yet to be integrated with a global graphical model of similar expressivity in the output distribution. Our model goes beyond the linear chain CRF to incorporate multiple hidden states per output label, but parametrizes them parsimoniously with low-rank log-potential scoring matrices, effectively learning an embedding space for hidden states. This augmented latent space of inference variables complements the rich feature representation of the RNN, and allows exact global inference obeying complex, learned non-local output constraints. We experiment with several datasets and show that the model outperforms baseline CRF+RNN models when global output constraints are necessary at inference-time, and explore the interpretable latent structure.

pdf bib
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking
Shikhar Murty | Patrick Verga | Luke Vilnis | Irena Radovanovic | Andrew McCallum
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Extraction from raw text to a knowledge base of entities and fine-grained types is often cast as prediction into a flat set of entity and type labels, neglecting the rich hierarchies over types and entities contained in curated ontologies. Previous attempts to incorporate hierarchical structure have yielded little benefit and are restricted to shallow ontologies. This paper presents new methods using real and complex bilinear mappings for integrating hierarchical information, yielding substantial improvement over flat predictions in entity linking and fine-grained entity typing, and achieving new state-of-the-art results for end-to-end models on the benchmark FIGER dataset. We also present two new human-annotated datasets containing wide and deep hierarchies which we will release to the community to encourage further research in this direction: MedMentions, a collection of PubMed abstracts in which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet, which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k entity types. In experiments on all three datasets we show substantial gains from hierarchy-aware training.

pdf bib
Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures
Luke Vilnis | Xiang Li | Shikhar Murty | Andrew McCallum
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE), are a natural way to model transitive relational data (e.g. entailment graphs). However, OE learns a deterministic knowledge base, limiting expressiveness of queries and the ability to use uncertainty for both prediction and learning (e.g. learning from expectations). Probabilistic extensions of OE have provided the ability to somewhat calibrate these denotational probabilities while retaining the consistency and inductive bias of ordered models, but lack the ability to model the negative correlations found in real-world knowledge. In this work we show that a broad class of models that assign probability measures to OE can never capture negative correlation, which motivates our construction of a novel box lattice and accompanying probability measure to capture anti-correlation and even disjoint concepts, while still providing the benefits of probabilistic modeling, such as the ability to perform rich joint and conditional queries over arbitrary sets of concepts, and both learning from and predicting calibrated uncertainty. We show improvements over previous approaches in modeling the Flickr and WordNet entailment graphs, and investigate the power of the model.

pdf bib
Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings
Haw-Shiuan Chang | Amol Agrawal | Ananya Ganesh | Anirudha Desai | Vinayak Mathur | Alfred Hough | Andrew McCallum
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)

Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguities for downstream NLP tasks and also makes word representations more interpretable. This paper proposes an accurate and efficient graph-based method for WSI that builds a global non-negative vector embedding basis (which are interpretable like topics) and clusters the basis indexes in the ego network of each polysemous word. By adopting distributional inclusion vector embeddings as our basis formation model, we avoid the expensive step of nearest neighbor search that plagues other graph-based methods without sacrificing the quality of sense clusters. Experiments on three datasets show that our proposed method produces similar or better sense clusters and embeddings compared with previous state-of-the-art methods while being significantly more efficient.

pdf bib
A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset
Michael Boratko | Harshit Padigela | Divyendra Mikkilineni | Pritish Yuvraj | Rajarshi Das | Andrew McCallum | Maria Chang | Achille Fokoue-Nkoutche | Pavan Kapanipathi | Nicholas Mattei | Ryan Musa | Kartik Talamadupula | Michael Witbrock
Proceedings of the Workshop on Machine Reading for Question Answering

The recent work of Clark et al. (2018) introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into easy and challenge sets. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the challenge set and statistics related to them. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the (ARC) corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.

pdf bib
Syntax Helps ELMo Understand Semantics: Is Syntax Still Relevant in a Deep Neural Architecture for SRL?
Emma Strubell | Andrew McCallum
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

Do unsupervised methods for learning rich, contextualized token representations obviate the need for explicit modeling of linguistic structure in neural network models for semantic role labeling (SRL)? We address this question by incorporating the massively successful ELMo embeddings (Peters et al., 2018) into LISA (Strubell and McCallum, 2018), a strong, linguistically-informed neural network architecture for SRL. In experiments on the CoNLL-2005 shared task we find that though ELMo out-performs typical word embeddings, beginning to close the gap in F1 between LISA with predicted and gold syntactic parses, syntactically-informed models still out-perform syntax-free models when both use ELMo, especially on out-of-domain data. Our results suggest that linguistic structures are indeed still relevant in this golden age of deep learning for NLP.

pdf bib
Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets
Nathan Greenberg | Trapit Bansal | Patrick Verga | Andrew McCallum
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Extracting typed entity mentions from text is a fundamental component to language understanding and reasoning. While there exist substantial labeled text datasets for multiple subsets of biomedical entity types—such as genes and proteins, or chemicals and diseases—it is rare to find large labeled datasets containing labels for all desired entity types together. This paper presents a method for training a single CRF extractor from multiple datasets with disjoint or partially overlapping sets of entity types. Our approach employs marginal likelihood training to insist on labels that are present in the data, while filling in “missing labels”. This allows us to leverage all the available data within a single model. In experimental results on the Biocreative V CDR (chemicals/diseases), Biocreative VI ChemProt (chemicals/proteins) and MedMentions (19 entity types) datasets, we show that joint training on multiple datasets improves NER F1 over training in isolation, and our methods achieve state-of-the-art results.

pdf bib
Linguistically-Informed Self-Attention for Semantic Role Labeling
Emma Strubell | Patrick Verga | Daniel Andor | David Weiss | Andrew McCallum
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Current state-of-the-art semantic role labeling (SRL) uses a deep neural network with no explicit linguistic features. However, prior work has shown that gold syntax trees can dramatically improve SRL decoding, suggesting the possibility of increased accuracy from explicit modeling of syntax. In this work, we present linguistically-informed self-attention (LISA): a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. Unlike previous models which require significant pre-processing to prepare linguistic features, LISA can incorporate syntax using merely raw tokens as input, encoding the sequence only once to simultaneously perform parsing, predicate detection and role labeling for all predicates. Syntax is incorporated by training one attention head to attend to syntactic parents for each token. Moreover, if a high-quality syntactic parse is already available, it can be beneficially injected at test time without re-training our SRL model. In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art performance for a model using predicted predicates and standard word embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in error. On ConLL-2012 English SRL we also show an improvement of more than 2.5 F1. LISA also out-performs the state-of-the-art with contextually-encoded (ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on out-of-domain text.

pdf bib
An Interface for Annotating Science Questions
Michael Boratko | Harshit Padigela | Divyendra Mikkilineni | Pritish Yuvraj | Rajarshi Das | Andrew McCallum | Maria Chang | Achille Fokoue | Pavan Kapanipathi | Nicholas Mattei | Ryan Musa | Kartik Talamadupula | Michael Witbrock
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Recent work introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set. That work includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them. However, it does not include clear definitions of these types, nor does it offer information about the quality of the labels or the annotation process used. In this paper, we introduce a novel interface for human annotation of science question-answer pairs with their respective knowledge and reasoning types, in order that the classification of new questions may be improved. We build on the classification schema proposed by prior work on the ARC dataset, and evaluate the effectiveness of our interface with a preliminary study involving 10 participants.

2017

pdf bib
Dependency Parsing with Dilated Iterated Graph CNNs
Emma Strubell | Andrew McCallum
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing

Dependency parses are an effective way to inject linguistic knowledge into many downstream tasks, and many practitioners wish to efficiently parse sentences at scale. Recent advances in GPU hardware have enabled neural networks to achieve significant gains over the previous best models, these models still fail to leverage GPUs’ capability for massive parallelism due to their requirement of sequential processing of the sentence. In response, we propose Dilated Iterated Graph Convolutional Neural Networks (DIG-CNNs) for graph-based dependency parsing, a graph convolutional architecture that allows for efficient end-to-end GPU parsing. In experiments on the English Penn TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best neural network parsers.

pdf bib
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein | Mrinal Das | Sebastian Riedel | Lakshmi Vikraman | Andrew McCallum
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.

pdf bib
Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks
Rajarshi Das | Manzil Zaheer | Siva Reddy | Andrew McCallum
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Existing question answering methods infer answers either from a knowledge base or from raw text. While knowledge base (KB) methods are good at answering compositional questions, their performance is often affected by the incompleteness of the KB. Au contraire, web text contains millions of facts that are absent in the KB, however in an unstructured form. Universal schema can support reasoning on the union of both structured KBs and unstructured text by aligning them in a common embedded space. In this paper we extend universal schema to natural language question answering, employing Memory networks to attend to the large body of facts in the combination of text and KB. Our models can be trained in an end-to-end fashion on question-answer pairs. Evaluation results on Spades fill-in-the-blank question answering dataset show that exploiting universal schema for question answering is better than using either a KB or text alone. This model also outperforms the current state-of-the-art by 8.5 F1 points.

pdf bib
Fast and Accurate Entity Recognition with Iterated Dilated Convolutions
Emma Strubell | Patrick Verga | David Belanger | Andrew McCallum
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Today when many practitioners run basic NLP on the entire web and large-volume traffic, faster methods are paramount to saving time and energy costs. Recent advances in GPU hardware have led to the emergence of bi-directional LSTMs as a standard method for obtaining per-token vector representations serving as input to labeling tasks such as NER (often followed by prediction in a linear-chain CRF). Though expressive and accurate, these models fail to fully exploit GPU parallelism, limiting their computational efficiency. This paper proposes a faster alternative to Bi-LSTMs for NER: Iterated Dilated Convolutional Neural Networks (ID-CNNs), which have better capacity than traditional CNNs for large context and structured prediction. Unlike LSTMs whose sequential processing on sentences of length N requires O(N) time even in the face of parallelism, ID-CNNs permit fixed-depth convolutions to run in parallel across entire documents. We describe a distinct combination of network structure, parameter sharing and training procedures that enable dramatic 14-20x test-time speedups while retaining accuracy comparable to the Bi-LSTM-CRF. Moreover, ID-CNNs trained to aggregate context from the entire document are more accurate than Bi-LSTM-CRFs while attaining 8x faster test time speeds.

pdf bib
Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks
Rajarshi Das | Arvind Neelakantan | David Belanger | Andrew McCallum
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Our goal is to combine the rich multi-step inference of symbolic logical reasoning with the generalization capabilities of neural networks. We are particularly interested in complex reasoning about entities and relations in text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs to compose the distributed semantics of multi-hop paths in KBs; however for multiple reasons, the approach lacks accuracy and practicality. This paper proposes three significant modeling advances: (1) we learn to jointly reason about relations, entities, and entity-types; (2) we use neural attention modeling to incorporate multiple paths; (3) we learn to share strength in a single RNN that represents logical composition across all relations. On a large-scale Freebase+ClueWeb prediction task, we achieve 25% error reduction, and a 53% error reduction on sparse relations due to shared strength. On chains of reasoning in WordNet we reduce error in mean quantile by 84% versus previous state-of-the-art.

pdf bib
Generalizing to Unseen Entities and Entity Pairs with Row-less Universal Schema
Patrick Verga | Arvind Neelakantan | Andrew McCallum
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Universal schema predicts the types of entities and relations in a knowledge base (KB) by jointly embedding the union of all available schema types—not only types from multiple structured databases (such as Freebase or Wikipedia infoboxes), but also types expressed as textual patterns from raw text. This prediction is typically modeled as a matrix completion problem, with one type per column, and either one or two entities per row (in the case of entity types or binary relation types, respectively). Factorizing this sparsely observed matrix yields a learned vector embedding for each row and each column. In this paper we explore the problem of making predictions for entities or entity-pairs unseen at training time (and hence without a pre-learned row embedding). We propose an approach having no per-row parameters at all; rather we produce a row vector on the fly using a learned aggregation function of the vectors of the observed columns for that row. We experiment with various aggregation functions, including neural network attention models. Our approach can be understood as a natural language database, in that questions about KB entities are answered by attending to textual or database evidence. In experiments predicting both relations and entity types, we demonstrate that despite having an order of magnitude fewer parameters than traditional universal schema, we can match the accuracy of the traditional model, and more importantly, we can now make predictions about unseen rows with nearly the same accuracy as rows available at training time.

2016

pdf bib
Incorporating Selectional Preferences in Multi-hop Relation Extraction
Rajarshi Das | Arvind Neelakantan | David Belanger | Andrew McCallum
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

pdf bib
Row-less Universal Schema
Patrick Verga | Andrew McCallum
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

pdf bib
Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks
Teresa Martin | Fiete Botschen | Ajay Nagesh | Andrew McCallum
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

pdf bib
Multilingual Relation Extraction using Compositional Universal Schema
Patrick Verga | David Belanger | Emma Strubell | Benjamin Roth | Andrew McCallum
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
Learning Dynamic Feature Selection for Fast Sequential Prediction
Emma Strubell | Luke Vilnis | Kate Silverstein | Andrew McCallum
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Compositional Vector Space Models for Knowledge Base Completion
Arvind Neelakantan | Benjamin Roth | Andrew McCallum
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Lexicon Infused Phrase Embeddings for Named Entity Resolution
Alexandre Passos | Vineet Kumar | Andrew McCallum
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf bib
Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
Arvind Neelakantan | Jeevan Shankar | Alexandre Passos | Andrew McCallum
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Learning Soft Linear Constraints with Application to Citation Field Extraction
Sam Anzaroot | Alexandre Passos | David Belanger | Andrew McCallum
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Relation Extraction with Matrix Factorization and Universal Schemas
Sebastian Riedel | Limin Yao | Andrew McCallum | Benjamin M. Marlin
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Dynamic Knowledge-Base Alignment for Coreference Resolution
Jiaping Zheng | Luke Vilnis | Sameer Singh | Jinho D. Choi | Andrew McCallum
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

pdf bib
Transition-based Dependency Parsing with Selectional Branching
Jinho D. Choi | Andrew McCallum
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Parse, Price and Cut—Delayed Column and Row Generation for Graph Based Parsers
Sebastian Riedel | David Smith | Andrew McCallum
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Monte Carlo MCMC: Efficient Inference by Approximate Sampling
Sameer Singh | Michael Wick | Andrew McCallum
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Human-Machine Cooperation: Supporting User Corrections to Automatically Constructed KBs
Michael Wick | Karl Schultz | Andrew McCallum
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

pdf bib
Monte Carlo MCMC: Efficient Inference by Sampling Factors
Sameer Singh | Michael Wick | Andrew McCallum
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

pdf bib
Probabilistic Databases of Universal Schema
Limin Yao | Sebastian Riedel | Andrew McCallum
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX)

pdf bib
A Discriminative Hierarchical Model for Fast Coreference at Large Scale
Michael Wick | Sameer Singh | Andrew McCallum
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Unsupervised Relation Discovery with Sense Disambiguation
Limin Yao | Sebastian Riedel | Andrew McCallum
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Robust Biomedical Event Extraction with Dual Decomposition and Minimal Domain Adaptation
Sebastian Riedel | Andrew McCallum
Proceedings of BioNLP Shared Task 2011 Workshop

pdf bib
Model Combination for Event Extraction in BioNLP 2011
Sebastian Riedel | David McClosky | Mihai Surdeanu | Andrew McCallum | Christopher D. Manning
Proceedings of BioNLP Shared Task 2011 Workshop

pdf bib
Fast and Robust Joint Models for Biomedical Event Extraction
Sebastian Riedel | Andrew McCallum
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Optimizing Semantic Coherence in Topic Models
David Mimno | Hanna Wallach | Edmund Talley | Miriam Leenders | Andrew McCallum
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Structured Relation Discovery using Generative Models
Limin Yao | Aria Haghighi | Sebastian Riedel | Andrew McCallum
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Sameer Singh | Amarnag Subramanya | Fernando Pereira | Andrew McCallum
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Constraint-Driven Rank-Based Learning for Information Extraction
Sameer Singh | Limin Yao | Sebastian Riedel | Andrew McCallum
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Machine Translation Using Overlapping Alignments and SampleRank
Benjamin Roth | Andrew McCallum | Marc Dymetman | Nicola Cancedda
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We present a conditional-random-field approach to discriminatively-trained phrase-based machine translation in which training and decoding are both cast in a sampling framework and are implemented uniformly in a new probabilistic programming language for factor graphs. In traditional phrase-based translation, decoding infers both a "Viterbi" alignment and the target sentence. In contrast, in our approach, a rich overlapping-phrase alignment is produced by a fast deterministic method, while probabilistic decoding infers only the target sentence, which is then able to leverage arbitrary features of the entire source sentence, target sentence and alignment. By using SampleRank for learning we could in principle efficiently estimate hundreds of thousands of parameters. Test-time decoding is done by MCMC sampling with annealing. To demonstrate the potential of our approach we show preliminary experiments leveraging alignments that may contain overlapping bi-phrases.

pdf bib
Collective Cross-Document Relation Extraction Without Labelled Data
Limin Yao | Sebastian Riedel | Andrew McCallum
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria
Gregory Druck | Gideon Mann | Andrew McCallum
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Active Learning by Labeling Features
Gregory Druck | Burr Settles | Andrew McCallum
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment
Kedar Bellare | Andrew McCallum
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Polylingual Topic Models
David Mimno | Hanna M. Wallach | Jason Naradowsky | David A. Smith | Andrew McCallum
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Joint Inference for Natural Language Processing
Andrew McCallum
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

2008

pdf bib
Generalized Expectation Criteria for Semi-Supervised Learning of Conditional Random Fields
Gideon S. Mann | Andrew McCallum
Proceedings of ACL-08: HLT

2007

pdf bib
First-Order Probabilistic Models for Coreference Resolution
Aron Culotta | Michael Wick | Andrew McCallum
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Efficient Computation of Entropy Gradient for Semi-Supervised Conditional Random Fields
Gideon Mann | Andrew McCallum
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf bib
Report on the NSF-sponsored Human Language Technology Workshop on Industrial Centers
Mary Harper | Alex Acero | Srinivas Bangalore | Jaime Carbonell | Jordan Cohen | Barbara Cuthill | Carol Espy-Wilson | Christiane Fellbaum | John Garofolo | Chin-Hui Lee | Jim Lester | Andrew McCallum | Nelson Morgan | Michael Picheney | Joe Picone | Lance Ramshaw | Jeff Reynar | Hadar Shemtov | Clare Voss
Proceedings of Machine Translation Summit XI: Papers

2006

pdf bib
Learning Field Compatibilities to Extract Database Records from Unstructured Text
Michael Wick | Aron Culotta | Andrew McCallum
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing
Ryan McDonald | Charles Sutton | Hal Daumé III | Andrew McCallum | Fernando Pereira | Jeff Bilmes
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

pdf bib
Practical Markov Logic Containing First-Order Quantifiers with Application to Identity Uncertainty
Aron Culotta | Andrew McCallum
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

pdf bib
Reducing Weight Undertraining in Structured Discriminative Learning
Charles Sutton | Michael Sindelar | Andrew McCallum
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf bib
Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text
Aron Culotta | Andrew McCallum | Jonathan Betz
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
Composition of Conditional Random Fields for Transfer Learning
Charles Sutton | Andrew McCallum
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Joint Parsing and Semantic Role Labeling
Charles Sutton | Andrew McCallum
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

pdf bib
Accurate Information Extraction from Research Papers using Conditional Random Fields
Fuchun Peng | Andrew McCallum
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

pdf bib
Confidence Estimation for Information Extraction
Aron Culotta | Andrew McCallum
Proceedings of HLT-NAACL 2004: Short Papers

pdf bib
Chinese Segmentation and New Word Detection using Conditional Random Fields
Fuchun Peng | Fangfang Feng | Andrew McCallum
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons
Andrew McCallum | Wei Li
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

1999

pdf bib
Text Classification by Bootstrapping with Keywords, EM and Shrinkage
Andrew McCallum | Kamal Nigam
Unsupervised Learning in Natural Language Processing

Search
Co-authors