Tom Mitchell

Also published as: Tom M. Mitchell


2021

pdf bib
Conversational Multi-Hop Reasoning with Neural Commonsense Knowledge and Symbolic Logic Rules
Forough Arabshahi | Jennifer Lee | Antoine Bosselut | Yejin Choi | Tom Mitchell
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

One of the challenges faced by conversational agents is their inability to identify unstated presumptions of their users’ commands, a task trivial for humans due to their common sense. In this paper, we propose a zero-shot commonsense reasoning system for conversational agents in an attempt to achieve this. Our reasoner uncovers unstated presumptions from user commands satisfying a general template of if-(state), then-(action), because-(goal). Our reasoner uses a state-of-the-art transformer-based generative commonsense knowledge base (KB) as its source of background knowledge for reasoning. We propose a novel and iterative knowledge query mechanism to extract multi-hop reasoning chains from the neural KB which uses symbolic logic rules to significantly reduce the search space. Similar to any KBs gathered to date, our commonsense KB is prone to missing knowledge. Therefore, we propose to conversationally elicit the missing knowledge from human users with our novel dynamic question generation strategy, which generates and presents contextualized queries to human users. We evaluate the model with a user study with human users that achieves a 35% higher success rate compared to SOTA.

2020

pdf bib
Interactive Task Learning from GUI-Grounded Natural Language Instructions and Demonstrations
Toby Jia-Jun Li | Tom Mitchell | Brad Myers
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We show SUGILITE, an intelligent task automation agent that can learn new tasks and relevant associated concepts interactively from the user’s natural language instructions and demonstrations, using the graphical user interfaces (GUIs) of third-party mobile apps. This system provides several interesting features: (1) it allows users to teach new task procedures and concepts through verbal instructions together with demonstration of the steps of a script using GUIs; (2) it supports users in clarifying their intents for demonstrated actions using GUI-grounded verbal instructions; (3) it infers parameters of tasks and their possible values in utterances using the hierarchical structures of the underlying app GUIs; and (4) it generalizes taught concepts to different contexts and task domains. We describe the architecture of the SUGILITE system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android.

2019

pdf bib
Relating Simple Sentence Representations in Deep Neural Networks and the Brain
Sharmistha Jat | Hao Tang | Partha Talukdar | Tom Mitchell
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

What is the relationship between sentence representations learned by deep recurrent models against those encoded by the brain? Is there any correspondence between hidden layers of these recurrent models and brain regions when processing sentences? Can these deep models be used to synthesize brain data which can then be utilized in other extrinsic tasks? We investigate these questions using sentences with simple syntax and semantics (e.g., The bone was eaten by the dog.). We consider multiple neural network architectures, including recently proposed ELMo and BERT. We use magnetoencephalography (MEG) brain recording data collected from human subjects when they were reading these simple sentences. Overall, we find that BERT’s activations correlate the best with MEG brain data. We also find that the deep network representation can be used to generate brain data from new sentences to augment existing brain data. To the best of our knowledge, this is the first work showing that the MEG brain recording when reading a word in a sentence can be used to distinguish earlier words in the sentence. Our exploration is also the first to use deep neural network representations to generate synthetic brain data and to show that it helps in improving subsequent stimuli decoding task accuracy.

pdf bib
Look-up and Adapt: A One-shot Semantic Parser
Zhichu Lu | Forough Arabshahi | Igor Labutov | Tom Mitchell
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Computing devices have recently become capable of interacting with their end users via natural language. However, they can only operate within a limited “supported” domain of discourse and fail drastically when faced with an out-of-domain utterance, mainly due to the limitations of their semantic parser. In this paper, we propose a semantic parser that generalizes to out-of-domain examples by learning a general strategy for parsing an unseen utterance through adapting the logical forms of seen utterances, instead of learning to generate a logical form from scratch. Our parser maintains a memory consisting of a representative subset of the seen utterances paired with their logical forms. Given an unseen utterance, our parser works by looking up a similar utterance from the memory and adapting its logical form until it fits the unseen utterance. Moreover, we present a data generation strategy for constructing utterance-logical form pairs from different domains. Our results show an improvement of up to 68.8% on one-shot parsing under two different evaluation settings compared to the baselines.

pdf bib
Learning to Ask for Conversational Machine Learning
Shashank Srivastava | Igor Labutov | Tom Mitchell
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Natural language has recently been explored as a new medium of supervision for training machine learning models. Here, we explore learning classification tasks using language in a conversational setting – where the automated learner does not simply receive language input from a teacher, but can proactively engage the teacher by asking questions. We present a reinforcement learning framework, where the learner’s actions correspond to question types and the reward for asking a question is based on how the teacher’s response changes performance of the resulting machine learning model on the learning task. In this framework, learning good question-asking strategies corresponds to asking sequences of questions that maximize the cumulative (discounted) reward, and hence quickly lead to effective classifiers. Empirical analysis across three domains shows that learned question-asking strategies expedite classifier training by asking appropriate questions at different points in the learning process. The approach allows learning classifiers from a blend of strategies, including learning from observations, explanations and clarifications.

pdf bib
Understanding language-elicited EEG data by predicting it from a fine-tuned language model
Dan Schwartz | Tom Mitchell
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Electroencephalography (EEG) recordings of brain activity taken while participants read or listen to language are widely used within the cognitive neuroscience and psycholinguistics communities as a tool to study language comprehension. Several time-locked stereotyped EEG responses to word-presentations – known collectively as event-related potentials (ERPs) – are thought to be markers for semantic or syntactic processes that take place during comprehension. However, the characterization of each individual ERP in terms of what features of a stream of language trigger the response remains controversial. Improving this characterization would make ERPs a more useful tool for studying language comprehension. We take a step towards better understanding the ERPs by finetuning a language model to predict them. This new approach to analysis shows for the first time that all of the ERPs are predictable from embeddings of a stream of language. Prior work has only found two of the ERPs to be predictable. In addition to this analysis, we examine which ERPs benefit from sharing parameters during joint training. We find that two pairs of ERPs previously identified in the literature as being related to each other benefit from joint training, while several other pairs of ERPs that benefit from joint training are suggestive of potential relationships. Extensions of this analysis that further examine what kinds of information in the model embeddings relate to each ERP have the potential to elucidate the processes involved in human language comprehension.

pdf bib
Competence-based Curriculum Learning for Neural Machine Translation
Emmanouil Antonios Platanios | Otilia Stretcu | Graham Neubig | Barnabas Poczos | Tom Mitchell
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Current state-of-the-art NMT systems use large neural networks that are not only slow to train, but also often require many heuristics and optimization tricks, such as specialized learning rate schedules and large batch sizes. This is undesirable as it requires extensive hyperparameter tuning. In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in overall better performance. Our framework consists of a principled way of deciding which training samples are shown to the model at different times during training, based on the estimated difficulty of a sample and the current competence of the model. Filtering training samples in this manner prevents the model from getting stuck in bad local optima, making it converge faster and reach a better solution than the common approach of uniformly sampling training examples. Furthermore, the proposed method can be easily applied to existing NMT models by simply modifying their input data pipelines. We show that our framework can help improve the training time and the performance of both recurrent neural network models and Transformers, achieving up to a 70% decrease in training time, while at the same time obtaining accuracy improvements of up to 2.2 BLEU.

pdf bib
Discourse in Multimedia: A Case Study in Extracting Geometry Knowledge from Textbooks
Mrinmaya Sachan | Avinava Dubey | Eduard H. Hovy | Tom M. Mitchell | Dan Roth | Eric P. Xing
Computational Linguistics, Volume 45, Issue 4 - December 2019

To ensure readability, text is often written and presented with due formatting. These text formatting devices help the writer to effectively convey the narrative. At the same time, these help the readers pick up the structure of the discourse and comprehend the conveyed information. There have been a number of linguistic theories on discourse structure of text. However, these theories only consider unformatted text. Multimedia text contains rich formatting features that can be leveraged for various NLP tasks. In this article, we study some of these discourse features in multimedia text and what communicative function they fulfill in the context. As a case study, we use these features to harvest structured subject knowledge of geometry from textbooks. We conclude that the discourse and text layout features provide information that is complementary to lexical semantic information. Finally, we show that the harvested structured knowledge can be used to improve an existing solver for geometry problems, making it more accurate as well as more explainable.

2018

pdf bib
Zero-shot Learning of Classifiers from Natural Language Quantification
Shashank Srivastava | Igor Labutov | Tom Mitchell
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Humans can efficiently learn new concepts using language. We present a framework through which a set of explanations of a concept can be used to learn a classifier without access to any labeled examples. We use semantic parsing to map explanations to probabilistic assertions grounded in latent class labels and observed attributes of unlabeled data, and leverage the differential semantics of linguistic quantifiers (e.g., ‘usually’ vs ‘always’) to drive model training. Experiments on three domains show that the learned classifiers outperform previous approaches for learning with limited data, and are comparable with fully supervised classifiers trained from a small number of labeled examples.

pdf bib
Contextual Parameter Generation for Universal Neural Machine Translation
Emmanouil Antonios Platanios | Mrinmaya Sachan | Graham Neubig | Tom Mitchell
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a simple modification to existing neural machine translation (NMT) models that enables using a single universal model to translate between multiple languages while allowing for language specific parameterization, and that can also be used for domain adaptation. Our approach requires no changes to the model architecture of a standard NMT system, but instead introduces a new component, the contextual parameter generator (CPG), that generates the parameters of the system (e.g., weights in a neural network). This parameter generator accepts source and target language embeddings as input, and generates the parameters for the encoder and the decoder, respectively. The rest of the model remains unchanged and is shared across all languages. We show how this simple modification enables the system to use monolingual data for training and also perform zero-shot translation. We further show it is able to surpass state-of-the-art performance for both the IWSLT-15 and IWSLT-17 datasets and that the learned language embeddings are able to uncover interesting relationships between languages.

pdf bib
Learning to Learn Semantic Parsers from Natural Language Supervision
Igor Labutov | Bishan Yang | Tom Mitchell
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

As humans, we often rely on language to learn language. For example, when corrected in a conversation, we may learn from that correction, over time improving our language fluency. Inspired by this observation, we propose a learning algorithm for training semantic parsers from supervision (feedback) expressed in natural language. Our algorithm learns a semantic parser from users’ corrections such as “no, what I really meant was before his job, not after”, by also simultaneously learning to parse this natural language feedback in order to leverage it as a form of supervision. Unlike supervision with gold-standard logical forms, our method does not require the user to be familiar with the underlying logical formalism, and unlike supervision from denotation, it does not require the user to know the correct answer to their query. This makes our learning algorithm naturally scalable in settings where existing conversational logs are available and can be leveraged as training data. We construct a novel dataset of natural language feedback in a conversational setting, and show that our method is effective at learning a semantic parser from such natural language supervision.

pdf bib
LIA: A Natural Language Programmable Personal Assistant
Igor Labutov | Shashank Srivastava | Tom Mitchell
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present LIA, an intelligent personal assistant that can be programmed using natural language. Our system demonstrates multiple competencies towards learning from human-like interactions. These include the ability to be taught reusable conditional procedures, the ability to be taught new knowledge about the world (concepts in an ontology) and the ability to be taught how to ground that knowledge in a set of sensors and effectors. Building such a system highlights design questions regarding the overall architecture that such an agent should have, as well as questions about parsing and grounding language in situational contexts. We outline key properties of this architecture, and demonstrate a prototype that embodies them in the form of a personal assistant on an Android device.

2017

pdf bib
A Joint Sequential and Relational Model for Frame-Semantic Parsing
Bishan Yang | Tom Mitchell
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We introduce a new method for frame-semantic parsing that significantly improves the prior state of the art. Our model leverages the advantages of a deep bidirectional LSTM network which predicts semantic role labels word by word and a relational network which predicts semantic roles for individual text expressions in relation to a predicate. The two networks are integrated into a single model via knowledge distillation, and a unified graphical model is employed to jointly decode frames and semantic roles during inference. Experiments on the standard FrameNet data show that our model significantly outperforms existing neural and non-neural approaches, achieving a 5.7 F1 gain over the current state of the art, for full frame structure extraction.

pdf bib
Joint Concept Learning and Semantic Parsing from Natural Language Explanations
Shashank Srivastava | Igor Labutov | Tom Mitchell
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Natural language constitutes a predominant medium for much of human learning and pedagogy. We consider the problem of concept learning from natural language explanations, and a small number of labeled examples of the concept. For example, in learning the concept of a phishing email, one might say ‘this is a phishing email because it asks for your bank account number’. Solving this problem involves both learning to interpret open ended natural language statements, and learning the concept itself. We present a joint model for (1) language interpretation (semantic parsing) and (2) concept learning (classification) that does not require labeling statements with logical forms. Instead, the model prefers discriminative interpretations of statements in context of observable features of the data as a weak signal for parsing. On a dataset of email-related concepts, our approach yields across-the-board improvements in classification performance, with a 30% relative improvement in F1 score over competitive methods in the low data regime.

pdf bib
Leveraging Knowledge Bases in LSTMs for Improving Machine Reading
Bishan Yang | Tom Mitchell
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper focuses on how to take advantage of external knowledge bases (KBs) to improve recurrent neural networks for machine reading. Traditional methods that exploit knowledge from KBs encode knowledge as discrete indicator features. Not only do these features generalize poorly, but they require task-specific feature engineering to achieve good performance. We propose KBLSTM, a novel neural model that leverages continuous representations of KBs to enhance the learning of recurrent neural networks for machine reading. To effectively integrate background knowledge with information from the currently processed text, our model employs an attention mechanism with a sentinel to adaptively decide whether to attend to background knowledge and which information from KBs is useful. Experimental results show that our model achieves accuracies that surpass the previous state-of-the-art results for both entity extraction and event extraction on the widely used ACE2005 dataset.

pdf bib
A Probabilistic Generative Grammar for Semantic Parsing
Abulhair Saparov | Vijay Saraswat | Tom Mitchell
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

We present a generative model of natural language sentences and demonstrate its application to semantic parsing. In the generative process, a logical form sampled from a prior, and conditioned on this logical form, a grammar probabilistically generates the output sentence. Grammar induction using MCMC is applied to learn the grammar given a set of labeled sentences with corresponding logical forms. We develop a semantic parser that finds the logical form with the highest posterior probability exactly. We obtain strong results on the GeoQuery dataset and achieve state-of-the-art F1 on Jobs.

pdf bib
Merging knowledge bases in different languages
Jerónimo Hernández-González | Estevam R. Hruschka Jr. | Tom M. Mitchell
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing

Recently, different systems which learn to populate and extend a knowledge base (KB) from the web in different languages have been presented. Although a large set of concepts should be learnt independently from the language used to read, there are facts which are expected to be more easily gathered in local language (e.g., culture or geography). A system that merges KBs learnt in different languages will benefit from the complementary information as long as common beliefs are identified, as well as from redundancy present in web pages written in different languages. In this paper, we deal with the problem of identifying equivalent beliefs (or concepts) across language specific KBs, assuming that they share the same ontology of categories and relations. In a case study with two KBs independently learnt from different inputs, namely web pages written in English and web pages written in Portuguese respectively, we report on the results of two methodologies: an approach based on personalized PageRank and an inference technique to find out common relevant paths through the KBs. The proposed inference technique efficiently identifies relevant paths, outperforming the baseline (a dictionary-based classifier) in the vast majority of tested categories.

2016

pdf bib
Joint Extraction of Events and Entities within a Document Context
Bishan Yang | Tom M. Mitchell
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Mapping Verbs in Different Languages to Knowledge Base Relations using Web Text as Interlingua
Derry Tanti Wijaya | Tom M. Mitchell
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary
Jayant Krishnamurthy | Tom M. Mitchell
Transactions of the Association for Computational Linguistics, Volume 3

We present an approach to learning a model-theoretic semantics for natural language tied to Freebase. Crucially, our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as “Republican front-runner from Texas” whose semantics cannot be represented using the Freebase schema. Our approach directly converts a sentence’s syntactic CCG parse into a logical form containing predicates derived from the words in the sentence, assigning each word a consistent semantics across sentences. This logical form is evaluated against a learned probabilistic database that defines a distribution over denotations for each textual predicate. A training phase produces this probabilistic database using a corpus of entity-linked text and probabilistic matrix factorization with a novel ranking objective function. We evaluate our approach on a compositional question answering task where it outperforms several competitive baselines. We also compare our approach against manually annotated Freebase queries, finding that our open predicate vocabulary enables us to answer many questions that Freebase cannot.

pdf bib
A Knowledge-Intensive Model for Prepositional Phrase Attachment
Ndapandula Nakashole | Tom M. Mitchell
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
“A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce”: Learning State Changing Verbs from Wikipedia Revision History
Derry Tanti Wijaya | Ndapandula Nakashole | Tom Mitchell
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Translation Invariant Word Embeddings
Kejun Huang | Matt Gardner | Evangelos Papalexakis | Christos Faloutsos | Nikos Sidiropoulos | Tom Mitchell | Partha P. Talukdar | Xiao Fu
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Efficient and Expressive Knowledge Base Completion Using Subgraph Feature Extraction
Matt Gardner | Tom Mitchell
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Compositional and Interpretable Semantic Space
Alona Fyshe | Leila Wehbe | Partha P. Talukdar | Brian Murphy | Tom M. Mitchell
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Aligning context-based statistical models of language with brain activity during reading
Leila Wehbe | Ashish Vaswani | Kevin Knight | Tom Mitchell
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases
Matt Gardner | Partha Talukdar | Jayant Krishnamurthy | Tom Mitchell
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
CTPs: Contextual Temporal Profiles for Time Scoping Facts using State Change Detection
Derry Tanti Wijaya | Ndapandula Nakashole | Tom M. Mitchell
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Interpretable Semantic Vectors from a Joint Model of Brain- and Text- Based Meaning
Alona Fyshe | Partha P. Talukdar | Brian Murphy | Tom M. Mitchell
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Language-Aware Truth Assessment of Fact Candidates
Ndapandula Nakashole | Tom M. Mitchell
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Joint Syntactic and Semantic Parsing with Combinatory Categorial Grammar
Jayant Krishnamurthy | Tom M. Mitchell
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Vector Space Semantic Parsing: A Framework for Compositional Vector Space Models
Jayant Krishnamurthy | Tom Mitchell
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality

pdf bib
Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition
Alona Fyshe | Brian Murphy | Partha Talukdar | Tom Mitchell
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

pdf bib
Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues
Matt Gardner | Partha Pratim Talukdar | Bryan Kisiel | Tom Mitchell
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

pdf bib
Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding
Brian Murphy | Partha Talukdar | Tom Mitchell
Proceedings of COLING 2012

pdf bib
Selecting Corpus-Semantic Models for Neurolinguistic Decoding
Brian Murphy | Partha Talukdar | Tom Mitchell
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Weakly Supervised Training of Semantic Parsers
Jayant Krishnamurthy | Tom Mitchell
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Random Walk Inference and Learning in A Large Scale Knowledge Base
Ni Lao | Tom Mitchell | William W. Cohen
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Discovering Relations between Noun Categories
Thahir Mohamed | Estevam Hruschka | Tom Mitchell
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Which Noun Phrases Denote Which Concepts?
Jayant Krishnamurthy | Tom Mitchell
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

pdf bib
Quantitative modeling of the neural representation of adjective-noun phrases to account for fMRI activation
Kai-min K. Chang | Vladimir L. Cherkassky | Tom M. Mitchell | Marcel Adam Just
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Coupling Semi-Supervised Learning of Categories and Relations
Andrew Carlson | Justin Betteridge | Estevam Rafael Hruschka Junior | Tom M. Mitchell
Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for Natural Language Processing

2004

pdf bib
Learning to Classify Email into “Speech Acts”
William W. Cohen | Vitor R. Carvalho | Tom M. Mitchell
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing