Jay Pujara


2022

pdf bib
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Pei Zhou | Karthik Gopalakrishnan | Behnam Hedayatnia | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit knowledge, such as common sense, is key to fluid human conversations. Current neural response generation (RG) models are trained to generate responses directly, omitting unstated implicit knowledge. In this paper, we present Think-Before-Speaking (TBS), a generative approach to first externalize implicit commonsense knowledge (think) and use this knowledge to generate responses (speak). We argue that externalizing implicit knowledge allows more efficient learning, produces more informative responses, and enables more explainable models. We analyze different choices to collect knowledge-aligned dialogues, represent implicit knowledge, and transition between knowledge and dialogues. Empirical results show TBS models outperform end-to-end and knowledge-augmented RG baselines on most automatic metrics and generate more informative, specific, and commonsense-following responses, as evaluated by human annotators. TBS also generates knowledge that makes sense and is relevant to the dialogue around 85% of the time

pdf bib
Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
Dong-Ho Lee | Akshen Kadakia | Kangmin Tan | Mahak Agarwal | Xinyu Feng | Takashi Shibuya | Ryosuke Mitani | Toshiyuki Sekiya | Jay Pujara | Xiang Ren
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in prompt-based learning have shown strong results on few-shot text classification by using cloze-style templates.Similar attempts have been made on named entity recognition (NER) which manually design templates to predict entity types for every text span in a sentence. However, such methods may suffer from error propagation induced by entity span detection, high cost due to enumeration of all possible text spans, and omission of inter-dependencies among token labels in a sentence. Here we present a simple demonstration-based learning method for NER, which lets the input be prefaced by task demonstrations for in-context learning. We perform a systematic study on demonstration strategy regarding what to include (entity examples, with or without surrounding context), how to select the examples, and what templates to use. Results on in-domain learning and domain adaptation show that the model’s performance in low-resource settings can be largely improved with a suitable demonstration strategy (e.g., a 4-17% improvement on 25 train instances). We also find that good demonstration can save many labeled examples and consistency in demonstration contributes to better performance.

pdf bib
Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-Modal Knowledge Transfer
Woojeong Jin | Dong-Ho Lee | Chenguang Zhu | Jay Pujara | Xiang Ren
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Pre-trained language models are still far from human performance in tasks that need understanding of properties (e.g. appearance, measurable quantity) and affordances of everyday objects in the real world since the text lacks such information due to reporting bias.In this work, we study whether integrating visual knowledge into a language model can fill the gap.We investigate two types of knowledge transfer: (1) text knowledge transfer using image captions that may contain enriched visual knowledge and (2) cross-modal knowledge transfer using both images and captions with vision-language training objectives.On 5 downstream tasks that may need visual knowledge to solve the problem, we perform extensive empirical comparisons over the presented objectives.Our experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.

2021

pdf bib
Finding Pragmatic Differences Between Disciplines
Lee Kezar | Jay Pujara
Proceedings of the Second Workshop on Scholarly Document Processing

Scholarly documents have a great degree of variation, both in terms of content (semantics) and structure (pragmatics). Prior work in scholarly document understanding emphasizes semantics through document summarization and corpus topic modeling but tends to omit pragmatics such as document organization and flow. Using a corpus of scholarly documents across 19 disciplines and state-of-the-art language modeling techniques, we learn a fixed set of domain-agnostic descriptors for document sections and “retrofit” the corpus to these descriptors (also referred to as “normalization”). Then, we analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure. We report within-discipline structural archetypes, variability, and between-discipline comparisons, supporting the hypothesis that scholarly communities, despite their size, diversity, and breadth, share similar avenues for expressing their work. Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.

pdf bib
Think Before You Speak: Learning to Generate Implicit Knowledge for Response Generation by Self-Talk
Pei Zhou | Behnam Hedayatnia | Karthik Gopalakrishnan | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Humans make appropriate responses not only based on previous dialogue utterances but also on implicit background knowledge such as common sense. Although neural response generation models seem to produce human-like responses, they are mostly end-to-end and not generating intermediate grounds between a dialogue history and responses. This work aims to study if and how we can train an RG model that talks with itself to generate implicit knowledge before making responses. We further investigate can such models identify when to generate implicit background knowledge and when it is not necessary. Experimental results show that compared with models that directly generate responses given a dialogue history, self-talk models produce better-quality responses according to human evaluation on grammaticality, coherence, and engagingness. And models that are trained to identify when to self-talk further improves the response quality. Analysis on generated implicit knowledge shows that models mostly use the knowledge appropriately in the responses.

pdf bib
Commonsense-Focused Dialogues for Response Generation: An Empirical Study
Pei Zhou | Karthik Gopalakrishnan | Behnam Hedayatnia | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Smooth and effective communication requires the ability to perform latent or explicit commonsense inference. Prior commonsense reasoning benchmarks (such as SocialIQA and CommonsenseQA) mainly focus on the discriminative task of choosing the right answer from a set of candidates, and do not involve interactive language generation as in dialogue. Moreover, existing dialogue datasets do not explicitly focus on exhibiting commonsense as a facet. In this paper, we present an empirical study of commonsense in dialogue response generation. We first auto-extract commonsensical dialogues from existing dialogue datasets by leveraging ConceptNet, a commonsense knowledge graph. Furthermore, building on social contexts/situations in SocialIQA, we collect a new dialogue dataset with 25K dialogues aimed at exhibiting social commonsense in an interactive setting. We evaluate response generation models trained using these datasets and find that models trained on both extracted and our collected data produce responses that consistently exhibit more commonsense than baselines. Finally we propose an approach for automatic evaluation of commonsense that relies on features derived from ConceptNet and pre-trained language and dialog models, and show reasonable correlation with human evaluation of responses’ commonsense quality.

pdf bib
Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources
Ninareh Mehrabi | Pei Zhou | Fred Morstatter | Jay Pujara | Xiang Ren | Aram Galstyan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Warning: this paper contains content that may be offensive or upsetting. Commonsense knowledge bases (CSKB) are increasingly used for various natural language processing tasks. Since CSKBs are mostly human-generated and may reflect societal biases, it is important to ensure that such biases are not conflated with the notion of commonsense. Here we focus on two widely used CSKBs, ConceptNet and GenericsKB, and establish the presence of bias in the form of two types of representational harms, overgeneralization of polarized perceptions and representation disparity across different demographic groups in both CSKBs. Next, we find similar representational harms for downstream models that use ConceptNet. Finally, we propose a filtering-based approach for mitigating such harms, and observe that our filtered-based approach can reduce the issues in both resources and models but leads to a performance drop, leaving room for future work to build fairer and stronger commonsense models.

pdf bib
Numeracy enhances the Literacy of Language Models
Avijit Thawani | Jay Pujara | Filip Ilievski
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Specialized number representations in NLP have shown improvements on numerical reasoning tasks like arithmetic word problems and masked number prediction. But humans also use numeracy to make better sense of world concepts, e.g., you can seat 5 people in your ‘room’ but not 500. Does a better grasp of numbers improve a model’s understanding of other concepts and words? This paper studies the effect of using six different number encoders on the task of masked word prediction (MWP), as a proxy for evaluating literacy. To support this investigation, we develop Wiki-Convert, a 900,000 sentence dataset annotated with numbers and units, to avoid conflating nominal and ordinal number occurrences. We find a significant improvement in MWP for sentences containing numbers, that exponent embeddings are the best number encoders, yielding over 2 points jump in prediction accuracy over a BERT baseline, and that these enhanced literacy skills also generalize to contexts without annotated numbers. We release all code at https://git.io/JuZXn.

pdf bib
RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms
Pei Zhou | Rahul Khanna | Seyeon Lee | Bill Yuchen Lin | Daniel Ho | Jay Pujara | Xiang Ren
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Pre-trained language models (PTLMs) have achieved impressive performance on commonsense inference benchmarks, but their ability to employ commonsense to make robust inferences, which is crucial for effective communications with humans, is debated. In the pursuit of advancing fluid human-AI communication, we propose a new challenge, RICA: Robust Inference using Commonsense Axioms, that evaluates robust commonsense inference despite textual perturbations. To generate data for this challenge, we develop a systematic and scalable procedure using commonsense knowledge bases and probe PTLMs across two different evaluation settings. Extensive experiments on our generated probe sets with more than 10k statements show that PTLMs perform no better than random guessing on the zero-shot setting, are heavily impacted by statistical biases, and are not robust to perturbation attacks. We also find that fine-tuning on similar statements offer limited gains, as PTLMs still fail to generalize to unseen inferences. Our new large-scale benchmark exposes a significant gap between PTLMs and human-level language understanding and offers a new challenge for PTLMs to demonstrate commonsense.

pdf bib
Table-based Fact Verification With Salience-aware Learning
Fei Wang | Kexuan Sun | Jay Pujara | Pedro Szekely | Muhao Chen
Findings of the Association for Computational Linguistics: EMNLP 2021

Tables provide valuable knowledge that can be used to verify textual statements. While a number of works have considered table-based fact verification, direct alignments of tabular data with tokens in textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address these problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation allows enhanced learning of fact verification from two perspectives. From one perspective, our system conducts masked salient token prediction to enhance the model for alignment and reasoning between the table and the statement. From the other perspective, our system applies salience-aware data augmentation to generate a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show the effective improvement by the proposed salience-aware learning techniques, leading to the new SOTA performance on the benchmark.

pdf bib
Probing Commonsense Explanation in Dialogue Response Generation
Pei Zhou | Pegah Jandaghi | Hyundong Cho | Bill Yuchen Lin | Jay Pujara | Xiang Ren
Findings of the Association for Computational Linguistics: EMNLP 2021

Humans use commonsense reasoning (CSR) implicitly to produce natural and coherent responses in conversations. Aiming to close the gap between current response generation (RG) models and human communication abilities, we want to understand why RG models respond as they do by probing RG model’s understanding of commonsense reasoning that elicits proper responses. We formalize the problem by framing commonsense as a latent variable in the RG task and using explanations for responses as textual form of commonsense. We collect 6k annotated explanations justifying responses from four dialogue datasets and ask humans to verify them and propose two probing settings to evaluate RG models’ CSR capabilities. Probing results show that models fail to capture the logical relations between commonsense explanations and responses and fine-tuning on in-domain data and increasing model sizes do not lead to understanding of CSR for RG. We hope our study motivates more research in making RG models emulate the human reasoning process in pursuit of smooth human-AI communication.

pdf bib
Representing Numbers in NLP: a Survey and a Vision
Avijit Thawani | Jay Pujara | Filip Ilievski | Pedro Szekely
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (exact vs approximate) and units (abstract vs grounded). We analyze the myriad representational choices made by over a dozen previously published number encoders and decoders. We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, comprised of design trade-offs and a unified evaluation.

2017

pdf bib
Sparsity and Noise: Where Knowledge Graph Embeddings Fall Short
Jay Pujara | Eriq Augustine | Lise Getoor
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Knowledge graph (KG) embedding techniques use structured relationships between entities to learn low-dimensional representations of entities and relations. One prominent goal of these approaches is to improve the quality of knowledge graphs by removing errors and adding missing facts. Surprisingly, most embedding techniques have been evaluated on benchmark datasets consisting of dense and reliable subsets of human-curated KGs, which tend to be fairly complete and have few errors. In this paper, we consider the problem of applying embedding techniques to KGs extracted from text, which are often incomplete and contain errors. We compare the sparsity and unreliability of different KGs and perform empirical experiments demonstrating how embedding approaches degrade as sparsity and unreliability increase.

2016

pdf bib
Proceedings of the 5th Workshop on Automated Knowledge Base Construction
Jay Pujara | Tim Rocktaschel | Danqi Chen | Sameer Singh
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

2015

pdf bib
RELLY: Inferring Hypernym Relationships Between Relational Phrases
Adam Grycner | Gerhard Weikum | Jay Pujara | James Foulds | Lise Getoor
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing