Chitta Baral.

Also published as: Chitta Baral


pdf bib
‘Just because you are right, doesn’t mean I am wrong’: Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks
Man Luo | Shailaja Keyur Sampat | Riley Tallman | Yankai Zeng | Manuha Vancha | Akarshan Sajja | Chitta Baral
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

GQA (CITATION) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but still are semantically meaningful and correct in the given context. In fact, this is the case with most existing visual question answering (VQA) datasets where they assume only one ground-truth answer for each question. We propose Alternative Answer Sets (AAS) of ground-truth answers to address this limitation, which is created automatically using off-the-shelf NLP tools. We introduce a semantic metric based on AAS and modify top VQA solvers to support multiple plausible answers for a question. We implement this approach on the GQA dataset and show the performance improvements.

pdf bib
Self-Supervised Test-Time Learning for Reading Comprehension
Pratyay Banerjee | Tejas Gokhale | Chitta Baral
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent work on unsupervised question answering has shown that models can be trained with procedurally generated question-answer pairs and can achieve performance competitive with supervised methods. In this work, we consider the task of unsupervised reading comprehension and present a method that performs “test-time learning” (TTL) on a given context (text passage), without requiring training on large-scale human-authored datasets containing context-question-answer triplets. This method operates directly on a single test context, uses self-supervision to train models on synthetically generated question-answer pairs, and then infers answers to unseen human-authored questions for this context. Our method achieves accuracies competitive with fully supervised methods and significantly outperforms current unsupervised methods. TTL methods with a smaller model are also competitive with the current state-of-the-art in unsupervised reading comprehension.

pdf bib
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
Shailaja Keyur Sampat | Akshay Kumar | Yezhou Yang | Chitta Baral
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video. In this paper, we take visual understanding to a higher level where systems are challenged to answer questions that involve mentally simulating the hypothetical consequences of performing specific actions in a given scenario. Towards that end, we formulate a vision-language question answering task based on the CLEVR (Johnson et. al., 2017) dataset. We then modify the best existing VQA methods and propose baseline solvers for this task. Finally, we motivate the development of better vision-language models by providing insights about the capability of diverse architectures to perform joint reasoning over image-text modality. Our dataset setup scripts and codes will be made publicly available at

pdf bib
Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction
Ming Shen | Pratyay Banerjee | Chitta Baral
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this work, we propose Masked Noun-Phrase Prediction (MNPP), a pre-training strategy to tackle pronoun resolution in a fully unsupervised setting. Firstly, We evaluate our pre-trained model on various pronoun resolution datasets without any finetuning. Our method outperforms all previous unsupervised methods on all datasets by large margins. Secondly, we proceed to a few-shot setting where we finetune our pre-trained model on WinoGrande-S and XS separately. Our method outperforms RoBERTa-large baseline with large margins, meanwhile, achieving a higher AUC score after further finetuning on the remaining three official splits of WinoGrande.

pdf bib
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee | Tejas Gokhale | Yezhou Yang | Chitta Baral
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Constructing Flow Graphs from Procedural Cybersecurity Texts
Kuntal Kumar Pal | Kazuaki Kashihara | Pratyay Banerjee | Swaroop Mishra | Ruoyu Wang | Chitta Baral
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


pdf bib
Deeply Embedded Knowledge Representation & Reasoning For Natural Language Question Answering: A Practitioner’s Perspective
Arindam Mitra | Sanjay Narayana | Chitta Baral
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Successful application of Knowledge Representation and Reasoning (KR) in Natural Language Understanding (NLU) is largely limited by the availability of a robust and general purpose natural language parser. Even though several projects have been launched in the pursuit of developing a universal meaning representation language, the existence of an accurate universal parser is far from reality. This has severely limited the application of knowledge representation and reasoning (KR) in the field of NLP and also prevented a proper evaluation of KR based NLU systems. Our goal is to build KR based systems for Natural Language Understanding without relying on a parser. Towards this we propose a method named Deeply Embedded Knowledge Representation & Reasoning (DeepEKR) where we replace the parser by a neural network, soften the symbolic representation so that a deterministic mapping exists between the parser neural network and the interpretable logical form, and finally replace the symbolic solver by an equivalent neural network, so the model can be trained end-to-end. We evaluate our method with respect to the task of Qualitative Word Problem Solving on the two available datasets (QuaRTz and QuaRel). Our system achieves same accuracy as that of the state-of-the-art accuracy on QuaRTz, outperforms the state-of-the-art on QuaRel and severely outperforms a traditional KR based system. The results show that the bias introduced by a KR solution does not prevent it from doing a better job at the end task. Moreover, our method is interpretable due to the bias introduced by the KR approach.

pdf bib
Visuo-Linguistic Question Answering (VLQA) Challenge
Shailaja Keyur Sampat | Yezhou Yang | Chitta Baral
Findings of the Association for Computational Linguistics: EMNLP 2020

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over language and vision domains separately, however joint reasoning is still a challenge for state-of-the-art computer vision and natural language processing (NLP) systems. We propose a novel task to derive joint inference about a given image-text modality and compile the Visuo-Linguistic Question Answering (VLQA) challenge corpus in a question answering setting. Each dataset item consists of an image and a reading passage, where questions are designed to combine both visual and textual information i.e., ignoring either modality would make the question unanswerable. We first explore the best existing vision-language architectures to solve VLQA subsets and show that they are unable to reason well. We then develop a modular method with slightly better baseline performance, but it is still far behind human performance. We believe that VLQA will be a good benchmark for reasoning over a visuo-linguistic context. The dataset, code and leaderboard is available at

pdf bib
Self-Supervised Knowledge Triplet Learning for Zero-Shot Question Answering
Pratyay Banerjee | Chitta Baral
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The aim of all Question Answering (QA) systems is to generalize to unseen questions. Current supervised methods are reliant on expensive data annotation. Moreover, such annotations can introduce unintended annotator bias, making systems focus more on the bias than the actual task. This work proposes Knowledge Triplet Learning (KTL), a self-supervised task over knowledge graphs. We propose heuristics to create synthetic graphs for commonsense and scientific knowledge. We propose using KTL to perform zero-shot question answering, and our experiments show considerable improvements over large pre-trained transformer language models.

pdf bib
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang | Tejas Gokhale | Pratyay Banerjee | Chitta Baral | Yezhou Yang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Captioning is a crucial and challenging task for video understanding. In videos that involve active agents such as humans, the agent’s actions can bring about myriad changes in the scene. Observable changes such as movements, manipulations, and transformations of the objects in the scene, are reflected in conventional video captioning. Unlike images, actions in videos are also inherently linked to social aspects such as intentions (why the action is taking place), effects (what changes due to the action), and attributes that describe the agent. Thus for video understanding, such as when captioning videos or when answering questions about videos, one must have an understanding of these commonsense aspects. We present the first work on generating commonsense captions directly from videos, to describe latent aspects such as intentions, effects, and attributes. We present a new dataset “Video-to-Commonsense (V2C)” that contains ~9k videos of human agents performing various actions, annotated with 3 types of commonsense descriptions. Additionally we explore the use of open-ended video-based commonsense question answering (V2C-QA) as a way to enrich our captions. Both the generation task and the QA task can be used to enrich video captions.

pdf bib
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale | Pratyay Banerjee | Chitta Baral | Yezhou Yang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.


pdf bib
Identification of Adverse Drug Reaction Mentions in Tweets – SMM4H Shared Task 2019
Samarth Rawal | Siddharth Rawal | Saadat Anwar | Chitta Baral
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

Analyzing social media posts can offer insights into a wide range of topics that are commonly discussed online, providing valuable information for studying various health-related phenomena reported online. The outcome of this work can offer insights into pharmacovigilance research to monitor the adverse effects of medications. This research specifically looks into mentions of adverse drug reactions (ADRs) in Twitter data through the Social Media Mining for Health Applications (SMM4H) Shared Task 2019. Adverse drug reactions are undesired harmful effects which can arise from medication or other methods of treatment. The goal of this research is to build accurate models using natural language processing techniques to detect reports of adverse drug reactions in Twitter data and extract these words or phrases.

pdf bib
Combining Knowledge Hunting and Neural Language Models to Solve the Winograd Schema Challenge
Ashok Prakash | Arpit Sharma | Arindam Mitra | Chitta Baral
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Winograd Schema Challenge (WSC) is a pronoun resolution task which seems to require reasoning with commonsense knowledge. The needed knowledge is not present in the given text. Automatic extraction of the needed knowledge is a bottleneck in solving the challenge. The existing state-of-the-art approach uses the knowledge embedded in their pre-trained language model. However, the language models only embed part of the knowledge, the ones related to frequently co-existing concepts. This limits the performance of such models on the WSC problems. In this work, we build-up on the language model based methods and augment them with a commonsense knowledge hunting (using automatic extraction from text) module and an explicit reasoning module. Our end-to-end system built in such a manner improves on the accuracy of two of the available language model based approaches by 5.53% and 7.7% respectively. Overall our system achieves the state-of-the-art accuracy of 71.06% on the WSC dataset, an improvement of 7.36% over the previous best.

pdf bib
Careful Selection of Knowledge to Solve Open Book Question Answering
Pratyay Banerjee | Kuntal Kumar Pal | Arindam Mitra | Chitta Baral
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Open book question answering is a type of natural language based QA (NLQA) where questions are expected to be answered with respect to a given set of open book facts, and common knowledge about a topic. Recently a challenge involving such QA, OpenBookQA, has been proposed. Unlike most other NLQA that focus on linguistic understanding, OpenBookQA requires deeper reasoning involving linguistic understanding as well as reasoning with common knowledge. In this paper we address QA with respect to the OpenBookQA dataset and combine state of the art language models with abductive information retrieval (IR), information gain based re-ranking, passage selection and weighted scoring to achieve 72.0% accuracy, an 11.6% improvement over the current state of the art.


pdf bib
Learning To Use Formulas To Solve Simple Arithmetic Problems
Arindam Mitra | Chitta Baral
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


pdf bib
Learning to Automatically Solve Logic Grid Puzzles
Arindam Mitra | Chitta Baral
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Identifying Various Kinds of Event Mentions in K-Parser Output
Arpit Sharma | Nguyen Vo | Somak Aditya | Chitta Baral
Proceedings of the The 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf bib
The NL2KR Platform for building Natural Language Translation Systems
Nguyen Vo | Arindam Mitra | Chitta Baral
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Recognizing Social Constructs from Textual Conversation
Somak Aditya | Chitta Baral | Nguyen Ha Vo | Joohyung Lee | Jieping Ye | Zaw Naung | Barry Lumpkin | Jenny Hastings | Richard Scherl | Dawn M. Sweet | Daniela Inclezan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


pdf bib
Using Inverse lambda and Generalization to Translate English to Formal Languages
Chitta Baral | Juraj Dzifcak | Marcos Alvarez Gonzalez | Jiayu Zhou
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)


pdf bib
Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text
Siddhartha Jonnalagadda | Luis Tari | Jörg Hakenberg | Chitta Baral | Graciela Gonzalez
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers


pdf bib
IntEx: A Syntactic Role Driven Protein-Protein Interaction Extractor for Bio-Medical Text
Syed Toufeeq Ahmed | Deepthi Chidambaram | Hasan Davulcu | Chitta Baral
Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics


pdf bib
Using answer set programming to answer complex queries
Chitta Baral | Michael Gelfond | Richard Scherl
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004