Shivanshu Gupta


2022

pdf bib
Successive Prompting for Decomposing Complex Questions
Dheeru Dua | Shivanshu Gupta | Sameer Singh | Matt Gardner
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce “Successive Prompting” where, we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate synthetic dataset which can be used to bootstrap model’s ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement in F1 of ~5% when compared with a state-of-the-art model with synthetic augmentations and few-shot version of the DROP dataset.

pdf bib
Unobserved Local Structures Make Compositional Generalization Hard
Ben Bogin | Shivanshu Gupta | Jonathan Berant
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

While recent work has shown that sequence-to-sequence models struggle to generalize to new compositions (termed compositional generalization), little is known on what makes compositional generalization hard on a particular test instance. In this work, we investigate the factors that make generalization to certain test instances challenging. We first substantiate that some examples are more difficult than others by showing that different models consistently fail or succeed on the same test instances. Then, we propose a criterion for the difficulty of an example: a test instance is hard if it contains a local structure that was not observed at training time. We formulate a simple decision rule based on this criterion and empirically show it predicts instance-level generalization well across 5 different semantic parsing datasets, substantially better than alternative decision rules. Last, we show local structures can be leveraged for creating difficult adversarial compositional splits and also to improve compositional generalization under limited training budgets by strategically selecting examples for the training set.

pdf bib
Structurally Diverse Sampling for Sample-Efficient Training and Comprehensive Evaluation
Shivanshu Gupta | Sameer Singh | Matt Gardner
Findings of the Association for Computational Linguistics: EMNLP 2022

A growing body of research has demonstrated the inability of NLP models to generalize compositionally and has tried to alleviate it through specialized architectures, training schemes, and data augmentation, among other approaches. In this work, we study a different approach: training on instances with diverse structures. We propose a model-agnostic algorithm for subsampling such sets of instances from a labeled instance pool with structured outputs. Evaluating on both compositional template splits and traditional IID splits of 5 semantic parsing datasets of varying complexity, we show that structurally diverse training using our algorithm leads to comparable or better generalization than prior algorithms in 9 out of 10 dataset-split type pairs. In general, we find structural diversity to consistently improve sample efficiency compared to random train sets. Moreover, we show that structurally diverse sampling yields comprehensive test sets that are a lot more challenging than IID test sets. Finally, we provide two explanations for improved generalization from diverse train sets: 1) improved coverage of output substructures, and 2) a reduction in spurious correlations between these substructures.

2021

pdf bib
COVR: A Test-Bed for Visually Grounded Compositional Generalization with Real Images
Ben Bogin | Shivanshu Gupta | Matt Gardner | Jonathan Berant
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually-grounded compositional generalization with real images. To create COVR, we use real images annotated with scene graphs, and propose an almost fully automatic procedure for generating question-answer pairs along with a set of context images. COVR focuses on questions that require complex reasoning, including higher-order operations such as quantification and aggregation. Due to the automatic generation process, COVR facilitates the creation of compositional splits, where models at test time need to generalize to new concepts and compositions in a zero- or few-shot setting. We construct compositional splits using COVR and demonstrate a myriad of cases where state-of-the-art pre-trained language-and-vision models struggle to compositionally generalize.