Ramakanth Pasunuru


2021

Continual Few-Shot Learning for Text Classification
Ramakanth Pasunuru | Veselin Stoyanov | Mohit Bansal
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Natural Language Processing (NLP) is increasingly relying on general end-to-end systems that need to handle many different linguistic phenomena and nuances. For example, a Natural Language Inference (NLI) system has to recognize sentiment, handle numbers, perform coreference, etc. Our solutions to complex problems are still far from perfect, so it is important to create systems that can learn to correct mistakes quickly, incrementally, and with little training data. In this work, we propose a continual few-shot learning (CFL) task, in which a system is challenged with a difficult phenomenon and asked to learn to correct mistakes with only a few (10 to 15) training examples. To this end, we first create benchmarks based on previously annotated data: two NLI (ANLI and SNLI) and one sentiment analysis (IMDB) datasets. Next, we present various baselines from diverse paradigms (e.g., memory-aware synapses and Prototypical networks) and compare them on few-shot learning and continual few-shot learning setups. Our contributions are in creating a benchmark suite and evaluation protocol for continual few-shot learning on text classification tasks, and making several interesting observations on the behavior of similarity-based methods. We hope that our work serves as a useful starting point for future work on this important topic.
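
The abstract above names Prototypical networks as one of the similarity-based baselines. The following is a minimal sketch of the nearest-prototype idea only, assuming sentence embeddings are already available as plain lists of floats. The function names, the toy two-dimensional embeddings, and the use of squared Euclidean distance are illustrative assumptions, not the paper's implementation.

```python
def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(support):
    """Map each label to the mean of its few support embeddings."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy few-shot support set (hypothetical 2-d embeddings, two examples per class).
support = {
    "entailment": [[1.0, 0.0], [0.8, 0.2]],
    "contradiction": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
predicted = classify([0.7, 0.3], prototypes)
```

In a few-shot setting, adding a handful of new examples only changes the affected class means, which is one reason such similarity-based methods are a natural baseline for this kind of task.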

iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration
Eran Hirsch | Alon Eirew | Ori Shapira | Avi Caciularu | Arie Cattan | Ori Ernst | Ramakanth Pasunuru | Hadar Ronen | Mohit Bansal | Ido Dagan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We introduce iFᴀᴄᴇᴛSᴜᴍ, a web application for exploring topical document collections. iFᴀᴄᴇᴛSᴜᴍ integrates interactive summarization together with faceted search, by providing a novel faceted navigation scheme that yields abstractive summaries for the user’s selections. This approach offers both a comprehensive overview and particular details regarding subtopics of choice. The facets are automatically produced based on cross-document coreference pipelines, rendering generic concepts, entities and statements surfacing in the source texts. We analyze the effectiveness of our application through small-scale user studies that suggest the usefulness of our tool.

Extending Multi-Document Summarization Evaluation to the Interactive Setting
Ori Shapira | Ramakanth Pasunuru | Hadar Ronen | Mohit Bansal | Yael Amsterdamer | Ido Dagan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results. Different ideas for interactive summarization have been proposed in previous work but these solutions are highly divergent and incomparable. In this paper, we develop an end-to-end evaluation framework for interactive summarization, focusing on expansion-based interaction, which considers the accumulating information along a user session. Our framework includes a procedure of collecting real user sessions, as well as evaluation measures relying on summarization standards, but adapted to reflect interaction. All of our solutions and resources are available publicly as a benchmark, allowing comparison of future developments in interactive summarization, and spurring progress in its methodological evaluation. We demonstrate the use of our framework by evaluating and comparing baseline implementations that we developed for this purpose, which will serve as part of our benchmark. Our extensive experimentation and analysis motivate the proposed evaluation framework design and support its viability.

Efficiently Summarizing Text and Graph Encodings of Multi-Document Clusters
Ramakanth Pasunuru | Mengwen Liu | Mohit Bansal | Sujith Ravi | Markus Dreyer
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

This paper presents an efficient graph-enhanced approach to multi-document summarization (MDS) with an encoder-decoder Transformer model. This model is based on recent advances in pre-training both encoder and decoder on very large text data (Lewis et al., 2019), and it incorporates an efficient encoding mechanism (Beltagy et al., 2020) that avoids the quadratic memory growth typical for traditional Transformers. We show that this powerful combination not only scales to large input documents commonly found when summarizing news clusters; it also enables us to process additional input in the form of auxiliary graph representations, which we derive from the multi-document clusters. We present a mechanism to incorporate such graph information into the encoder-decoder model that was pre-trained on text only. Our approach leads to significant improvements on the Multi-News dataset, overall leading to an average 1.8 ROUGE score improvement over previous work (Li et al., 2020). We also show improvements in a transfer-only setup on the DUC-2004 dataset. The graph encodings lead to summaries that are more abstractive. Human evaluation shows that they are also more informative and factually more consistent with their input documents.

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline
Ori Ernst | Ori Shapira | Ramakanth Pasunuru | Michael Lepioshkin | Jacob Goldberger | Mohit Bansal | Ido Dagan
Proceedings of the 25th Conference on Computational Natural Language Learning

Aligning sentences in a reference summary with their counterparts in source documents has been shown to be a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Utilizing these data, we present a supervised proposition alignment baseline model, showing improved alignment quality over the unsupervised approach.

The Effect of Pretraining on Extractive Summarization for Scientific Documents
Yash Gupta | Pawan Sasanka Ammanamanchi | Shikha Bordia | Arjun Manoharan | Deepak Mittal | Ramakanth Pasunuru | Manish Shrivastava | Maneesh Singh | Mohit Bansal | Preethi Jyothi
Proceedings of the Second Workshop on Scholarly Document Processing

Large pretrained models have seen enormous success in extractive summarization tasks. In this work, we investigate the influence of pretraining on a BERT-based extractive summarization system for scientific documents. We derive significant performance improvements using an intermediate pretraining step that leverages existing summarization datasets and report state-of-the-art results on a recently released scientific summarization dataset, SciTLDR. We systematically analyze the intermediate pretraining step by varying the size and domain of the pretraining corpus, changing the length of the input sequence in the target task and varying target tasks. We also investigate how intermediate pretraining interacts with contextualized word embeddings trained on different domains.

An Overview of Uncertainty Calibration for Text Classification and the Role of Distillation
Han Guo | Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

Recent advances in NLP systems, notably the pretraining-and-finetuning paradigm, have achieved great success in predictive accuracy. However, these systems are usually not well calibrated for uncertainty out-of-the-box. Many recalibration methods have been proposed in the literature for quantifying predictive uncertainty and calibrating model outputs, with varying degrees of complexity. In this work, we present a systematic study of a few of these methods. Focusing on the text classification task and finetuned large pretrained language models, we first show that many of the finetuned models are not well calibrated out-of-the-box, especially when the data come from out-of-domain settings. Next, we compare the effectiveness of a few widely-used recalibration methods (such as ensembles, temperature scaling). Then, we empirically illustrate a connection between distillation and calibration. We view distillation as a regularization term encouraging the student model to output uncertainties that match those of a teacher model. With this insight, we develop simple recalibration methods based on distillation with no additional inference-time cost. We show on the GLUE benchmark that our simple methods can achieve competitive out-of-domain (OOD) calibration performance w.r.t. more expensive approaches. Finally, we include ablations to understand the usefulness of components of our proposed method and examine the transferability of calibration via distillation.
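
Temperature scaling, one of the recalibration methods named above, can be sketched in a few lines: divide the logits by a scalar T before the softmax and pick T to minimise negative log-likelihood on held-out data. The grid search, the candidate range, and the toy logits below are assumptions made for illustration; the paper's distillation-based methods are not shown here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the distribution)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the gold labels at a given temperature."""
    losses = [
        -math.log(softmax(row, temperature)[gold])
        for row, gold in zip(logit_rows, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the temperature with the lowest validation NLL from a grid (an assumed search strategy)."""
    if candidates is None:
        candidates = [0.5 + 0.1 * i for i in range(46)]  # 0.5, 0.6, ..., 5.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Toy validation set: the model is very confident but only right 3 times out of 4.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 0, 1, 1]
best_t = fit_temperature(val_logits, val_labels)
calibrated = softmax([4.0, 0.0], best_t)
```

Because dividing by a positive scalar never changes the argmax, this kind of recalibration leaves accuracy untouched and only adjusts the reported confidence.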

2020

DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru | Han Guo | Mohit Bansal
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Policy gradient-based reinforcement learning has proven to be a promising approach for directly optimizing non-differentiable evaluation metrics for language generation tasks. However, optimizing for a specific metric reward leads to improvements in mostly that metric only, suggesting that the model is gaming the formulation of that metric in a particular way, often without achieving real qualitative improvements. Hence, it is more beneficial to make the model optimize multiple diverse metric rewards jointly. While appealing, this is challenging because one needs to manually decide the importance and scaling weights of these metric rewards. Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time. Considering the above aspects, in our work, we automate the optimization of multiple metric rewards simultaneously via a multi-armed bandit approach (DORB), where at each round, the bandit chooses which metric reward to optimize next, based on expected arm gains. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks: question generation and data-to-text generation. Finally, we present interpretable analyses of the learned bandit curriculum over the optimized rewards.
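
As a rough illustration of the bandit component, here is a minimal sketch of the standard Exp3 algorithm, where each arm stands for one metric reward to optimize next. The class name, the exploration rate, and the assumption that the observed gain is scaled to [0, 1] are illustrative choices; how DORB defines the bandit reward (SM-Bandit versus HM-Bandit) is not reproduced here.

```python
import math
import random


class Exp3:
    """Textbook Exp3 for adversarial multi-armed bandits with rewards in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate (assumed value)
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the weight-proportional distribution with uniform exploration."""
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def choose(self):
        """Sample the next arm (here: which metric reward to optimize)."""
        return self.rng.choices(range(self.n_arms), weights=self.probabilities(), k=1)[0]

    def update(self, arm, reward):
        """Importance-weight the observed reward and update only the pulled arm."""
        estimated = reward / self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * estimated / self.n_arms)
        # Rescale so the largest weight is 1; probabilities depend only on ratios.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]


# Hypothetical usage: three metric rewards, with a made-up gain signal per round.
bandit = Exp3(n_arms=3, gamma=0.2)
for _ in range(100):
    arm = bandit.choose()
    gain = 1.0 if arm == 1 else 0.2  # placeholder for a validation-based gain
    bandit.update(arm, gain)
```

In a training loop, `choose` would be called before each round to pick the reward for the next policy-gradient updates, and `update` afterwards with whatever gain signal the setup defines.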

FENAS: Flexible and Expressive Neural Architecture Search
Ramakanth Pasunuru | Mohit Bansal
Findings of the Association for Computational Linguistics: EMNLP 2020

Architecture search is the automatic process of designing the model or cell structure that is optimal for the given dataset or task. Recently, this approach has shown good improvements in terms of performance (tested on language modeling and image classification) with reasonable training speed using a weight sharing-based approach called Efficient Neural Architecture Search (ENAS). In this work, we propose a novel architecture search algorithm called Flexible and Expressible Neural Architecture Search (FENAS), with a more flexible and expressible search space than ENAS, in terms of more activation functions, input edges, and atomic operations. Also, our FENAS approach is able to reproduce the well-known LSTM and GRU architectures (unlike ENAS), and is also able to initialize with them for finding architectures more efficiently. We explore this extended search space via evolutionary search and show that FENAS performs significantly better on several popular text classification tasks and performs similarly to ENAS on a standard language modeling benchmark. Further, we present ablations and analyses on our FENAS approach.

2019

Continual and Multi-Task Architecture Search
Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Architecture search is the process of automatically learning the neural model or cell structure that best suits the given task. Recently, this approach has shown promising performance improvements (on language modeling and image classification) with reasonable training speed, using a weight sharing strategy called Efficient Neural Architecture Search (ENAS). In our work, we first introduce a novel continual architecture search (CAS) approach, so as to continually evolve the model parameters during the sequential training of several tasks, without losing performance on previously learned tasks (via block-sparsity and orthogonality constraints), thus enabling life-long learning. Next, we explore a multi-task architecture search (MAS) approach over ENAS for finding a unified, single cell structure that performs well across multiple tasks (via joint controller rewards), and hence allows more generalizable transfer of the cell structure knowledge to an unseen new task. We empirically show the effectiveness of our sequential continual learning and parallel multi-task learning based architecture search approaches on diverse sentence-pair classification tasks (GLUE) and multimodal-generation based video captioning tasks. Further, we present several ablations and analyses on the learned cell structures.

Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation
Ori Shapira | David Gabay | Yang Gao | Hadar Ronen | Ramakanth Pasunuru | Mohit Bansal | Yael Amsterdamer | Ido Dagan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning
Han Guo | Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Multi-task learning (MTL) has achieved success over a wide range of problems, where the goal is to improve the performance of a primary task using a set of relevant auxiliary tasks. However, when the usefulness of the auxiliary tasks w.r.t. the primary task is not known a priori, the success of MTL models depends on the correct choice of these auxiliary tasks and also a balanced mixing ratio of these tasks during alternate training. These two problems could be resolved via manual intuition or hyper-parameter tuning over all combinatorial task choices, but this introduces inductive bias or is not scalable when the number of candidate auxiliary tasks is very large. To address these issues, we present AutoSeM, a two-stage MTL pipeline, where the first stage automatically selects the most useful auxiliary tasks via a Beta-Bernoulli multi-armed bandit with Thompson Sampling, and the second stage learns the training mixing ratio of these selected auxiliary tasks via a Gaussian Process based Bayesian optimization framework. We conduct several MTL experiments on the GLUE language understanding tasks, and show that our AutoSeM framework can successfully find relevant auxiliary tasks and automatically learn their mixing ratio, achieving significant performance boosts on several primary tasks. Finally, we present ablations for each stage of AutoSeM and analyze the learned auxiliary task choices.
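
The first stage described above relies on a Beta-Bernoulli bandit with Thompson Sampling. A minimal sketch of that general technique follows, with one arm per candidate auxiliary task. The class name, the uniform Beta(1, 1) prior, and the idea of treating "did the primary task improve?" as the binary outcome are assumptions for illustration, not details confirmed by the abstract.

```python
import random


class BetaBernoulliBandit:
    """Thompson Sampling with an independent Beta posterior per arm."""

    def __init__(self, n_arms, seed=0):
        # Beta(1, 1) is a uniform prior over each arm's success probability.
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms
        self.rng = random.Random(seed)

    def choose(self):
        """Draw one sample from each posterior and pull the arm with the largest draw."""
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, success):
        """Conjugate update: a success increments alpha, a failure increments beta."""
        if success:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

    def posterior_means(self):
        """Expected usefulness of each arm under the current posterior."""
        return [a / (a + b) for a, b in zip(self.alpha, self.beta)]


# Hypothetical usage: three candidate auxiliary tasks with made-up success rates.
true_rates = [0.2, 0.8, 0.5]
bandit = BetaBernoulliBandit(n_arms=3, seed=0)
env = random.Random(1)
for _ in range(200):
    task = bandit.choose()
    improved = env.random() < true_rates[task]  # placeholder for a validation check
    bandit.update(task, improved)
ranking = sorted(range(3), key=lambda i: bandit.posterior_means()[i], reverse=True)
```

After the loop, the posterior means give a ranking from which the most useful arms could be kept for a later stage such as tuning the mixing ratio.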

2018

Multi-Reward Reinforced Summarization with Saliency and Entailment
Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Abstractive text summarization is the task of compressing and rewriting a long document into a short summary while maintaining saliency, directed logical entailment, and non-redundancy. In this work, we address these three important aspects of a good summary via a reinforcement learning approach with two novel reward functions: ROUGESal and Entail, on top of a coverage-based baseline. The ROUGESal reward modifies the ROUGE metric by up-weighting the salient phrases/words detected via a keyphrase classifier. The Entail reward gives high (length-normalized) scores to logically-entailed summaries using an entailment classifier. Further, we show superior performance improvement when these rewards are combined with traditional metric (ROUGE) based rewards, via our novel and effective multi-reward approach of optimizing multiple rewards simultaneously in alternate mini-batches. Our method achieves new state-of-the-art results on the CNN/Daily Mail dataset as well as strong improvements in a test-only transfer setup on DUC-2002.

Dynamic Multi-Level Multi-Task Learning for Sentence Simplification
Han Guo | Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 27th International Conference on Computational Linguistics

Sentence simplification aims to improve readability and understandability, based on several operations such as splitting, deletion, and paraphrasing. However, a valid simplified sentence should also be logically entailed by its input sentence. In this work, we first present a strong pointer-copy mechanism based sequence-to-sequence sentence simplification model, and then improve its entailment and paraphrasing capabilities via multi-task learning with related auxiliary tasks of entailment and paraphrase generation. Moreover, we propose a novel ‘multi-level’ layered soft sharing approach where each auxiliary task shares different (higher versus lower) level layers of the sentence simplification model, depending on the task’s semantic versus lexico-syntactic nature. We also introduce a novel multi-armed bandit based training approach that dynamically learns how to effectively switch across tasks during multi-task learning. Experiments on multiple popular datasets demonstrate that our model outperforms competitive simplification systems in SARI and FKGL automatic metrics, and human evaluation. Further, we present several ablation analyses on alternative layer sharing methods, soft versus hard sharing, dynamic multi-armed bandit sampling approaches, and our model’s learned entailment and paraphrasing skills.

Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation
Han Guo | Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

An accurate abstractive summary of a document should contain all its salient information and should be logically entailed by the input document. We improve these important aspects of abstractive summarization via multi-task learning with the auxiliary tasks of question generation and entailment generation, where the former teaches the summarization model how to look for salient questioning-worthy details, and the latter teaches the model how to rewrite a summary which is a directed-logical subset of the input document. We also propose novel multi-task architectures with high-level (semantic) layer-specific sharing across multiple encoder and decoder layers of the three tasks, as well as soft-sharing mechanisms (and show performance ablations and analysis examples of each contribution). Overall, we achieve statistically significant improvements over the state-of-the-art on both the CNN/DailyMail and Gigaword datasets, as well as on the DUC-2002 transfer setup. We also present several quantitative and qualitative analysis studies of our model’s learned saliency and entailment skills.

Game-Based Video-Context Dialogue
Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Current dialogue systems focus more on textual and speech context knowledge and are usually based on two speakers. Some recent work has investigated static image-based dialogue. However, several real-world human interactions also involve dynamic visual context (similar to videos) as well as dialogue exchanges among multiple speakers. To move closer towards such multimodal conversational skills and visually-situated applications, we introduce a new video-context, many-speaker dialogue dataset based on live-broadcast soccer game videos and chats from Twitch.tv. This challenging testbed allows us to develop visually-grounded dialogue models that should generate relevant temporal and spatial event language from the live video, while also being relevant to the chat history. For strong baselines, we also present several discriminative and generative models, e.g., based on tridirectional attention flow (TriDAF). We evaluate these models via retrieval ranking-recall, automatic phrase-matching metrics, as well as human evaluation studies. We also present dataset analyses, model ablations, and visualizations to understand the contribution of different modalities and model components.

2017

Reinforced Video Captioning with Entailment Rewards
Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic metrics and human evaluation on multiple datasets. Next, we propose a novel entailment-enhanced reward (CIDEnt) that corrects phrase-matching based metrics (such as CIDEr) to only allow for logically-implied partial matches and avoid contradictions, achieving further significant improvements over the CIDEr-reward model. Overall, our CIDEnt-reward model achieves the new state-of-the-art on the MSR-VTT dataset.
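
One simple way to picture an entailment-corrected reward and a mixed loss is sketched below. This is an illustration under stated assumptions, not the paper's definition: the thresholding rule, the `threshold`, `penalty`, and `mix` values, and both function names are hypothetical, and the metric score and entailment probability are assumed to come from external scorers.

```python
def entailment_corrected_reward(metric_score, entail_prob, threshold=0.33, penalty=0.45):
    """Penalise a phrase-matching score when the caption is unlikely to be entailed.

    metric_score: a phrase-matching score such as CIDEr (computed elsewhere).
    entail_prob:  probability from an entailment classifier that the reference
                  entails the generated caption (computed elsewhere).
    threshold and penalty are illustrative values, not taken from the source.
    """
    if entail_prob < threshold:
        return metric_score - penalty
    return metric_score


def mixed_loss(rl_loss, xe_loss, mix=0.9):
    """Convex combination of a policy-gradient loss and a cross-entropy loss.

    mix is a hypothetical weighting; mix=1.0 is pure reinforcement learning.
    """
    return mix * rl_loss + (1.0 - mix) * xe_loss


# Hypothetical usage with made-up scores.
reward_contradicted = entailment_corrected_reward(1.2, entail_prob=0.1)
reward_entailed = entailment_corrected_reward(1.2, entail_prob=0.9)
loss = mixed_loss(rl_loss=2.0, xe_loss=1.0, mix=0.5)
```

The point of the correction is that a caption sharing many phrases with the reference but contradicting it no longer receives the full matching score, while the cross-entropy term in the mixed loss helps keep the generated text fluent.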

Multi-Task Video Captioning with Video and Entailment Generation
Ramakanth Pasunuru | Mohit Bansal
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generation tasks: a temporally-directed unsupervised video prediction task to learn richer context-aware video encoder representations, and a logically-directed language entailment generation task to learn better video-entailing caption decoder representations. For this, we present a many-to-many multi-task learning model that shares parameters across the encoders and decoders of the three tasks. We achieve significant improvements and the new state-of-the-art on several standard video captioning datasets using diverse automatic and human evaluations. We also show mutual multi-task improvements on the entailment generation task.

Towards Improving Abstractive Summarization via Entailment Generation
Ramakanth Pasunuru | Han Guo | Mohit Bansal
Proceedings of the Workshop on New Frontiers in Summarization

Abstractive summarization, the task of rewriting and compressing a document into a short summary, has achieved considerable success with neural sequence-to-sequence models. However, these models can still benefit from stronger natural language inference skills, since a correct summary is logically entailed by the input document, i.e., it should not contain any contradictory or unrelated information. We incorporate such knowledge into an abstractive summarization model via multi-task learning, where we share its decoder parameters with those of an entailment generation model. We achieve promising initial improvements based on multiple metrics and datasets (including a test-only setting). The domain mismatch between the entailment (captions) and summarization (news) datasets suggests that the model is learning some domain-agnostic inference skills.