Parminder Bhatia


2023

ContraCLM: Contrastive Learning For Causal Language Model
Nihal Jain | Dejiao Zhang | Wasi Uddin Ahmad | Zijian Wang | Feng Nan | Xiaopeng Li | Ming Tan | Ramesh Nallapati | Baishakhi Ray | Parminder Bhatia | Xiaofei Ma | Bing Xiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite exciting progress in causal language models, the expressiveness of their representations is largely limited due to poor discrimination ability. To remedy this issue, we present CONTRACLM, a novel contrastive learning framework at both the token-level and the sequence-level. We assess CONTRACLM on a variety of downstream tasks. We show that CONTRACLM enhances the discrimination of representations and bridges the gap with encoder-only models, which makes causal language models better suited for tasks beyond language generation. Specifically, we attain 44% relative improvement on the Semantic Textual Similarity tasks and 34% on Code-to-Code Search tasks. Furthermore, by improving the expressiveness of representations, CONTRACLM also boosts the source code generation capability with 9% relative improvement on execution accuracy on the HumanEval benchmark.

Multitask Pretraining with Structured Knowledge for Text-to-SQL Generation
Robert Giaquinto | Dejiao Zhang | Benjamin Kleiner | Yang Li | Ming Tan | Parminder Bhatia | Ramesh Nallapati | Xiaofei Ma
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Many machine learning-based low-code or no-code applications involve generating code that interacts with structured knowledge. For example, one of the most studied tasks in this area is generating SQL code from a natural language statement. Prior work shows that incorporating context information from the database schema, such as table and column names, is beneficial to model performance on this task. In this work we present a large pretraining dataset and strategy for learning representations of text, tables, and SQL code that leverages the entire context of the problem. Specifically, we build on existing encoder-decoder architecture by introducing a multitask pretraining framework that complements the unique attributes of our diverse pretraining data. Our work represents the first study on large-scale pretraining of encoder-decoder models for interacting with structured knowledge, and offers a new state-of-the-art foundation model in text-to-SQL generation. We validate our approach with experiments on two SQL tasks, showing improvement over existing methods, including 1.7 and 2.2 percentage point improvements over the prior state of the art on Spider and CoSQL, respectively.

ReCode: Robustness Evaluation of Code Generation Models
Shiqi Wang | Zheng Li | Haifeng Qian | Chenghao Yang | Zijian Wang | Mingyue Shang | Varun Kumar | Samson Tan | Baishakhi Ray | Parminder Bhatia | Ramesh Nallapati | Murali Krishna Ramanathan | Dan Roth | Bing Xiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation. In this paper, we propose ReCode, a comprehensive robustness evaluation benchmark for code generation models. We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format. They are carefully designed to be natural in real-life coding practice, preserve the original semantic meaning, and thus provide multifaceted assessments of a model’s robustness performance. With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt. In addition, we define robustness metrics for code generation models considering the worst-case behavior under each type of perturbation, taking advantage of the fact that executing the generated code can serve as objective evaluation. We demonstrate ReCode on SOTA models using HumanEval, MBPP, as well as function completion tasks derived from them. Interesting observations include: better robustness for CodeGen over InCoder and GPT-J; models are most sensitive to syntax perturbations; more challenging robustness evaluation on MBPP over HumanEval.

Exploring Continual Learning for Code Generation Models
Prateek Yadav | Qing Sun | Hantian Ding | Xiaopeng Li | Dejiao Zhang | Ming Tan | Parminder Bhatia | Xiaofei Ma | Ramesh Nallapati | Murali Krishna Ramanathan | Mohit Bansal | Bing Xiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Large-scale code generation models such as Copilot and CodeT5 have achieved impressive performance. However, libraries are upgraded or deprecated very frequently and re-training large-scale language models is computationally expensive. Therefore, Continual Learning (CL) is an important aspect that remains under-explored in the code domain. In this paper, we introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement, with different input and output programming languages. Next, on our CodeTask-CL benchmark, we compare popular CL techniques from NLP and Vision domains. We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism caused by stark distribution shifts in coding tasks. We address this issue with our proposed method, Prompt Pooling with Teacher Forcing (PP-TF), that stabilizes training by enforcing constraints on the prompt selection mechanism and leads to a 21.54% improvement over Prompt Pooling. Along with the benchmark, we establish a training pipeline that can be used for CL on code models, which we believe can motivate further development of CL methods for code models.

A Static Evaluation of Code Completion by Large Language Models
Hantian Ding | Varun Kumar | Yuchen Tian | Zijian Wang | Rob Kwiatkowski | Xiaopeng Li | Murali Krishna Ramanathan | Baishakhi Ray | Parminder Bhatia | Sudipta Sengupta
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Large language models trained on code have shown great potential to increase productivity of software developers. Several execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems. Nevertheless, it is expensive to perform the same evaluation on complex real-world projects considering the execution cost. On the other hand, static analysis tools such as linters, which can detect errors without running the program, haven’t been well explored for evaluating code generation models. In this work, we propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees. Compared with execution-based evaluation, our method is not only more efficient, but also applicable to code in the wild. For experiments, we collect code context from open source repos to generate one million function bodies using public models. Our static analysis reveals that Undefined Name and Unused Variable are the most common errors among others made by language models. Through extensive studies, we also show the impact of sampling temperature, model size, and context on static errors in code completions.

2022

DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Zheng Li | Zijian Wang | Ming Tan | Ramesh Nallapati | Parminder Bhatia | Andrew Arnold | Bing Xiang | Dan Roth
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision teacher model to the quantized and distilled low-precision student model. Empirical analyses show that, despite the challenging nature of generative tasks, we were able to achieve a 16.5x model footprint compression ratio with little performance drop relative to the full-precision counterparts on multiple summarization and QA datasets. We further pushed the limit of compression ratio to 27.7x and presented the performance-efficiency trade-off for generative tasks using pre-trained models. To the best of our knowledge, this is the first work aiming to effectively distill and quantize sequence-to-sequence pre-trained models for language generation tasks.

Debiasing Neural Retrieval via In-batch Balancing Regularization
Yuantong Li | Xiaokai Wei | Zijian Wang | Shen Wang | Parminder Bhatia | Xiaofei Ma | Andrew Arnold
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

People frequently interact with information retrieval (IR) systems; however, IR models exhibit biases and discrimination towards various demographics. In-processing fair ranking methods provide a trade-off between accuracy and fairness by adding a fairness-related regularization term to the loss function. However, there have not been intuitive objective functions that depend on the click probability and user engagement to directly optimize towards this. In this work, we propose the In-Batch Balancing Regularization (IBBR) to mitigate the ranking disparity among subgroups. In particular, we develop a differentiable normed Pairwise Ranking Fairness (nPRF) and leverage the T-statistics on top of nPRF over subgroups as a regularization to improve fairness. Empirical results with the BERT-based neural rankers on the MS MARCO Passage Retrieval dataset with the human-annotated non-gendered queries benchmark (CITATION) show that our IBBR method with nPRF achieves significantly less bias with minimal degradation in ranking performance compared with the baseline.

2021

Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations
Chaitanya Shivade | Rashmi Gangadharaiah | Spandana Gella | Sandeep Konam | Shaoqing Yuan | Yi Zhang | Parminder Bhatia | Byron Wallace
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations

Zero-shot Medical Entity Retrieval without Annotation: Learning From Rich Knowledge Graph Semantics
Luyang Kong | Christopher Winestock | Parminder Bhatia
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Neural Entity Recognition with Gazetteer based Fusion
Qing Sun | Parminder Bhatia
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training
Kristjan Arumae | Qing Sun | Parminder Bhatia
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Pre-training large language models has become a standard in the natural language processing community. Such models are pre-trained on generic data (e.g. BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, in order to achieve state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction, additional in-domain pre-training is required. In practice, staged multi-domain pre-training presents performance deterioration in the form of catastrophic forgetting (CF) when evaluated on a generic benchmark such as GLUE. In this paper we conduct an empirical investigation into known methods to mitigate CF. We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks while remaining competitive in bio-medical tasks. Furthermore, we explore gradient and latent clustering based data selection techniques to improve coverage when using elastic weight consolidation and experience replay methods.

Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events
Miguel Ballesteros | Rishita Anubhai | Shuai Wang | Nima Pourdamghani | Yogarshi Vyas | Jie Ma | Parminder Bhatia | Kathleen McKeown | Yaser Al-Onaizan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we propose a neural architecture and a set of training methods for ordering events by predicting temporal relations. Our proposed models receive a pair of events within a span of text as input and they identify temporal relations (Before, After, Equal, Vague) between them. Given that a key challenge with this task is the scarcity of annotated data, our models rely on pretrained representations (i.e. RoBERTa, BERT or ELMo), transfer and multi-task learning (by leveraging complementary datasets), and self-training techniques. Experiments on the MATRES dataset of English documents establish a new state-of-the-art on this task.

Proceedings of the First Workshop on Natural Language Processing for Medical Conversations
Parminder Bhatia | Steven Lin | Rashmi Gangadharaiah | Byron Wallace | Izhak Shafran | Chaitanya Shivade | Nan Du | Mona Diab
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

2019

Joint Entity Extraction and Assertion Detection for Clinical Text
Parminder Bhatia | Busra Celikkaya | Mohammed Khalilia
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Negative medical findings are prevalent in clinical reports, yet discriminating them from positive findings remains a challenging task for information extraction. Most of the existing systems treat this task as a pipeline of two separate tasks, i.e., named entity recognition (NER) and rule-based negation detection. We consider this as a multi-task problem and present a novel end-to-end neural model to jointly extract entities and negations. We extend a standard hierarchical encoder-decoder NER model and first adopt a shared encoder followed by separate decoders for the two tasks. This architecture performs considerably better than the previous rule-based and machine learning-based systems. To overcome the problem of increased parameter size especially for low-resource settings, we propose the Conditional Softmax Shared Decoder architecture which achieves state-of-the-art results for NER and negation detection on the 2010 i2b2/VA challenge dataset and a proprietary de-identified clinical dataset.

Towards Annotating and Creating Summary Highlights at Sub-sentence Level
Kristjan Arumae | Parminder Bhatia | Fei Liu
Proceedings of the 2nd Workshop on New Frontiers in Summarization

Highlighting is a powerful tool to pick out and emphasize important content. Creating summary highlights at the sub-sentence level is particularly desirable, because sub-sentences are more concise than whole sentences. They are also better suited than individual words and phrases, which can potentially lead to disfluent, fragmented summaries. In this paper we seek to generate summary highlights by annotating summary-worthy sub-sentences and teaching classifiers to do the same. We frame the task as jointly selecting important sentences and identifying a single most informative textual unit from each sentence. This formulation dramatically reduces the task complexity involved in sentence compression. Our study provides new benchmarks and baselines for generating highlights at the sub-sentence level.

Relation Extraction using Explicit Context Conditioning
Gaurav Singh | Parminder Bhatia
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Relation extraction (RE) aims to label relations between groups of marked entities in raw text. Most current RE models learn context-aware representations of the target entities that are then used to establish the relation between them. This works well for intra-sentence RE; we call such relations first-order relations. However, this methodology can sometimes fail to capture complex and long dependencies. To address this, we hypothesize that at times the target entities can be connected via a context token. We refer to such indirect relations as second-order relations, and describe an efficient implementation for computing them. These second-order relation scores are then combined with first-order relation scores to obtain final relation scores. Our empirical results show that the proposed method leads to state-of-the-art performance over two biomedical datasets.

2016

Morphological Priors for Probabilistic Neural Word Embeddings
Parminder Bhatia | Robert Guthrie | Jacob Eisenstein
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Better Document-level Sentiment Analysis from RST Discourse Parsing
Parminder Bhatia | Yangfeng Ji | Jacob Eisenstein
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing