Subrata Mitra

2025

Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following unsupervised pretraining. The datasets needed for SFT and RLHF can be difficult to collect, limited in scope, and variable in sample quality. Additionally, datasets can vary extensively in supervision format, from numerical to binary as well as multi-dimensional with many different values. We present a framework for fine-tuning LLMs using heterogeneous feedback, which has two main components. First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF. Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases potentially exceeding those from the full dataset. We conduct extensive experiments to understand the effectiveness of these techniques for incorporating heterogeneous feedback, and demonstrate improvements from using a high-quality and diverse subset of the data. We find that our framework is able to improve models in multiple areas simultaneously, such as in instruction following and bias reduction.

2024

In this paper, we study personalized federated learning for text classification with Pretrained Language Models (PLMs). We identify two challenges in efficiently leveraging PLMs for personalized federated learning: 1) Communication. PLMs are usually large, e.g., with hundreds of millions of parameters, incurring a large communication cost in a federated setting. 2) Local Training. Training with PLMs generally requires back-propagation, during which memory consumption can be several times that of forward-propagation. This may not be affordable when the PLMs are trained locally on resource-constrained clients, e.g., mobile devices with limited memory. Additionally, proprietary PLMs can be provided as concealed APIs, for which back-propagation may not be available. To address these challenges, we propose a training framework that includes discrete local search for gradient-free local training, along with a compression mechanism inspired by the linear word analogy that allows communicating with discretely indexed tokens, thus significantly reducing the communication cost. Experiments show that our gradient-free framework achieves superior performance compared with baselines.

2023

Federated learning involves collaborative training with private data from multiple platforms without violating data privacy. We study the problem of federated domain adaptation for Named Entity Recognition (NER), where we seek to transfer knowledge across different platforms with data from multiple domains. In addition, we consider a practical and challenging scenario in which the NER datasets on different platforms are annotated with heterogeneous tag sets, i.e., different sets of entity types. The goal is to train a global model with federated learning such that it can predict with a complete tag set, i.e., with all the entity types occurring in the data across all platforms. To cope with the heterogeneous tag sets in a multi-domain setting, we propose a distillation approach along with an instance-weighting mechanism to facilitate knowledge transfer across platforms. We also release two re-annotated clinical NER datasets for testing the proposed method in the clinical domain. Our method shows superior empirical performance for NER with federated learning.

2022

Previous work on class-incremental learning for Named Entity Recognition (NER) relies on the assumption that there is an abundance of labeled data for training new classes. In this work, we study a more challenging but practical problem, i.e., few-shot class-incremental learning for NER, where an NER model is trained with only a few labeled samples of the new classes, without forgetting knowledge of the old ones. To alleviate the problem of catastrophic forgetting in few-shot class-incremental learning, we reconstruct synthetic training data of the old classes using the trained NER model, augmenting the training of new classes. We further develop a framework that distills from the existing model with both synthetic data and real data from the current training set. Experimental results show that our approach achieves significant improvements over existing baselines.

1989

Parsing Generalized Phrase Structure Grammar with Dynamic Expansion
Navin Budhiraja | Subrata Mitra | Harish Karnick | Rajeev Sangal
Proceedings of the First International Workshop on Parsing Technologies

A parser is described here based on the Cocke-Younger-Kasami algorithm which uses immediate dominance and linear precedence rules together with various feature inheritance conventions. The meta-rules in the grammar are not applied beforehand but only when needed. This ensures that the rule set is kept to a minimum. At the same time, determining which rule to expand by applying which meta-rule is done efficiently using the meta-rule reference table. Since this table is generated during the “compilation” stage, its generation does not add to parsing time.
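
For readers unfamiliar with the Cocke-Younger-Kasami algorithm mentioned in the abstract above, the following is a minimal sketch of the textbook recognizer for a grammar in Chomsky normal form. It is not the parser described in the paper: it has no immediate dominance or linear precedence rules, no feature inheritance, and no meta-rule expansion. The function name `cyk_recognize`, the dictionaries `LEXICAL` and `BINARY`, and the toy grammar are all assumptions made for illustration.

```python
from itertools import product


def cyk_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: maps a word to the set of nonterminals A with a rule A -> word.
    binary:  maps a pair (B, C) to the set of nonterminals A with a rule A -> B C.
    Both tables are hypothetical inputs for this sketch (plain CNF grammar).
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(lexical.get(word, ()))
    # Fill in longer spans from shorter ones.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: [i..k] and [k+1..j]
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Toy grammar (illustrative only): S -> NP VP, VP -> V NP.
LEXICAL = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}

print(cyk_recognize(["she", "eats", "fish"], LEXICAL, BINARY))
print(cyk_recognize(["eats", "she"], LEXICAL, BINARY))
```

In this sketch the rule tables are fixed before parsing begins. The abstract describes a different design, in which meta-rules are applied only when needed and the lookup is driven by a reference table built at compilation time.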