Bodhisattwa Prasad Majumder - ACL Anthology

Bodhisattwa Prasad Majumder

Also published as: Bodhisattwa Prasad Majumder

2025

CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation
Peter Jansen | Oyvind Tafjord | Marissa Radensky | Pao Siangliulue | Tom Hope | Bhavana Dalvi Mishra | Bodhisattwa Prasad Majumder | Daniel S Weld | Peter Clark
Findings of the Association for Computational Linguistics: ACL 2025

Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluated using conference-style paper review with limited evaluation of code. In this work we introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly over combinations of research articles and codeblocks defining common actions in a domain (like prompting a language model). We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments, with the system returning 19 discoveries, 6 of which were judged as being both at least minimally sound and incrementally novel after a multi-faceted evaluation beyond that typically conducted in prior work, including external (conference-style) review, code review, and replication attempts. Moreover, the discoveries span new tasks, agents, metrics, and data, suggesting a qualitative shift from benchmark optimization to broader discoveries.

Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision
Zhouhang Xie | Tushar Khot | Bhavana Dalvi Mishra | Harshit Surana | Julian McAuley | Peter Clark | Bodhisattwa Prasad Majumder
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Instruction-following LLMs have recently allowed systems to discover hidden concepts from a collection of unstructured documents based on a natural language description of the purpose of the discovery (i.e., goal). Still, the quality of the discovered concepts remains mixed, as it depends heavily on LLM’s reasoning ability and drops when the data is noisy or beyond LLM’s knowledge. We present Instruct-LF, a goal-oriented latent factor discovery system that integrates LLM’s instruction-following ability with statistical models to handle large, noisy datasets where LLM reasoning alone falls short. Instruct-LF uses LLMs to propose fine-grained, goal-related properties from documents, estimates their presence across the dataset, and applies gradient-based optimization to uncover hidden factors, where each factor is represented by a cluster of co-occurring properties. We evaluate latent factors produced by Instruct-LF on movie recommendation, text-world navigation, and legal document categorization tasks. These interpretable representations improve downstream task performance by 5-52% than the best baselines and were preferred 1.8 times as often as the best alternative, on average, in human evaluation.

Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025)
Vaibhav Adlakha | Alexandra Chronopoulou | Xiang Lorraine Li | Bodhisattwa Prasad Majumder | Freda Shi | Giorgos Vernikos
Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025)

Social Intelligence in the Age of LLMs
Hao Zhu | Bodhisattwa Prasad Majumder | Dirk Hovy | Diyi Yang
Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts)

With the emergence of Large Language Models (LLMs), we now have unprecedented opportunities to incorporate human-like communication and context-aware interactions into artificial systems. But what is the current state of LLMs’ capability of social interaction? Can they truly understand social scenarios, perform social reasoning, or interact with humans as socially competent agents? We propose this tutorial as an introduction to and an overview of different aspects of artificial social intelligence and their relationship with LLMs. In this tutorial, we will explore these questions by introducing scientific methods for evaluating social intelligence in LLMs, highlighting the key challenges, and identifying promising research directions. Participants will not only gain a comprehensive overview of the field’s progress, but also acquire technical skills on analysing and developing LLM-based social intelligence.

Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities
Peter Jansen | Bhavana Dalvi Mishra | Harsh Trivedi | Bodhisattwa Prasad Majumder | Tom Hope | Tushar Khot | Doug Downey | Eric Horvitz
Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities

2024

To Tell The Truth: Language of Deception and Language Models
Sanchaita Hazra | Bodhisattwa Prasad Majumder
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Text-based false information permeates online discourses, yet evidence of people’s ability to discern truth from such deceptive textual content is scarce. We analyze a novel TV game show data where conversations in a high-stake environment between individuals with conflicting objectives result in lies. We investigate the manifestation of potentially verifiable language cues of deception in the presence of objective truth, a distinguishing feature absent in previous text-based deception datasets. We show that there exists a class of detectors (algorithms) that have similar truth detection performance compared to human subjects, even when the former accesses only the language cues while the latter engages in conversations with complete access to all potential sources of cues (language and audio-visual). Our model, built on a large language model, employs a bottleneck framework to learn discernible cues to determine truth, an act of reasoning in which human subjects often perform poorly, even with incentives. Our model detects novel but accurate language cues in many cases where humans failed to detect deception, opening up the possibility of humans collaborating with algorithms and ameliorating their ability to detect the truth.

Few-shot Dialogue Strategy Learning for Motivational Interviewing via Inductive Reasoning
Zhouhang Xie | Bodhisattwa Prasad Majumder | Mengjie Zhao | Yoshinori Maeda | Keiichi Yamada | Hiromi Wakaki | Julian McAuley
Findings of the Association for Computational Linguistics: ACL 2024

We consider the task of building a dialogue system that can motivate users to adopt positive lifestyle changes, Motivational Interviewing (MI). Addressing such a task requires a system that could infer how to motivate the user effectively. We propose DIIR, a framework that is capable of learning and applying conversation strategies in the form of natural language inductive rules from expert demonstrations. Automatic and human evaluation on instruction-following large language models show natural language strategies descriptions discovered by DIIR can improve active listening skills, reduce unsolicited advice, and promote more collaborative and less authoritative conversations, outperforming in-context demonstrations that are over 50 times longer.

Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization
Yash Kumar Lal | Li Zhang | Faeze Brahman | Bodhisattwa Prasad Majumder | Peter Clark | Niket Tandon
Findings of the Association for Computational Linguistics: ACL 2024

How-to procedures, such as how to plant a garden, are now used by millions of users, but sometimes need customizing to meet a user’s specific needs, e.g., planting a garden without pesticides. Our goal is to measure and improve an LLM’s ability to perform such customization. Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need. We find that a simple architecture with two LLM agents used sequentially performs best, one that edits a generic how-to procedure and one that verifies its executability, significantly outperforming (10.5% absolute) an end-to-end prompted LLM. This suggests that LLMs can be configured reasonably effectively for procedure customization. This also suggests that multi-agent editing architectures may be worth exploring further for other customization applications (e.g. coding, creative writing) in the future.

2023

KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations
Myeongjun Jang | Bodhisattwa Prasad Majumder | Julian McAuley | Thomas Lukasiewicz | Oana-Maria Camburu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

While recent works have been considerably improving the quality of the natural language explanations (NLEs) generated by a model to justify its predictions, there is very limited research in detecting and alleviating inconsistencies among generated NLEs. In this work, we leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs. We apply our attack to high-performing NLE models and show that models with higher NLE quality do not necessarily generate fewer inconsistencies. Moreover, we propose an off-the-shelf mitigation method to alleviate inconsistencies by grounding the model into external background knowledge. Our method decreases the inconsistencies of previous high-performing NLE models as detected by our attack.

2022

Proceedings of the 7th Workshop on Representation Learning for NLP
Spandana Gella | He He | Bodhisattwa Prasad Majumder | Burcu Can | Eleonora Giunchiglia | Samuel Cahyawijaya | Sewon Min | Maximilian Mozes | Xiang Lorraine Li | Isabelle Augenstein | Anna Rogers | Kyunghyun Cho | Edward Grefenstette | Laura Rimell | Chris Dyer
Proceedings of the 7th Workshop on Representation Learning for NLP

Achieving Conversational Goals with Unsupervised Post-hoc Knowledge Injection
Bodhisattwa Prasad Majumder | Harsh Jhamtani | Taylor Berg-Kirkpatrick | Julian McAuley
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A limitation of current neural dialog models is that they tend to suffer from a lack of specificity and informativeness in generated responses, primarily due to dependence on training data that covers a limited variety of scenarios and conveys limited knowledge. One way to alleviate this issue is to extract relevant knowledge from external sources at decoding time and incorporate it into the dialog response. In this paper, we propose a post-hoc knowledge-injection technique where we first retrieve a diverse set of relevant knowledge snippets conditioned on both the dialog history and an initial response from an existing dialog model. We construct multiple candidate responses, individually injecting each retrieved snippet into the initial response using a gradient-based decoding method, and then select the final response with an unsupervised ranking step. Our experiments in goal-oriented and knowledge-grounded dialog settings demonstrate that human annotators judge the outputs from the proposed method to be more engaging and informative compared to responses from prior dialog systems. We further show that knowledge-augmentation promotes success in achieving conversational goals in both experimental settings.

Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)
Antoine Bosselut | Xiang Li | Bill Yuchen Lin | Vered Shwartz | Bodhisattwa Prasad Majumder | Yash Kumar Lal | Rachel Rudinger | Xiang Ren | Niket Tandon | Vilém Zouhar
Proceedings of the First Workshop on Commonsense Representation and Reasoning (CSRR 2022)

Controlling Bias Exposure for Fair Interpretable Predictions
Zexue He | Yu Wang | Julian McAuley | Bodhisattwa Prasad Majumder
Findings of the Association for Computational Linguistics: EMNLP 2022

Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (like gender or race). However, when sensitive information is semantically entangled with the task information of the input, e.g., gender information is predictive for a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, lacking control over how much bias is necessarily required to be removed. We argue that a favorable debiasing method should use sensitive information ‘fairly’, rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm by adjustingthe predictive model’s belief to (1) ignore the sensitive information if it is not useful for the task; (2) use sensitive information minimally as necessary for the prediction (while also incurring a penalty). Experimental results on two text classification tasks (influenced by gender) and an open-ended generation task (influenced by race) indicate that our model achieves a desirable trade-off between debiasing and task performance along with producing debiased rationales as evidence.

2021

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann | Tosin Adewumi | Karmanya Aggarwal | Pawan Sasanka Ammanamanchi | Anuoluwapo Aremu | Antoine Bosselut | Khyathi Raghavi Chandu | Miruna-Adriana Clinciu | Dipanjan Das | Kaustubh Dhole | Wanyu Du | Esin Durmus | Ondřej Dušek | Chris Chinenye Emezue | Varun Gangal | Cristina Garbacea | Tatsunori Hashimoto | Yufang Hou | Yacine Jernite | Harsh Jhamtani | Yangfeng Ji | Shailza Jolly | Mihir Kale | Dhruv Kumar | Faisal Ladhak | Aman Madaan | Mounica Maddela | Khyati Mahajan | Saad Mahamood | Bodhisattwa Prasad Majumder | Pedro Henrique Martins | Angelina McMillan-Major | Simon Mille | Emiel van Miltenburg | Moin Nadeem | Shashi Narayan | Vitaly Nikolaev | Andre Niyongabo Rubungo | Salomey Osei | Ankur Parikh | Laura Perez-Beltrachini | Niranjan Ramesh Rao | Vikas Raunak | Juan Diego Rodriguez | Sashank Santhanam | João Sedoc | Thibault Sellam | Samira Shaikh | Anastasia Shimorina | Marco Antonio Sobrevilla Cabezudo | Hendrik Strobelt | Nishant Subramani | Wei Xu | Diyi Yang | Akhila Yerukola | Jiawei Zhou
Proceedings of the First Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.

Unsupervised Enrichment of Persona-grounded Dialog with Background Stories
Bodhisattwa Prasad Majumder | Taylor Berg-Kirkpatrick | Julian McAuley | Harsh Jhamtani
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Humans often refer to personal narratives, life experiences, and events to make a conversation more engaging and rich. While persona-grounded dialog models are able to generate responses that follow a given persona, they often miss out on stating detailed experiences or events related to a persona, often leaving conversations shallow and dull. In this work, we equip dialog models with ‘background stories’ related to a persona by leveraging fictional narratives from existing story datasets (e.g. ROCStories). Since current dialog datasets do not contain such narratives as responses, we perform an unsupervised adaptation of a retrieved story for generating a dialog response using a gradient-based rewriting technique. Our proposed method encourages the generated response to be fluent (i.e., highly likely) with the dialog history, minimally different from the retrieved story to preserve event ordering and consistent with the original persona. We demonstrate that our method can generate responses that are more diverse, and are rated more engaging and human-like by human evaluators, compared to outputs from existing dialog models.

Detect and Perturb: Neutral Rewriting of Biased and Sensitive Text via Gradient-based Decoding
Zexue He | Bodhisattwa Prasad Majumder | Julian McAuley
Findings of the Association for Computational Linguistics: EMNLP 2021

Written language carries explicit and implicit biases that can distract from meaningful signals. For example, letters of reference may describe male and female candidates differently, or their writing style may indirectly reveal demographic characteristics. At best, such biases distract from the meaningful content of the text; at worst they can lead to unfair outcomes. We investigate the challenge of re-generating input sentences to ‘neutralize’ sensitive attributes while maintaining the semantic meaning of the original text (e.g. is the candidate qualified?). We propose a gradient-based rewriting framework, Detect and Perturb to Neutralize (DEPEN), that first detects sensitive components and masks them for regeneration, then perturbs the generation model at decoding time under a neutralizing constraint that pushes the (predicted) distribution of sensitive attributes towards a uniform distribution. Our experiments in two different scenarios show that DEPEN can regenerate fluent alternatives that are neutral in the sensitive attribute while maintaining the semantics of other attributes.

Ask what’s missing and what’s useful: Improving Clarification Question Generation using Global Knowledge
Bodhisattwa Prasad Majumder | Sudha Rao | Michel Galley | Julian McAuley
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The ability to generate clarification questions i.e., questions that identify useful missing information in a given context, is important in reducing ambiguity. Humans use previous experience with similar contexts to form a global view and compare it to the given context to ascertain what is missing and what is useful in the context. Inspired by this, we propose a model for clarification question generation where we first identify what is missing by taking a difference between the global and the local view and then train a model to identify what is useful and generate a question about it. Our model outperforms several baselines as judged by both automatic metrics and humans.

2020

Representation Learning for Information Extraction from Form-like Documents
Bodhisattwa Prasad Majumder | Navneet Potti | Sandeep Tata | James Bradley Wendt | Qi Zhao | Marc Najork
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose a novel approach using representation learning for tackling the problem of extracting structured information from form-like document images. We propose an extraction system that uses knowledge of the types of the target fields to generate extraction candidates and a neural network architecture that learns a dense representation of each candidate based on neighboring words in the document. These learned representations are not only useful in solving the extraction task for unseen document templates from two different domains but are also interpretable, as we show using loss cases.

Interview: Large-scale Modeling of Media Dialog with Discourse Patterns and Knowledge Grounding
Bodhisattwa Prasad Majumder | Shuyang Li | Jianmo Ni | Julian McAuley
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this work, we perform the first large-scale analysis of discourse in media dialog and its impact on generative modeling of dialog turns, with a focus on interrogative patterns and use of external knowledge. Discourse analysis can help us understand modes of persuasion, entertainment, and information elicitation in such settings, but has been limited to manual review of small corpora. We introduce **Interview**—a large-scale (105K conversations) media dialog dataset collected from news interview transcripts—which allows us to investigate such patterns at scale. We present a dialog model that leverages external knowledge as well as dialog acts via auxiliary losses and demonstrate that our model quantitatively and qualitatively outperforms strong discourse-agnostic baselines for dialog modeling—generating more specific and topical responses in interview-style conversations.

Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions
Bodhisattwa Prasad Majumder | Harsh Jhamtani | Taylor Berg-Kirkpatrick | Julian McAuley
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing persona-grounded dialog models often fail to capture simple implications of given persona descriptions, something which humans are able to do seamlessly. For example, state-of-the-art models cannot infer that interest in hiking might imply love for nature or longing for a break. In this paper, we propose to expand available persona sentences using existing commonsense knowledge bases and paraphrasing resources to imbue dialog models with access to an expanded and richer set of persona descriptions. Additionally, we introduce fine-grained grounding on personas by encouraging the model to make a discrete choice among persona sentences while synthesizing a dialog response. Since such a choice is not observed in the data, we model it using a discrete latent random variable and use variational learning to sample from hundreds of persona expansions. Our model outperforms competitive baselines on the Persona-Chat dataset in terms of dialog quality and diversity while achieving persona-consistent and controllable dialog generation.

2019

Improving Neural Story Generation by Targeted Common Sense Grounding
Huanru Henry Mao | Bodhisattwa Prasad Majumder | Julian McAuley | Garrison Cottrell
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Stories generated with neural language models have shown promise in grammatical and stylistic consistency. However, the generated stories are still lacking in common sense reasoning, e.g., they often contain sentences deprived of world knowledge. We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding. When combined with our two-stage fine-tuning pipeline, our method achieves improved common sense reasoning and state-of-the-art perplexity on the WritingPrompts (Fan et al., 2018) story generation dataset.

Generating Personalized Recipes from Historical User Preferences
Bodhisattwa Prasad Majumder | Shuyang Li | Jianmo Ni | Julian McAuley
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Existing approaches to recipe generation are unable to create recipes for users with culinary preferences but incomplete knowledge of ingredients in specific dishes. We propose a new task of personalized recipe generation to help these users: expanding a name and incomplete ingredient details into complete natural-text instructions aligned with the user’s historical preferences. We attend on technique- and recipe-level representations of a user’s previously consumed recipes, fusing these ‘user-aware’ representations in an attention fusion layer to control recipe text generation. Experiments on a new dataset of 180K recipes and 700K interactions show our model’s ability to generate plausible and personalized recipes compared to non-personalized baselines.

Co-authors

Antoine Bosselut 2

Yash Kumar Lal 2

Xiang Lorraine Li 2

Tosin Adewumi 1

Vaibhav Adlakha 1

Karmanya Aggarwal 1

Pawan Sasanka Ammanamanchi 1

Anuoluwapo Aremu 1

Isabelle Augenstein 1

Faeze Brahman 1

Samuel Cahyawijaya 1

Oana-Maria Camburu 1

Khyathi Raghavi Chandu 1

Kyunghyun Cho 1

Alexandra Chronopoulou 1

Miruna Clinciu 1

Garrison Cottrell 1

Kaustubh Dhole 1

Ondřej Dušek 1

Chris Chinenye Emezue 1

Michel Galley 1

Cristina Garbacea 1

Sebastian Gehrmann 1

Spandana Gella 1

Eleonora Giunchiglia 1

Edward Grefenstette 1

Tatsunori B. Hashimoto 1

Sanchaita Hazra 1

Myeongjun Erik Jang 1

Yacine Jernite 1

Shailza Jolly 1

Faisal Ladhak 1

Bill Yuchen Lin 1

Thomas Lukasiewicz 1

Mounica Maddela 1

Yoshinori Maeda 1

Khyati Mahajan 1

Saad Mahamood 1

Huanru Henry Mao 1

Pedro Henrique Martins 1

Angelina McMillan-Major 1

Maximilian Mozes 1

Shashi Narayan 1

Vitaly Nikolaev 1

Laura Perez-Beltrachini 1

Navneet Potti 1

Marissa Radensky 1

Niranjan Ramesh Rao 1

Juan Diego Rodriguez 1

Andre Niyongabo Rubungo 1

Rachel Rudinger 1

Sashank Santhanam 1

Thibault Sellam 1

Samira Shaikh 1

Anastasia Shimorina 1

Vered Shwartz 1

Pao Siangliulue 1

Marco Antonio Sobrevilla Cabezudo 1

Hendrik Strobelt 1

Nishant Subramani 1

Harshit Surana 1

Oyvind Tafjord 1

Harsh Trivedi 1

Emiel Van Miltenburg 1

Giorgos Vernikos 1

Hiromi Wakaki 1

Daniel S. Weld 1

James Bradley Wendt 1

Keiichi Yamada 1

Akhila Yerukola 1

Vilém Zouhar 1

Venues