Ritam Dutt - ACL Anthology

Ritam Dutt

2025

SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding Tasks
Ritam Dutt | Carolyn Rose | Maarten Sap
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Effective human communication in social settings is contingent on recognizing subtle cues, such as intents or implications. Without such cues, NLP models risk missing social signals, instead relying on surface patterns. We introduce SOCIAL SCAFFOLDS, an automated framework for facilitating generalization across social reasoning tasks by generating rationales that make these social cues explicit. Grounded in narrative modeling principles, we generate task-agnostic rationales that capture different perspectives, i.e., those of the speaker, the listener, and the general world-view. Our experimental suite shows that providing rationales as augmentations aids task performance in both supervised fine-tuning and in-context learning paradigms. Notably, providing all three rationale types significantly improves cross-task performance in 44% of cases, and providing the inferred speaker intent does so in 31.3% of cases. We conduct statistical and ablation analyses that show how rationales complement the input text and are used effectively by models.

R²-CoD: Understanding Text-Graph Complementarity in Relational Reasoning via Knowledge Co-Distillation
Zhen Wu | Ritam Dutt | Luke M. Breitfeller | Armineh Nourbakhsh | Siddharth Parekh | Carolyn Rose
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Relational reasoning lies at the core of many NLP tasks, drawing on complementary signals from text and graphs. While prior research has investigated how to leverage this dual complementarity, a detailed and systematic understanding of text-graph interplay and its effect on hybrid models remains underexplored. We take an analysis-driven approach to investigate text–graph representation complementarity via a unified architecture that supports knowledge co-distillation (CoD). We explore five tasks involving relational reasoning that differ in how text and graph structures encode the information needed to solve that task. By tracking how these dual representations evolve during training, we uncover interpretable patterns of alignment and divergence, and provide insights into when and why their integration is beneficial.

Can dependency parses facilitate generalization in language models? A case study of cross-lingual relation extraction
Ritam Dutt | Shounak Sural | Carolyn Rose
Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing

In this work, we propose DEPGEN, a framework for evaluating the generalization capabilities of language models on the task of relation extraction, with dependency parses as scaffolds. We use a GNN-based framework that takes dependency parses as input and learns entity embeddings, which are used to augment a baseline multilingual encoder. We also investigate the role of dependency parses when they are included as part of the prompt to LLMs in a zero-shot learning setup. We observe that including off-the-shelf dependency parses can aid relation extraction, with the best-performing model achieving a modest relative improvement of 0.91% and 1.5% in the in-domain and zero-shot settings, respectively, across two datasets. For the in-context learning setup, we observe an average improvement of 1.67%, with significant gains for low-performing LLMs. We also carry out extensive statistical analysis to investigate how different factors, such as the choice of dependency parser or the nature of the prompt, impact performance. We make our code and results publicly available for the research community at https://github.com/ShoRit/multilingual-re.git.

SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell | Giuseppe Attanasio | Ioana Baldini | Miruna Clinciu | Jordan Clive | Pieter Delobelle | Manan Dey | Sil Hamilton | Timm Dill | Jad Doughman | Ritam Dutt | Avijit Ghosh | Jessica Zosa Forde | Carolin Holtermann | Lucie-Aimée Kaffee | Tanmay Laud | Anne Lauscher | Roberto L Lopez-Davila | Maraim Masoud | Nikita Nangia | Anaelia Ovalle | Giada Pistilli | Dragomir Radev | Beatrice Savoldi | Vipul Raheja | Jeremy Qin | Esther Ploeger | Arjun Subramonian | Kaustubh Dhole | Kaiser Sun | Amirbek Djanibekov | Jonibek Mansurov | Kayo Yin | Emilio Villa Cueva | Sagnik Mukherjee | Jerry Huang | Xudong Shen | Jay Gala | Hamdan Al-Ali | Tair Djanibekov | Nurdaulet Mukhituly | Shangrui Nie | Shanya Sharma | Karolina Stanczak | Eliza Szczechla | Tiago Timponi Torrent | Deepak Tunuguntla | Marcelo Viridiano | Oskar Van Der Wal | Adina Yakefu | Aurélie Névéol | Mike Zhang | Sydney Zink | Zeerak Talat
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging behind the rapid advancement of LLMs in multilingual settings. In this paper, we introduce SHADES, a new multilingual parallel dataset designed to help address this issue by examining culturally specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 regions around the world and 16 languages, spanning multiple identity categories subject to discrimination worldwide. We demonstrate its utility in a series of exploratory evaluations for both “base” and “instruction-tuned” language models. Our results suggest that stereotypes are consistently reflected across models and languages, with some languages and models indicating much stronger stereotype biases than others.

Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)
James Hale | Brian Deuksin Kwon | Ritam Dutt
Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)

2024

Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations
Ritam Dutt | Zhen Wu | Jiaxin Shi | Divyanshu Sheth | Prakhar Gupta | Carolyn Rose
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a generalizable classification approach that leverages Large Language Models (LLMs) to facilitate the detection of implicitly encoded social meaning in conversations. We design a multi-faceted prompt to extract a textual explanation of the reasoning that connects visible cues to underlying social meanings. These extracted explanations or rationales serve as augmentations to the conversational text to facilitate dialogue understanding and transfer. Our empirical results over 2,340 experimental settings demonstrate the significant positive impact of adding these rationales. Our findings hold true for in-domain classification, zero-shot, and few-shot domain transfer for two different social meaning detection tasks, each spanning two different corpora.

Investigating the Generalizability of Pretrained Language Models across Multiple Dimensions: A Case Study of NLI and MRC
Ritam Dutt | Sagnik Ray Choudhury | Varun Venkat Rao | Carolyn Rose | V.G.Vinod Vydiswaran
Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP

Generalization refers to the ability of machine learning models to perform well on data distributions different from the one they were trained on. While several pre-existing works have characterized the generalizability of NLP models across different dimensions, such as domain shift, adversarial perturbations, or compositional variations, most studies were carried out in a stand-alone setting, emphasizing a single dimension of interest. We bridge this gap by systematically investigating the generalizability of pre-trained language models across different architectures, sizes, and training strategies, over multiple dimensions for the task of natural language inference and question answering. Our results indicate that model instances typically exhibit consistent generalization trends, i.e., they generalize equally well (or poorly) across most scenarios, and this ability is correlated with model architecture, base dataset performance, size, and training mechanism. We hope this research motivates further work in a) developing a multi-dimensional generalization benchmark for systematic evaluation and b) examining the reasons behind models’ generalization abilities. The code and data are available at https://github.com/sagnik/md-gen-nlp, and the trained models are released at https://huggingface.co/varun-v-rao.

Evaluating Large Language Models on Social Signal Sensitivity: An Appraisal Theory Approach
Zhen Wu | Ritam Dutt | Carolyn Rose
Proceedings of the 1st Human-Centered Large Language Modeling Workshop

We present a framework to assess the sensitivity of Large Language Models (LLMs) to textually embedded social signals using an Appraisal Theory perspective. We report on an experiment that uses prompts encoding three dimensions of social signals: Affect, Judgment, and Appreciation. In response to the prompt, an LLM generates both an analysis (Insight) and a conversational Response, which are analyzed in terms of sensitivity to the signals. We quantitatively evaluate the output text through topical analysis of the Insight and predicted social intelligence scores of the Response in terms of empathy and emotional polarity. Key findings show that LLMs are more sensitive to positive signals. The personas impact Responses but not the Insight. We discuss how our framework can be extended to a broader set of social signals, personas, and scenarios to evaluate LLM behaviors under various conditions.

2023

Linguistic representations for fewer-shot relation extraction across domains
Sireesh Gururaja | Ritam Dutt | Tinglong Liao | Carolyn Rosé
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolds on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability by providing features that function as cross-domain pivots. We focus on the task of relation extraction on three datasets of procedural text in two domains, cooking and materials science. Our approach augments a popular transformer-based architecture by alternately incorporating syntactic and semantic graphs constructed by freely available off-the-shelf tools. We examine their utility for enhancing generalization, and investigate whether earlier findings, e.g., that semantic representations can be more helpful than syntactic ones, extend to relation extraction in multiple domains. We find that while the inclusion of these graphs results in significantly higher performance in few-shot transfer, both types of graph exhibit roughly equivalent utility.

Counting the Bugs in ChatGPT’s Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model
Leonie Weissweiler | Valentin Hofmann | Anjali Kantharuban | Anna Cai | Ritam Dutt | Amey Hengle | Anubha Kabra | Atharva Kulkarni | Abhishek Vijayakumar | Haofei Yu | Hinrich Schuetze | Kemal Oflazer | David Mortensen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko’s (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results—through the lens of morphology—cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.

GrailQA++: A Challenging Zero-Shot Benchmark for Knowledge Base Question Answering
Ritam Dutt | Sopan Khosla | Vinayshekhar Bannihatti Kumar | Rashmi Gangadharaiah
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Exploring the Reasons for Non-generalizability of KBQA systems
Sopan Khosla | Ritam Dutt | Vinayshekhar Bannihatti Kumar | Rashmi Gangadharaiah
Proceedings of the Fourth Workshop on Insights from Negative Results in NLP

Recent research has demonstrated impressive generalization capabilities of several Knowledge Base Question Answering (KBQA) models on the GrailQA dataset. We inspect whether these models can generalize to other datasets in a zero-shot setting. We notice a significant drop in performance and investigate its causes. We observe that model performance depends not only on the structural complexity of the questions, but also on the linguistic style in which a question is framed. Specifically, the linguistic dimensions corresponding to explicitness, readability, coherence, and grammaticality have a significant impact on the performance of state-of-the-art KBQA models. Overall, our results showcase the brittleness of such models and the need for creating generalizable systems.

2022

R3 : Refined Retriever-Reader pipeline for Multidoc2dial
Srijan Bansal | Suraj Tripathi | Sumit Agarwal | Sireesh Gururaja | Aditya Srikanth Veerubhotla | Ritam Dutt | Teruko Mitamura | Eric Nyberg
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

In this paper, we present our submission to the DialDoc shared task based on the MultiDoc2Dial dataset. MultiDoc2Dial is a conversational question answering dataset that grounds dialogues in multiple documents. The task involves grounding a user’s query in a document, followed by generating an appropriate response. We propose several improvements over the baseline’s retriever-reader architecture to aid in modeling goal-oriented dialogues grounded in multiple documents. Our proposed approach employs sparse representations for passage retrieval, a passage re-ranker, the fusion-in-decoder architecture for generation, and a curriculum learning training paradigm. Our approach shows a 12-point improvement in BLEU score over the baseline RAG model.

PerKGQA: Question Answering over Personalized Knowledge Graphs
Ritam Dutt | Kasturi Bhattacharjee | Rashmi Gangadharaiah | Dan Roth | Carolyn Rose
Findings of the Association for Computational Linguistics: NAACL 2022

Previous studies on question answering over knowledge graphs have typically operated over a single knowledge graph (KG). This KG is assumed to be known a priori and is leveraged similarly for all users’ queries during inference. However, such an assumption is not applicable to real-world settings, such as healthcare, where one needs to handle queries of new users over unseen KGs during inference. Furthermore, privacy concerns and high computational costs render it infeasible to query the single KG that has information about all users while answering a specific user’s query. The above concerns motivate our question answering setting over personalized knowledge graphs (PERKGQA), where each user has restricted access to their KG. We observe that current state-of-the-art KGQA methods that require learning prior node representations fare poorly. We propose two complementary approaches, PATHCBR and PATHRGCN, for PERKGQA. The former is a simple non-parametric technique that employs case-based reasoning, while the latter is a parametric approach using graph neural networks. Our proposed methods circumvent learning prior representations, can generalize to unseen KGs, and outperform strong baselines on an academic and an internal dataset by 6.5% and 10.5%, respectively.

2021

Team JARS: DialDoc Subtask 1 - Improved Knowledge Identification with Supervised Out-of-Domain Pretraining
Sopan Khosla | Justin Lovelace | Ritam Dutt | Adithya Pratapa
Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)

In this paper, we discuss our submission for DialDoc subtask 1. The subtask requires systems to extract knowledge from FAQ-type documents that is vital for replying to a user’s query in a conversational setting. We experiment with pretraining a BERT-based question-answering model on different QA datasets from MRQA, as well as conversational QA datasets like CoQA and QuAC. Our results show that models pretrained on CoQA and QuAC perform better than their counterparts pretrained on MRQA datasets. Our results also indicate that adding more pretraining data does not necessarily improve performance. Our final model, an ensemble of ALBERT-XL models pretrained independently on CoQA and QuAC that selects the answer with the highest average probability score, achieves an F1-Score of 70.9% on the official test set.

ResPer: Computationally Modelling Resisting Strategies in Persuasive Conversations
Ritam Dutt | Sayan Sinha | Rishabh Joshi | Surya Shekhar Chakraborty | Meredith Riggs | Xinru Yan | Haogang Bao | Carolyn Rose
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Modelling persuasion strategies as predictors of task outcome has several real-world applications and has received considerable attention from the computational linguistics community. However, previous research has failed to account for the resisting strategies employed by an individual to foil such persuasion attempts. Grounded in prior literature in cognitive and social psychology, we propose a generalised framework for identifying resisting strategies in persuasive conversations. We instantiate our framework on two distinct datasets comprising persuasion and negotiation conversations. We also leverage a hierarchical sequence-labelling neural architecture to infer the aforementioned resisting strategies automatically. Our experiments reveal the asymmetry of power roles in non-collaborative goal-directed conversations and the benefits of incorporating resisting strategies when predicting the final conversation outcome. We also investigate the role of different resisting strategies on the conversation outcome and glean insights that corroborate past findings. We also make the code and the dataset of this work publicly available at https://github.com/americast/resper.

2020

Keeping Up Appearances: Computational Modeling of Face Acts in Persuasion Oriented Discussions
Ritam Dutt | Rishabh Joshi | Carolyn Rose
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The notion of face refers to the public self-image of an individual that emerges both from the individual’s own actions as well as from the interaction with others. Modeling face and understanding its state changes throughout a conversation is critical to the study of maintenance of basic human needs in and through interaction. Grounded in the politeness theory of Brown and Levinson (1978), we propose a generalized framework for modeling face acts in persuasion conversations, resulting in a reliable coding manual, an annotated corpus, and computational models. The framework reveals insights about differences in face act utilization between asymmetric roles in persuasion conversations. Using computational models, we are able to successfully identify face acts as well as predict a key conversational outcome (e.g., donation success). Finally, we model a latent representation of the conversational state to analyze the impact of predicted face acts on the probability of a positive conversational outcome and observe several correlations that corroborate previous findings.

LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification
Sopan Khosla | Rishabh Joshi | Ritam Dutt | Alan W Black | Yulia Tsvetkov
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we describe our submission for the task of Propaganda Span Identification in news articles. We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda. The “multi-granular” model incorporates linguistic knowledge at various levels of text granularity, including word-, sentence-, and document-level syntactic, semantic, and pragmatic affect features, which significantly improve model performance compared to its language-agnostic variant. To facilitate better representation learning, we also collect a corpus of 10k news articles and use it for fine-tuning the model. The final model is a majority-voting ensemble that learns different propaganda class boundaries by leveraging different subsets of incorporated knowledge.

NARMADA: Need and Available Resource Managing Assistant for Disasters and Adversities
Kaustubh Hiware | Ritam Dutt | Sayan Sinha | Sohan Patro | Kripa Ghosh | Saptarshi Ghosh
Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media

Although a lot of research has been done on utilising Online Social Media during disasters, there exists no system for a specific task that is critical in a post-disaster scenario – identifying resource-needs and resource-availabilities in the disaster-affected region, coupled with their subsequent matching. To this end, we present NARMADA, a semi-automated platform which leverages the crowd-sourced information from social media posts for assisting post-disaster relief coordination efforts. The system employs Natural Language Processing and Information Retrieval techniques for identifying resource-needs and resource-availabilities from microblogs, extracting resources from the posts, and also matching the needs to suitable availabilities. The system is thus capable of facilitating the judicious management of resources during post-disaster relief operations.

2018

CL Scholar: The ACL Anthology Knowledge Graph Miner
Mayank Singh | Pradeep Dogga | Sohan Patro | Dhiraj Barnwal | Ritam Dutt | Rajarshi Haldar | Pawan Goyal | Animesh Mukherjee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We present CL Scholar, the ACL Anthology knowledge graph miner to facilitate high-quality search and exploration of current research progress in the computational linguistics community. In contrast to previous works, the current system fully automates the periodic crawling, indexing, and processing of newly published articles. CL Scholar utilizes both textual and network information for knowledge graph construction. As an additional novel initiative, CL Scholar supports more than 1200 scholarly natural language queries along with standard keyword-based search on the constructed knowledge graph. It answers binary, statistical, and list-based natural language queries. The current system is deployed at http://cnerg.iitkgp.ac.in/aclakg. We also provide REST API support along with a bulk download facility. Our code and data are available at https://github.com/CLScholar.

Co-authors

Sireesh Gururaja 2

Vinayshekhar Bannihatti Kumar 2

Shounak Sural 2

Sumit Agarwal 1

Hamdan Al-Ali 1

Giuseppe Attanasio 1

Ioana Baldini 1

Srijan Bansal 1

Dhiraj Barnwal 1

Kasturi Bhattacharjee 1

Alan W. Black 1

Luke M. Breitfeller 1

Surya Shekhar Chakraborty 1

Sagnik Ray Choudhury 1

Miruna Clinciu 1

Pieter Delobelle 1

Kaustubh Dhole 1

Amirbek Djanibekov 1

Pradeep Dogga 1

Jessica Zosa Forde 1

Saptarshi Ghosh 1

Prakhar Gupta 1

Rajarshi Haldar 1

Kaustubh Hiware 1

Valentin Hofmann 1

Carolin Holtermann 1

Lucie-Aimée Kaffee 1

Anjali Kantharuban 1

Atharva Kulkarni 1

Brian Deuksin Kwon 1

Anne Lauscher 1

Tinglong Liao 1

Roberto L Lopez-Davila 1

Justin Lovelace 1

Jonibek Mansurov 1

Maraim Masoud 1

Teruko Mitamura 1

Margaret Mitchell 1

David R. Mortensen 1

Sagnik Mukherjee 1

Animesh Mukherjee 1

Nurdaulet Mukhituly 1

Nikita Nangia 1

Aurélie Névéol 1

Armineh Nourbakhsh 1

Kemal Oflazer 1

Anaelia Ovalle 1

Siddharth Parekh 1

Giada Pistilli 1

Esther Ploeger 1

Adithya Pratapa 1

Dragomir Radev 1

Varun Venkat Rao 1

Meredith Riggs 1

Beatrice Savoldi 1

Hinrich Schütze 1

Shanya Sharma 1

Divyanshu Sheth 1

Karolina Stanczak 1

Arjun Subramonian 1

Eliza Szczechla 1

Tair Djanibekov 1

Tiago Timponi Torrent 1

Suraj Tripathi 1

Yulia Tsvetkov 1

Deepak Tunuguntla 1

Oskar Van Der Wal 1

Aditya Srikanth Veerubhotla 1

Abhishek Vijayakumar 1

Emilio Villa-Cueva 1

Marcelo Viridiano 1

V. G. Vinod Vydiswaran 1

Leonie Weissweiler 1

Venues