Rupak Sarkar

2025

Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs
Rupak Sarkar | Neha Srikanth | Taylor Pellegrin | Rachel Rudinger | Claire Bonial | Philip Resnik
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While it is commonly accepted that maintaining common ground plays a role in conversational success, little prior research exists connecting conversational grounding to success in task-oriented conversations. We study failures of grounding in the Ubuntu IRC dataset, where participants use text-only communication to resolve technical issues. We find that disruptions in conversational flow often stem from a misalignment in common ground, driven by a divergence in beliefs and assumptions held by participants. These disruptions, which we call conversational friction, significantly correlate with task success. While LLMs can identify overt cases of conversational friction, they struggle with subtler and more context-dependent instances that require pragmatic or domain-specific reasoning.

Many constructs that characterize language, like its complexity or emotionality, have a naturally continuous semantic structure; a public speech is not just “simple” or “complex”, but exists on a continuum between extremes. Although large language models (LLMs) are an attractive tool for measuring scalar constructs, their idiosyncratic treatment of numerical outputs raises questions of how to best apply them. We address these questions with a comprehensive evaluation of LLM-based approaches to scalar construct measurement in social science. Using multiple datasets sourced from the political science literature, we evaluate four approaches: unweighted direct pointwise scoring, aggregation of pairwise comparisons, token-probability-weighted pointwise scoring, and finetuning. Our study finds that pairwise comparisons made by LLMs produce better measurements than simply prompting the LLM to directly output the scores, which suffers from bunching around arbitrary numbers. However, taking the weighted mean over the token probability of scores further improves the measurements over the two previous approaches. Finally, finetuning smaller models with as few as 1,000 training pairs can match or exceed the performance of prompted LLMs.
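As a rough illustration of the difference between unweighted pointwise scoring and token-probability-weighted scoring described in this abstract, here is a minimal sketch. It assumes the model's probabilities for each candidate score token are already available as a dictionary; the 1-5 scale, the function names, and the example numbers are hypothetical and are not taken from the paper.

```python
def weighted_score(token_probs):
    """Probability-weighted mean over candidate scores.

    token_probs maps each candidate score (a hypothetical 1-5 scale here)
    to the probability the model assigned to that score's token.
    """
    total = sum(token_probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Renormalize in case the score tokens do not cover all probability mass.
    return sum(score * p for score, p in token_probs.items()) / total


def argmax_score(token_probs):
    """Unweighted pointwise score: the single most probable score token."""
    return max(token_probs, key=token_probs.get)


# Hypothetical distribution over score tokens for one text (illustration only).
probs = {1: 0.05, 2: 0.10, 3: 0.40, 4: 0.35, 5: 0.10}
print(argmax_score(probs))    # the most probable score
print(weighted_score(probs))  # a value between the scale endpoints
```

The weighted version can land between integer scores, which is one plausible reason it would avoid the bunching around particular numbers that the abstract attributes to direct scoring.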

PairScale: Analyzing Attitude Change with Pairwise Comparisons
Rupak Sarkar | Patrick Y. Wu | Kristina Miler | Alexander Hoyle | Philip Resnik
Findings of the Association for Computational Linguistics: NAACL 2025

We introduce a text-based framework for measuring attitudes in communities toward issues of interest, going beyond the pro/con/neutral of conventional stance detection to characterize attitudes on a continuous scale using both implicit and explicit evidence in language. The framework exploits LLMs both to extract attitude-related evidence and to perform pairwise comparisons that yield unidimensional attitude scores via the classic Bradley-Terry model. We validate the LLM-based steps using human judgments, and illustrate the utility of the approach for social science by examining the evolution of attitudes on two high-profile issues in U.S. politics in two political communities on Reddit over the period spanning from the 2016 presidential campaign to the 2022 mid-term elections. WARNING: Potentially sensitive political content.
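To make the Bradley-Terry step mentioned in this abstract concrete, the following is a minimal sketch of fitting the model to pairwise outcomes with the standard minorization-maximization update. It is not the authors' implementation: the function name, the item labels, and the comparison data are hypothetical, and it assumes every item wins at least once, since otherwise the maximum-likelihood estimate does not exist.

```python
import math


def fit_bradley_terry(items, comparisons, iterations=200):
    """Estimate Bradley-Terry scores from (winner, loser) pairs.

    Returns log-strengths centered at zero, so higher means "preferred
    more often" along a single dimension.
    """
    wins = {i: 0 for i in items}
    pair_counts = {}
    for winner, loser in comparisons:
        if winner == loser:
            raise ValueError("an item cannot be compared with itself")
        wins[winner] += 1
        key = frozenset((winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1
    if any(w == 0 for w in wins.values()):
        raise ValueError("this sketch assumes every item wins at least once")

    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum n_ij / (p_i + p_j) over every opponent j that i has faced.
            denom = 0.0
            for key, n in pair_counts.items():
                if i in key:
                    (j,) = key - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Rescale so the strengths keep a fixed total between iterations.
        total = sum(updated.values())
        strength = {i: v * len(items) / total for i, v in updated.items()}

    logs = {i: math.log(s) for i, s in strength.items()}
    mean = sum(logs.values()) / len(logs)
    return {i: v - mean for i, v in logs.items()}


# Hypothetical pairwise judgments over three texts (illustration only).
data = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
scores = fit_bradley_terry(["A", "B", "C"], data)
print(sorted(scores, key=scores.get, reverse=True))
```

In a setting like the one described, each pair would come from a comparison of two pieces of attitude evidence, and the resulting scores would place them on one continuous scale.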

2024

Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering
Neha Srikanth | Rupak Sarkar | Heran Mane | Elizabeth Aparicio | Quynh Nguyen | Rachel Rudinger | Jordan Boyd-Graber
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Questions posed by information-seeking users often contain implicit false or potentially harmful assumptions. In a high-risk domain such as maternal and infant health, a question-answering system must recognize these pragmatic constraints and go beyond simply answering user questions, examining them in context to respond helpfully. To achieve this, we study assumptions and implications, or pragmatic inferences, made when mothers ask questions about pregnancy and infant care by collecting a dataset of 2,727 inferences from 500 questions across three diverse sources. We study how health experts naturally address these inferences when writing answers, and illustrate that informing existing QA pipelines with pragmatic inferences produces responses that are more complete, mitigating the propagation of harmful beliefs.

2023

Natural Language Decompositions of Implicit Content Enable Better Text Representations
Alexander Hoyle | Rupak Sarkar | Pranav Goel | Philip Resnik
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibility of the generated content via human judgments. Incorporating these explicit representations of implicit content proves useful in multiple problem settings that involve the human interpretation of utterances: assessing the similarity of arguments, making sense of a body of opinion data, and modeling legislative behavior. Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP and particularly its applications to social science.

2022

Are Neural Topic Models Broken?
Alexander Hoyle | Pranav Goel | Rupak Sarkar | Philip Resnik
Findings of the Association for Computational Linguistics: EMNLP 2022

Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and their failure to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use. Motivated by content analysis as a dominant real-world use case for topic modeling, we analyze two related aspects of topic models that affect their effectiveness and trustworthiness in practice for that purpose: the stability of their estimates and the extent to which the model’s discovered categories align with human-determined categories in the data. We find that neural topic models fare worse in both respects compared to an established classical method. We take a step toward addressing both issues in tandem by demonstrating that a straightforward ensembling method can reliably outperform the members of the ensemble.

2021

Empathy and Hope: Resource Transfer to Model Inter-country Social Media Dynamics
Clay H. Yoo | Shriphani Palakodety | Rupak Sarkar | Ashiqur KhudaBukhsh
Proceedings of the 1st Workshop on NLP for Positive Impact

The ongoing COVID-19 pandemic resulted in significant ramifications for international relations, ranging from travel restrictions and global ceasefires to international vaccine production and sharing agreements. Amidst a wave of infections in India that resulted in a systemic breakdown of healthcare infrastructure, a social welfare organization based in Pakistan offered to procure medical-grade oxygen to assist India - a nation which was involved in four wars with Pakistan in the past few decades. In this paper, we focus on Pakistani Twitter users’ response to the ongoing healthcare crisis in India. While #IndiaNeedsOxygen and #PakistanStandsWithIndia featured among the top-trending hashtags in Pakistan, divisive hashtags such as #EndiaSaySorryToKashmir simultaneously started trending. Against the backdrop of a contentious history including four wars, divisive content of this nature, especially when a country is facing an unprecedented healthcare crisis, fuels further deterioration of relations. We define a new task of detecting supportive content and demonstrate that existing NLP for social impact tools can be effectively harnessed for such tasks within a quick turnaround time. We also release the first publicly available data set at the intersection of geopolitical relations and a raging pandemic in the context of India and Pakistan.

2020

Social Media Attributions in the Context of Water Crisis
Rupak Sarkar | Sayantan Mahinder | Hirak Sarkar | Ashiqur KhudaBukhsh
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Attribution of natural disasters/collective misfortune is a widely-studied political science problem. However, such studies typically rely on surveys, expert opinions, or external signals such as voting outcomes. In this paper, we explore the viability of using unstructured, noisy social media data to complement traditional surveys by automatically extracting attribution factors. We present a novel prediction task, attribution tie detection: identifying the factors (e.g., poor city planning, exploding population, etc.) held responsible for the crisis in a social media document. We focus on the 2019 Chennai water crisis that rapidly escalated into a discussion topic with global importance following alarming water-crisis statistics. On a challenging data set constructed from YouTube comments (72,098 comments posted by 43,859 users on 623 videos relevant to the crisis), we present a neural baseline to identify attribution ties that achieves a reasonable performance (accuracy: 87.34% on attribution detection and 81.37% on attribution resolution). We release the first annotated data set of 2,500 comments in this important domain.

The Non-native Speaker Aspect: Indian English in Social Media
Rupak Sarkar | Sayantan Mahinder | Ashiqur KhudaBukhsh
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

As the largest institutionalized second language variety of English, Indian English has received a sustained focus from linguists for decades. However, to the best of our knowledge, no prior study has contrasted web-expressions of Indian English in noisy social media with English generated by a social media user base that is predominantly native speakers. In this paper, we address this gap in the literature by conducting a comprehensive analysis considering multiple structural and semantic aspects. In addition, we propose a novel application of language models to perform automatic linguistic quality assessment.
