Tanu Mitra

2025

MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform
Hayoung Jung | Shravika Mittal | Ananya Aatreya | Navreet Kaur | Munmun De Choudhury | Tanu Mitra
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Understanding the prevalence of misinformation in health topics online can inform public health policies and interventions. However, measuring such misinformation at scale remains a challenge, particularly for high-stakes but understudied topics like opioid-use disorder (OUD), a leading cause of death in the U.S. We present the first large-scale study of OUD-related myths on YouTube, a widely used platform for health information. With clinical experts, we validate 8 pervasive myths and release an expert-labeled video dataset. To scale labeling, we introduce MythTriage, an efficient triage pipeline that uses a lightweight model for routine cases and defers harder ones to a high-performing, but costlier, large language model (LLM). MythTriage achieves up to 0.86 macro F1-score and is estimated to reduce annotation time and financial cost by over 76% compared to experts and full LLM labeling. We analyze 2.9K search results and 343K recommendations, uncovering how myths persist on YouTube and offering actionable insights for public health and platform moderation.
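
The triage idea in this abstract (a cheap model handles routine cases and defers harder ones to a costlier model) can be illustrated with a minimal sketch. The abstract does not say how MythTriage decides which cases to defer, so the confidence threshold below is an assumption, and `triage`, `light_model`, `heavy_model`, and `threshold` are hypothetical names chosen for illustration rather than the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def triage(
    items: Sequence[str],
    light_model: Callable[[str], Tuple[str, float]],
    heavy_model: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[List[str], int]:
    """Label items with a cheap model, deferring low-confidence ones.

    light_model returns (label, confidence in [0, 1]); heavy_model returns
    a label only. The thresholding rule is an assumed deferral criterion,
    not one stated in the abstract.
    """
    labels: List[str] = []
    deferred = 0
    for item in items:
        label, confidence = light_model(item)
        if confidence < threshold:
            # Hard case: pay for the costlier model instead.
            label = heavy_model(item)
            deferred += 1
        labels.append(label)
    return labels, deferred
```

Under this assumption, lowering `threshold` sends fewer items to the costlier model, and the returned `deferred` count gives a rough handle on how much of the workload the expensive model still carries.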

Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
Hua Shen | Nicholas Clark | Tanu Mitra
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Existing research assesses LLMs’ values by analyzing their stated inclinations, overlooking potential discrepancies between stated values and actions—termed the “Value-Action Gap.” This study introduces ValueActionLens, a framework to evaluate the alignment between LLMs’ stated values and their value-informed actions. The framework includes a dataset of 14.8k value-informed actions across 12 cultures and 11 social topics, along with two tasks measuring alignment through three metrics. Experiments show substantial misalignment between LLM-generated value statements and their actions, with significant variations across scenarios and models. Misalignments reveal potential harms, highlighting risks in relying solely on stated values to predict behavior. The findings stress the need for context-aware evaluations of LLM values and the value-action gaps.

Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness
Shayan Alipour | Indira Sen | Mattia Samory | Tanu Mitra
Findings of the Association for Computational Linguistics: ACL 2025

Despite a growing literature finding that large language models (LLMs) exhibit demographic biases, reports of which groups they align with best are hard to generalize or even contradictory. In this work, we examine the alignment of LLMs with human annotations in five offensive language datasets, comprising approximately 220K annotations. While demographic traits, particularly race, influence alignment, these effects vary across datasets and are often entangled with other factors. Confounders introduced in the annotation process, such as document difficulty, annotator sensitivity, and within-group agreement, account for more variation in alignment patterns than demographic traits. Alignment increases with annotator sensitivity and group agreement, and decreases with document difficulty. Our results underscore the importance of multi-dataset analyses and confounder-aware methodologies in developing robust measures of demographic bias.

As AI advances, aligning it with diverse human and societal values grows critical. But how do we define these values and measure AI’s adherence to them? We present ValueCompass, a framework grounded in psychological theories, to assess human-AI alignment. Applying it to five diverse LLMs and 112 humans from seven countries across four scenarios—collaborative writing, education, public sectors, and healthcare—we uncover key misalignments. For example, humans prioritize national security, while LLMs often reject it. Values also shift across contexts, demanding scenario-specific alignment strategies. This work advances AI design by mapping how systems can better reflect societal ethics.

2024

“They are uncultured”: Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu | Hayoung Jung | Anjali Singh | Monojit Choudhury | Tanu Mitra
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate “harm” as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment. Our experiments reveal that seven out of the eight LLMs included in this study generated conversations riddled with CHAST, characterized by malign views expressed in seemingly neutral language unlikely to be detected by existing methods. Notably, these LLMs manifested more extreme views and opinions when dealing with non-Western concepts like caste, compared to Western ones such as race.

This study introduces ValueScope, a framework leveraging language models to quantify social norms and values within online communities, grounded in social science perspectives on normative structures. We employ ValueScope to dissect and analyze linguistic and stylistic expressions across 13 Reddit communities categorized under gender, politics, science, and finance. Our analysis provides a quantitative foundation confirming that even closely related communities exhibit remarkably diverse norms. This diversity supports existing theories and adds a new dimension to understanding community interactions. ValueScope not only delineates differences in social norms but also effectively tracks their evolution and the influence of significant external events like the U.S. presidential elections and the emergence of new sub-communities. The framework thus highlights the pivotal role of social norms in shaping online interactions, presenting a substantial advance in both the theory and application of social norm studies in digital spaces.