Sunny Rai

2025

pdf bib abs
HATS : Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models
Ashray Gupta | Rohan Joseph | Sunny Rai
Proceedings of the 2nd Workshop on Analogical Abstraction in Cognition, Perception, and Language (Analogy-Angle II)

Analogies test a model’s ability to infer implicit relationships between concepts, making them a key benchmark for evaluating reasoning capabilities. While large language models (LLMs) are widely evaluated for reasoning in English, their abilities in Indic languages remain understudied, limiting our understanding of whether these models generalize across languages. To address this gap, we introduce a new Hindi Analogy Test Set (HATS), comprising 405 multiple-choice questions sourced from Indian government exams. We benchmark state-of-the-art multilingual LLMs using various prompting strategies and introduce a grounded Chain of Thought approach that leverages cognitive theories of analogical reasoning. This approach improves model performance on Hindi analogy questions. Our experiments show that models perform best with English prompts, irrespective of the prompting strategy. Our test set addresses the lack of a critical resource to evaluate LLM reasoning capabilities in Hindi. The test set is publicly available for research purposes here https://github.com/Inequilazitive/HATS-Hindi_Analogy_Test_Set

Culture moderates the way individuals perceive and express mental distress. Current understandings of mental health expressions on social media, however, are predominantly derived from WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. To address this gap, we examine mental health posts on Reddit made by individuals geolocated in India, to identify variations in social media language specific to the Indian context compared to users from Western nations. Our experiments reveal significant psychosocial variations in emotions and temporal orientation. This study demonstrates the potential of social media platforms for identifying cross-cultural differences in mental health expressions (e.g. seeking advice in India vs seeking support by Western users). Significant linguistic variations in online mental health-related language emphasize the importance of developing precision-targeted interventions that are culturally appropriate.

pdf bib abs
Culturally-Aware Conversations: A Framework & Benchmark for LLMs
Shreya Havaldar | Young Min Cho | Sunny Rai | Lyle Ungar
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)

Existing benchmarks that measure cultural adaptation in LLMs are misaligned with the actual challenges these models face when interacting with users from diverse cultural backgrounds. In this work, we introduce the first framework and benchmark designed to evaluate LLMs in realistic, multicultural conversational settings. Grounded in sociocultural theory, our framework formalizes how linguistic style — a key element of cultural communication — is shaped by situational, relational, and cultural context. We construct a benchmark dataset based on this framework, annotated by culturally diverse raters, and propose a new set of desiderata for cross-cultural evaluation in NLP: conversational framing, stylistic sensitivity, and subjective correctness. We evaluate today’s top LLMs on our benchmark and show that these models struggle with cultural adaptation in a conversational setting.

pdf bib abs
Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice
Sunny Rai | Khushang Zaveri | Shreya Havaldar | Soumna Nema | Lyle Ungar | Sharath Chandra Guntuku
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Shame and pride are social emotions expressed across cultures to motivate and regulate people’s thoughts, feelings, and behaviors. In this paper, we introduce the first cross-cultural dataset of over 10k shame/pride-related expressions with underlying social expectations from ~5.4K Bollywood and Hollywood movies. We examine *how* and *why* shame and pride are expressed across cultures using a blend of psychology-informed language analysis combined with large language models. We find significant cross-cultural differences in shame and pride expression aligning with known cultural tendencies of the USA and India – e.g., in Hollywood, shame-expressions predominantly discuss *self* whereas shame is expressed toward *others* in Bollywood. Women are more sanctioned across cultures and for violating similar social expectations.

2024

pdf bib abs
Building Knowledge-Guided Lexica to Model Cultural Variation
Shreya Havaldar | Salvatore Giorgi | Sunny Rai | Thomas Talhelm | Sharath Chandra Guntuku | Lyle Ungar
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Cultural variation exists between nations (e.g., the United States vs. China), but also within regions (e.g., California vs. Texas, Los Angeles vs. San Francisco). Measuring this regional cultural variation can illuminate how and why people think and behave differently. Historically, it has been difficult to computationally model cultural variation due to a lack of training data and scalability constraints. In this work, we introduce a new research problem for the NLP community: How do we measure variation in cultural constructs across regions using language? We then provide a scalable solution: building knowledge-guided lexica to model cultural variation, encouraging future work at the intersection of NLP and cultural understanding. We also highlight modern LLMs’ failure to measure cultural variation or generate culturally varied language.

2023

pdf bib abs
An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives
Young Min Cho | Sunny Rai | Lyle Ungar | João Sedoc | Sharath Guntuku
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this gap, we conduct a comprehensive literature review using the PRISMA framework, reviewing 534 papers published in both computer science and medicine. Our systematic review reveals 136 key papers on building mental health-related conversational agents with diverse characteristics of modeling and experimental design techniques. We find that computer science papers focus on LLM techniques and evaluating response quality using automated metrics with little attention to the application while medical papers use rule-based conversational agents and outcome metrics to measure the health outcomes of participants. Based on our findings on transparency, ethics, and cultural heterogeneity in this review, we provide a few recommendations to help bridge the disciplinary divide and enable the cross-disciplinary development of mental health conversational agents.

pdf bib abs
NewsMet : A ‘do it all’ Dataset of Contemporary Metaphors in News Headlines
Rohan Joseph | Timothy Liu | Aik Beng Ng | Simon See | Sunny Rai
Findings of the Association for Computational Linguistics: ACL 2023

Metaphors are highly creative constructs of human language that grow old and eventually die. Popular datasets used for metaphor processing tasks were constructed from dated source texts. In this paper, we propose NewsMet, a large high-quality contemporary dataset of news headlines hand-annotated with metaphorical verbs. The dataset comprises headlines from various sources including political, satirical, reliable and fake. Our dataset serves the purpose of evaluation for the tasks of metaphor interpretation and generation. The experiments reveal several insights and limitations of using LLMs to automate metaphor processing tasks as frequently seen in the recent literature. The dataset is publicly available for research purposes https://github.com/AxleBlaze3/NewsMet_Metaphor_Dataset.

pdf bib abs
Multilingual Language Models are not Multicultural: A Case Study in Emotion
Shreya Havaldar | Sunny Rai | Bhumika Singhal | Langchen Liu | Sharath Chandra Guntuku | Lyle Ungar
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Emotions are experienced and expressed differently across the world. In order to use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether the widely-used multilingual LMs in 2023 reflect differences in emotional expressions across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion and we highlight possible research directions towards correcting this.

2022

pdf bib abs
Identifying Human Needs through Social Media: A study on Indian cities during COVID-19
Sunny Rai | Rohan Joseph | Prakruti Singh Thakur | Mohammed Abdul Khaliq
Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media

In this paper, we present a minimally-supervised approach to identify human needs expressed in tweets. Taking inspiration from Frustration-Aggression theory, we trained RoBERTa model to classify tweets expressing frustration which serves as an indicator of unmet needs. Although the notion of frustration is highly subjective and complex, the findings support the use of pretrained language model in identifying tweets with unmet needs. Our study reveals the major causes behind feeling frustrated during the lockdown and the second wave of the COVID-19 pandemic in India. Our proposed approach can be useful in timely identification and prioritization of emerging human needs in the event of a crisis.

2021

pdf bib abs
#covid is war and #vaccine is weapon? COVID-19 metaphors in India
Mohammed Khaliq | Rohan Joseph | Sunny Rai
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Metaphors are creative cognitive constructs that are employed in everyday conversation to describe abstract concepts and feelings. Prevalent conceptual metaphors such as WAR, MONSTER, and DARKNESS in COVID-19 online discourse sparked a multi-faceted debate over their efficacy in communication, resultant psychological impact on listeners, and their appropriateness in social discourse. In this work, we investigate metaphors used in discussions around COVID-19 on Indian Twitter. We observe subtle transitions in metaphorical mappings as the pandemic progressed. Our experiments, however, didn’t indicate any affective impact of WAR metaphors on the COVID-19 discourse.