Shreya Havaldar
Faithful Chain-of-Thought Reasoning
Qing Lyu | Shreya Havaldar | Adam Stein | Li Zhang | Delip Rao | Eric Wong | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Comparing Styles across Languages
Shreya Havaldar | Matthew Pressimone | Eric Wong | Lyle Ungar
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Understanding how styles differ across languages is advantageous for training both humans and computers to generate culturally appropriate text. We introduce an explanation framework to extract stylistic differences from multilingual LMs and compare styles across languages. Our framework (1) generates comprehensive style lexica in any language and (2) consolidates feature importances from LMs into comparable lexical categories. We apply this framework to compare politeness, creating the first holistic multilingual politeness dataset and exploring how politeness varies across four languages. Our approach enables an effective evaluation of how distinct linguistic categories contribute to stylistic variations and provides interpretable insights into how people communicate differently around the world.

Multilingual Language Models are not Multicultural: A Case Study in Emotion
Shreya Havaldar | Bhumika Singhal | Sunny Rai | Langchen Liu | Sharath Chandra Guntuku | Lyle Ungar
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Emotions are experienced and expressed differently across the world. To use Large Language Models (LMs) for multilingual tasks that require emotional sensitivity, LMs must reflect this cultural variation in emotion. In this study, we investigate whether widely used multilingual LMs in 2023 reflect differences in emotional expression across cultures and languages. We find that embeddings obtained from LMs (e.g., XLM-RoBERTa) are Anglocentric, and that generative LMs (e.g., ChatGPT) reflect Western norms, even when responding to prompts in other languages. Our results show that multilingual LMs do not successfully learn the culturally appropriate nuances of emotion, and we highlight possible research directions toward correcting this.