Proceedings of the Second Workshop on Social Influence in Conversations (SICon 2024)

James Hale, Kushal Chawla, Muskan Garg (Editors)

Anthology ID:: 2024.sicon-1
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Venue:: SICon
SIG:
Publisher:: Association for Computational Linguistics
URL:: https://aclanthology.org/2024.sicon-1/
DOI:: 10.18653/v1/2024.sicon-1
Bib Export formats:: BibTeX MODS XML EndNote
PDF:: https://aclanthology.org/2024.sicon-1.pdf

pdf bib
Proceedings of the Second Workshop on Social Influence in Conversations (SICon 2024)
James Hale | Kushal Chawla | Muskan Garg

pdf bib abs
Observing the Southern US Culture of Honor Using Large-Scale Social Media Analysis
Juho Kim | Michael Guerzhoy

A culture of honor refers to a social system where individuals’ status, reputation, and esteem play a central role in governing interpersonal relations. Past works have associated this concept with the United States (US) South and related with it various traits such as higher sensitivity to insult, a higher value on reputation, and a tendency to react violently to insults. In this paper, we hypothesize and confirm that internet users from the US South, where a culture of honor is more prevalent, are more likely to display a trait predicted by their belonging to a culture of honor. Specifically, we test the hypothesis that US Southerners are more likely to retaliate to personal attacks by personally attacking back. We leverage OpenAI’s GPT-3.5 API to both geolocate internet users and to automatically detect whether users are insulting each other. We validate the use of GPT-3.5 by measuring its performance on manually-labeled subsets of the data. Our work demonstrates the potential of formulating a hypothesis based on a conceptual framework, operationalizing it in a way that is amenable to large-scale LLM-aided analysis, manually validating the use of the LLM, and drawing a conclusion.

pdf bib abs
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance
Ziqi Yin | Hao Wang | Kaito Horio | Daisuke Kawahara | Satoshi Sekine

We investigate the impact of politeness levels in prompts on the performance of large language models (LLMs). Polite language in human communications often garners more compliance and effectiveness, while rudeness can cause aversion, impacting response quality. We consider that LLMs mirror human communication traits, suggesting they align with human cultural norms. We assess the impact of politeness in prompts on LLMs across English, Chinese, and Japanese tasks. We observed that impolite prompts often result in poor performance, but overly polite language does not guarantee better outcomes. The best politeness level is different according to the language. This phenomenon suggests that LLMs not only reflect human behavior but are also influenced by language, particularly in different cultural contexts. Our findings highlight the need to factor in politeness for cross-cultural natural language processing and LLM usage.

pdf bib abs
Personality Differences Drive Conversational Dynamics: A High-Dimensional NLP Approach
Julia R. Fischer | Nilam Ram

This paper investigates how the topical flow of dyadic conversations emerges over time and how differences in interlocutors’ personality traits contribute to this topical flow. Leveraging text embeddings, we map the trajectories of conversations between strangers into a high-dimensional space. Using nonlinear projections and clustering, we then identify when each interlocutor enters and exits various topics. Differences in conversational flow are quantified via , a summary measure of the “spread” of topics covered during a conversation, and , a time-varying measure of the cosine similarity between interlocutors’ embeddings. Our findings suggest that interlocutors with a larger difference in the personality dimension of openness influence each other to spend more time discussing a wider range of topics and that interlocutors with a larger difference in extraversion experience a larger decrease in linguistic alignment throughout their conversation. We also examine how participants’ affect (emotion) changes from before to after a conversation, finding that a larger difference in extraversion predicts a larger difference in affect change and that a greater topic entropy predicts a larger affect increase. This work demonstrates how communication research can be advanced through the use of high-dimensional NLP methods and identifies personality difference as an important driver of social influence.

pdf bib abs
RecomMind: Movie Recommendation Dialogue with Seeker’s Internal State
Takashi Kodama | Hirokazu Kiyomaru | Yin Jou Huang | Sadao Kurohashi

Humans pay careful attention to the interlocutor’s internal state in dialogues. For example, in recommendation dialogues, we make recommendations while estimating the seeker’s internal state, such as his/her level of knowledge and interest. Since there are no existing annotated resources for the analysis and experiment, we constructed RecomMind, a movie recommendation dialogue dataset with annotations of the seeker’s internal state at the entity level. Each entity has a first-person label annotated by the seeker and a second-person label annotated by the recommender. Our analysis based on RecomMind reveals that the success of recommendations is enhanced when recommenders mention entities that seekers do not know but are interested in. We also propose a response generation framework that explicitly considers the seeker’s internal state, utilizing the chain-of-thought prompting. The human evaluation results show that our proposed method outperforms the baseline method in both consistency and the success of recommendations.

pdf bib abs
Leveraging Large Language Models for Code-Mixed Data Augmentation in Sentiment Analysis
Linda Zeng

Code-mixing (CM), where speakers blend languages within a single expression, is prevalent in multilingual societies but poses challenges for natural language processing due to its complexity and limited data. We propose using a large language model to generate synthetic CM data, which is then used to enhance the performance of task-specific models for CM sentiment analysis. Our results show that in Spanish-English, synthetic data improved the F1 score by 9.32%, outperforming previous augmentation techniques. However, in Malayalam-English, synthetic data only helped when the baseline was low; with strong natural data, additional synthetic data offered little benefit. Human evaluation confirmed that this approach is a simple, cost-effective way to generate natural-sounding CM sentences, particularly beneficial for low baselines. Our findings suggest that few-shot prompting of large language models is a promising method for CM data augmentation and has significant impact on improving sentiment analysis, an important element in the development of social influence systems.

The unchecked spread of digital information, combined with increasing political polarization and the tendency of individuals to isolate themselves from opposing political viewpoints opposing views, has driven researchers to develop systems for automatically detecting political bias in media. This trend has been further fueled by discussions on social media. We explore methods for categorizing bias in US news articles, comparing rule-based and deep learning approaches. The study highlights the sensitivity of modern self-learning systems to unconstrained data ingestion, while reconsidering the strengths of traditional rule-based systems. Applying both models to left-leaning (CNN) and right-leaning (FOX) News articles, we assess their effectiveness on data beyond the original training and test sets. This analysis highlights each model’s accuracy, offers a framework for exploring deep-learning explainability, and sheds light on political bias in US news media. We contrast the opaque architecture of a deep learning model with the transparency of a linguistically informed rule-based model, showing that the rule-based model performs consistently across different data conditions and offers greater transparency, whereas the deep learning model is dependent on the training set and struggles with unseen data.

pdf bib abs
”So, are you a different person today?” Analyzing Bias in Questions during Parole Hearings
Wassiliki Siskou | Ingrid Espinoza

During Parole Suitability Hearings commissioners need to evaluate whether an inmate’s risk of reoffending has decreased sufficiently to justify their release from prison before completing their full sentence. The conversation between the commissioners and the inmate is the key element of such hearings and is largely driven by question-and-answer patterns which can be influenced by the commissioner’s questioning behavior. To our knowledge, no previous study has investigated the relationship between the types of questions asked during parole hearings and potentially biased outcomes. We address this gap by analysing commissioner’s questioning behavior during Californian parole hearings. We test ChatGPT-4o’s capability of annotating questions automatically and achieve a high F1-score of 0.91 without prior training. By analysing all questions posed directly by commissioners to inmates, we tested for potential biases in question types across multiple demographic variables. The results show minimal bias in questioning behavior toward inmates asking for parole.

Successful social influence, whether at individual or community levels, requires expertise and care in several dimensions of communication: understanding of emotions, beliefs, and values; transparency; and context-aware behavior shaping. Based on our experience in identifying mediation needs in social media and engaging with moderators and users, we developed a set of principles that we believe social influence systems should adhere to to ensure ethical operation, effectiveness, widespread adoption, and trust by users on both sides of the engagement of influence. We demonstrate these principles in D-ESC: Dialogue Assistant for Engaging in Social-Cybermediation, in the context of AI-assisted social media mediation, a newer paradigm of automatic moderation that responds to unique and changing communities while engendering and maintaining trust in users, moderators, and platform-holders. Through this case study, we identify opportunities for our principles to guide future systems towards greater opportunities for positive social change.

pdf bib abs
EHDChat: A Knowledge-Grounded, Empathy-Enhanced Language Model for Healthcare Interactions
Shenghan Wu | Wynne Hsu | Mong Li Lee

Large Language Models (LLMs) excel at a range of tasks but often struggle with issues like hallucination and inadequate empathy support. To address hallucinations, we ground our dialogues in medical knowledge sourced from external repositories such as Disease Ontology and DrugBank. To improve empathy support, we develop the Empathetic Healthcare Dialogues dataset, which utilizes multiple dialogue strategies in each response. This dataset is then used to fine-tune an LLM, and we introduce a lightweight, adaptable method called Strategy Combination Guidance to enhance the emotional support capabilities of the fine-tuned model, named EHDChat. Our evaluations show that EHDChat significantly outperforms existing models in providing emotional support and medical accuracy, demonstrating the effectiveness of our approach in enhancing empathetic and informed AI interactions in healthcare.

Aspect Sentiment Triplet Extraction (ASTE) is a challenging task in sentiment analysis, aiming to provide fine-grained insights into human sentiments. However, existing benchmarks are limited to two domains and do not evaluate model performance on unseen domains, raising concerns about the generalization of proposed methods. Furthermore, it remains unclear if large language models (LLMs) can effectively handle complex sentiment tasks like ASTE. In this work, we address the issue of generalization in ASTE from both a benchmarking and modeling perspective. We introduce a domain-expanded benchmark by annotating samples from diverse domains, enabling evaluation of models in both in-domain and out-of-domain settings. Additionally, we propose CASE, a simple and effective decoding strategy that enhances trustworthiness and performance of LLMs in ASTE. Through comprehensive experiments involving multiple tasks, settings, and models, we demonstrate that CASE can serve as a general decoding strategy for complex sentiment tasks. By expanding the scope of evaluation and providing a more reliable decoding strategy, we aim to inspire the research community to reevaluate the generalizability of benchmarks and models for ASTE. Our code, data, and models are available at https://github.com/DAMO-NLP-SG/domain-expanded-aste.