Sarah Ita Levitan - ACL Anthology

Sarah Ita Levitan

2026

The Impact of Highlighting Subjective Language on Perceived News Trustworthiness
Mohammad Shokri | Vivek Sharma | Emily Klapper | Shweta Jain | Elena Filatova | Sarah Ita Levitan
The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis (WASSA 2026)

The rise of misinformation and opinionated articles has made understanding how misleading or biased content influences readers an increasingly important problem. While most prior work focuses on detecting misinformation or deceptive language in real time, far less attention has been paid to how such content is perceived by readers, which is an essential component of misinformation’s effectiveness. In this study, we examine whether highlighting subjective sentences in news articles affects perceived trustworthiness. Using a controlled user experiment and 1,334 article–reader evaluations, we find that highlighting subjective content produces a modest yet statistically significant decrease in trust, with substantial variation across articles and participants. To explain this variation, we model trust change after highlighting subjective language as a function of article-level linguistic features and reader-level attitudes. Our findings suggest that readers’ reactions to highlighted subjective language are driven primarily by characteristics of the text itself, and that highlighting subjective language offers benefits for may help readers better assess the reliability of potentially misleading news articles.

Council of LLMs: Evaluating Capability of Large Language Models to Annotate Propaganda
Vivek Sharma | Shweta Jain | Mohammad Shokri | Sarah Ita Levitan | Elena Filatova
The Proceedings for the 15th Workshop on Computational Approaches to Subjectivity, Sentiment Social Media Analysis (WASSA 2026)

Data annotation is essential for supervised natural language processing tasks but remains labor-intensive and expensive. Large language models (LLMs) have emerged as promising alternatives, capable of generating high-quality annotations either autonomously or in collaboration with human annotators. However their use in autonomous annotations is often questioned for their ethical take on subjective matters. This study investigates the effectiveness of LLMs in a autonomous, and hybrid annotation setups in propaganda detection. We evaluate GPT and open-source models on two datasets from different domains, namely, Propaganda Techniques Corpus (PTC) for news articles and the Journalist Media Bias on X (JMBX) for social media. Our results show that LLMs, in general, exhibit high recall but lower precision in detecting propaganda, often over-predicting persuasive content. Multi-annotator setups did not outperform the best models in single-annotator setting although it helped reasoning models boost their performance. Hybrid annotation, combining LLMs and human input, achieved the highest overall accuracy than LLM-only settings. We further analyze misclassifications and found that LLM have higher sensitivity towards certain propaganda techniques like loaded language, name calling, and doubt. Finally, using error typology analysis, we explore the reasoning provided on misclassifications by the LLM. Our result shows that although some studies report LLM outperforming manual annotations and it could prove useful in hybrid annotation, its incorporation in the human annotation pipeline must be implemented with caution.

2025

Finding Common Patterns in Domestic Violence Stories Posted on Reddit
Mohammad Shokri | Emily Klapper | Jason Shan | Sarah Ita Levitan
Proceedings of the The 7th Workshop on Narrative Understanding

Domestic violence survivors often share their experiences in online spaces, offering valuable insights into common abuse patterns. This study analyzes a dataset of personal narratives about domestic violence from Reddit, focusing on event extraction and topic modeling to uncover recurring themes. We evaluate GPT-4 and LLaMA-3.1 for extracting key sentences, finding that GPT-4 exhibits higher precision, while LLaMA-3.1 achieves better recall. Using LLM-based topic assignment, we identify dominant themes such as psychological aggression, financial abuse, and physical assault which align with previously published psychology findings. A co-occurrence and PMI analysis further reveals the interdependencies among different abuse types, emphasizing the multifaceted nature of domestic violence. Our findings provide a structured approach to analyzing survivor narratives, with implications for social support systems and policy interventions.

Personalized Author Obfuscation with Large Language Models
Mohammad Shokri | Sarah Ita Levitan | Rivka Levitan
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

In this paper, we investigate the efficacy of large language models (LLMs) in obfuscating authorship by paraphrasing and altering writing styles. Rather than adopting a holistic approach that evaluates performance across the entire dataset, we focus on user-wise performance to analyze how obfuscation effectiveness varies across individual authors. While LLMs are generally effective, we observe a bimodal distribution of efficacy, with performance varying significantly across users. To address this, we propose a personalized prompting method that outperforms standard prompting techniques and partially mitigates the bimodality issue.

2024

Is It Safe to Tell Your Story? Towards Achieving Privacy for Sensitive Narratives
Mohammad Shokri | Allison Bishop | Sarah Ita Levitan
Proceedings of the 6th Workshop on Narrative Understanding

Evolving tools for narrative analysis present an opportunity to identify common structure in stories that are socially important to tell, such as stories of survival from domestic abuse. A greater structural understanding of such stories could lead to stronger protections against de-anonymization, as well as future tools to help survivors navigate the complex trade-offs inherent in trying to tell their stories safely. In this work we explore narrative patterns within a small set of domestic violence stories, identifying many similarities. We then propose a method to assess the safety of sharing a story based on a distance feature vector.

2023

GC-Hunter at ImageArg Shared Task: Multi-Modal Stance and Persuasiveness Learning
Mohammad Shokri | Sarah Ita Levitan
Proceedings of the 10th Workshop on Argument Mining

With the rising prominence of social media, users frequently supplement their written content with images. This trend has brought about new challenges in automatic processing of social media messages. In order to fully understand the meaning of a post, it is necessary to capture the relationship between the image and the text. In this work we address the two main objectives of the ImageArg shared task. Firstly, we aim to determine the stance of a multi-modal tweet toward a particular issue. We propose a strong baseline, fine-tuning transformer based models on concatenation of tweet text and image text. The second goal is to predict the impact of an image on the persuasiveness of the text in a multi-modal tweet. To capture the persuasiveness of an image, we train vision and language models on the data and explore other sets of features merged with the model, to enhance prediction power. Ultimately, both of these goals contribute toward the broader aim of understanding multi-modal messages on social media and how images and texts relate to each other.

2022

Improving Cross-domain, Cross-lingual and Multi-modal Deception Detection
Subhadarshi Panda | Sarah Ita Levitan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

With the increase of deception and misinformation especially in social media, it has become crucial to be able to develop machine learning methods to automatically identify deceptive language. In this proposal, we identify key challenges underlying deception detection in cross-domain, cross-lingual and multi-modal settings. To improve cross-domain deception classification, we propose to use inter-domain distance to identify a suitable source domain for a given target domain. We propose to study the efficacy of multilingual classification models vs translation for cross-lingual deception classification. Finally, we propose to better understand multi-modal deception detection and explore methods to weight and combine information from multiple modalities to improve multi-modal deception classification.

2021

Detecting Multilingual COVID-19 Misinformation on Social Media via Contextualized Embeddings
Subhadarshi Panda | Sarah Ita Levitan
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

We present machine learning classifiers to automatically identify COVID-19 misinformation on social media in three languages: English, Bulgarian, and Arabic. We compared 4 multitask learning models for this task and found that a model trained with English BERT achieves the best results for English, and multilingual BERT achieves the best results for Bulgarian and Arabic. We experimented with zero shot, few shot, and target-only conditions to evaluate the impact of target-language training data on classifier performance, and to understand the capabilities of different models to generalize across languages in detecting misinformation online. This work was performed as a submission to the shared task, NLP4IF 2021: Fighting the COVID-19 Infodemic. Our best models achieved the second best evaluation test results for Bulgarian and Arabic among all the participating teams and obtained competitive scores for English.

Automatic Detection and Prediction of Psychiatric Hospitalizations From Social Media Posts
Zhengping Jiang | Jonathan Zomick | Sarah Ita Levitan | Mark Serper | Julia Hirschberg
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access

We address the problem of predicting psychiatric hospitalizations using linguistic features drawn from social media posts. We formulate this novel task and develop an approach to automatically extract time spans of self-reported psychiatric hospitalizations. Using this dataset, we build predictive models of psychiatric hospitalization, comparing feature sets, user vs. post classification, and comparing model performance using a varying time window of posts. Our best model achieves an F1 of .718 using 7 days of posts. Our results suggest that this is a useful framework for collecting hospitalization data, and that social media data can be leveraged to predict acute psychiatric crises before they occur, potentially saving lives and improving outcomes for individuals with mental illness.

HunterSpeechLab at GermEval 2021: Does Your Comment Claim A Fact? Contextualized Embeddings for German Fact-Claiming Comment Classification
Subhadarshi Panda | Sarah Ita Levitan
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments

In this paper we investigate the efficacy of using contextual embeddings from multilingual BERT and German BERT in identifying fact-claiming comments in German on social media. Additionally, we examine the impact of formulating the classification problem as a multi-task learning problem, where the model identifies toxicity and engagement of the comment in addition to identifying whether it is fact-claiming. We provide a thorough comparison of the two BERT based models compared with a logistic regression baseline and show that German BERT features trained using a multi-task objective achieves the best F1 score on the test set. This work was done as part of a submission to GermEval 2021 shared task on the identification of fact-claiming comments.

2020

Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies
Xi (Leslie) Chen | Sarah Ita Levitan | Michelle Levine | Marko Mandic | Julia Hirschberg
Transactions of the Association for Computational Linguistics, Volume 8

Humans rarely perform better than chance at lie detection. To better understand human perception of deception, we created a game framework, LieCatcher, to collect ratings of perceived deception using a large corpus of deceptive and truthful interviews. We analyzed the acoustic-prosodic and linguistic characteristics of language trusted and mistrusted by raters and compared these to characteristics of actual truthful and deceptive language to understand how perception aligns with reality. With this data we built classifiers to automatically distinguish trusted from mistrusted speech, achieving an F1 of 66.1%. We next evaluated whether the strategies raters said they used to discriminate between truthful and deceptive responses were in fact useful. Our results show that, although several prosodic and lexical features were consistently perceived as trustworthy, they were not reliable cues. Also, the strategies that judges reported using in deception detection were not helpful for the task. Our work sheds light on the nature of trusted language and provides insight into the challenging problem of human deception detection.

Detection of Mental Health from Reddit via Deep Contextualized Representations
Zhengping Jiang | Sarah Ita Levitan | Jonathan Zomick | Julia Hirschberg
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

We address the problem of automatic detection of psychiatric disorders from the linguistic content of social media posts. We build a large scale dataset of Reddit posts from users with eight disorders and a control user group. We extract and analyze linguistic characteristics of posts and identify differences between diagnostic groups. We build strong classification models based on deep contextualized word representations and show that they outperform previously applied statistical models with simple linguistic features by large margins. We compare user-level and post-level classification performance, as well as an ensembled multiclass model.

A Novel Methodology for Developing Automatic Harassment Classifiers for Twitter
Ishaan Arora | Julia Guo | Sarah Ita Levitan | Susan McGregor | Julia Hirschberg
Proceedings of the Fourth Workshop on Online Abuse and Harms

Most efforts at identifying abusive speech online rely on public corpora that have been scraped from websites using keyword-based queries or released by site or platform owners for research purposes. These are typically labeled by crowd-sourced annotators – not the targets of the abuse themselves. While this method of data collection supports fast development of machine learning classifiers, the models built on them often fail in the context of real-world harassment and abuse, which contain nuances less easily identified by non-targets. Here, we present a mixed-methods approach to create classifiers for abuse and harassment which leverages direct engagement with the target group in order to achieve high quality and ecological validity of data sets and labels, and to generate deeper insights into the key tactics of bad actors. We use women journalists’ experience on Twitter as an initial community of focus. We identify several structural mechanisms of abuse that we believe will generalize to other target communities.

2019

Linguistic Analysis of Schizophrenia in Reddit Posts
Jonathan Zomick | Sarah Ita Levitan | Mark Serper
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

We explore linguistic indicators of schizophrenia in Reddit discussion forums. Schizophrenia (SZ) is a chronic mental disorder that affects a person’s thoughts and behaviors. Identifying and detecting signs of SZ is difficult given that SZ is relatively uncommon, affecting approximately 1% of the US population, and people suffering with SZ often believe that they do not have the disorder. Linguistic abnormalities are a hallmark of SZ and many of the illness’s symptoms are manifested through language. In this paper we leverage the vast amount of data available from social media and use statistical and machine learning approaches to study linguistic characteristics of SZ. We collected and analyzed a large corpus of Reddit posts from users claiming to have received a formal diagnosis of SZ and identified several linguistic features that differentiated these users from a control (CTL) group. We compared these results to other findings on social media linguistic analysis and SZ. We also developed a machine learning classifier to automatically identify self-identified users with SZ on Reddit.

2018

Linguistic Cues to Deception and Perceived Deception in Interview Dialogues
Sarah Ita Levitan | Angel Maredia | Julia Hirschberg
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We explore deception detection in interview dialogues. We analyze a set of linguistic features in both truthful and deceptive responses to interview questions. We also study the perception of deception, identifying characteristics of statements that are perceived as truthful or deceptive by interviewers. Our analysis show significant differences between truthful and deceptive question responses, as well as variations in deception patterns across gender and native language. This analysis motivated our selection of features for machine learning experiments aimed at classifying globally deceptive speech. Our best classification performance is 72.74% F1-Score (about 17% better than human performance), which is achieved using a combination of linguistic features and individual traits.

2017

Comparing Approaches for Automatic Question Identification
Angel Maredia | Kara Schechtman | Sarah Ita Levitan | Julia Hirschberg
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Collecting spontaneous speech corpora that are open-ended, yet topically constrained, is increasingly popular for research in spoken dialogue systems and speaker state, inter alia. Typically, these corpora are labeled by human annotators, either in the lab or through crowd-sourcing; however, this is cumbersome and time-consuming for large corpora. We present four different approaches to automatically tagging a corpus when general topics of the conversations are known. We develop these approaches on the Columbia X-Cultural Deception corpus and find accuracy that significantly exceeds the baseline. Finally, we conduct a cross-corpus evaluation by testing the best performing approach on the Columbia/SRI/Colorado corpus.

2016

Identifying Individual Differences in Gender, Ethnicity, and Personality from Dialogue for Deception Detection
Sarah Ita Levitan | Yocheved Levitan | Guozhen An | Michelle Levine | Rivka Levitan | Andrew Rosenberg | Julia Hirschberg
Proceedings of the Second Workshop on Computational Approaches to Deception Detection

Co-authors

Venues