H. Andrew Schwartz - ACL Anthology

H. Andrew Schwartz

Also published as: H Andrew Schwartz, Hansen Andrew Schwartz, Hansen A. Schwartz, H. Schwartz

2026

MAQuA: Multi-outcome Adaptive Question-Asking for Mental Health using Item Response Theory
Vasudha Varadarajan | Hui Xu | Rebecca Astrid Böhme | Mariam Marlen Mirström | Sverker Sikström | H. Andrew Schwartz
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in LLMs offer new opportunities for scalable, interactive mental health assessment, but excessive querying burdens users and is inefficient for real-world screening across transdiagnostic symptom profiles. We introduce MAQuA, a multi-outcome modeling and adaptive question-asking framework for simultaneous, multidimensional mental health screening. Combining multi-outcome modeling on language responses with item response theory (IRT) and factor analysis, MAQuA selects the questions with most informative responses across multiple dimensions at each turn to optimize diagnostic information, improving accuracy and potentially reducing response burden. Empirical results on a novel dataset reveal that MAQuA reduces the number of assessment questions required for score stabilization by 50–87% compared to random ordering (e.g., achieving stable depression scores with 71% fewer questions and eating disorder scores with 85% fewer questions). MAQuA demonstrates robust performance across both internalizing (depression, anxiety) and externalizing (substance use, eating disorder) domains, with early stopping strategies further reducing patient time and burden. These findings position MAQuA as a powerful and efficient tool for scalable, nuanced, and interactive mental health screening, advancing the integration of LLM-based agents into real-world clinical workflows.

2025

Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models
Nikita Soni | August Håkan Nilsson | Syeda Mahwish | Vasudha Varadarajan | H. Andrew Schwartz | Ryan L. Boyd
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)

Mental health is not a fixed trait but a dynamic process shaped by the interplay between individual dispositions and situational contexts. Building on interactionist and constructionist psychological theories, we develop interpretable models to predict well-being and identify adaptive and maladaptive self-states in longitudinal social media data. Our approach integrates person-level psychological traits (e.g., resilience, cognitive distortions, implicit motives) with language-inferred situational features derived from the Situational 8 DIAMONDS framework. We compare these theory-grounded features to embeddings from a psychometrically-informed language model that captures temporal and individual-specific patterns. Results show that our principled, theory-driven features provide competitive performance while offering greater interpretability. Qualitative analyses further highlight the psychological coherence of features most predictive of well-being. These findings underscore the value of integrating computational modeling with psychological theory to assess dynamic mental states in contextually sensitive and human-understandable ways.

WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Rajath Rao | Adithya V Ganesan | Oscar Kjell | Jonah Luby | Akshay Raghavan | Scott Feltman | Whitney Ringwald | Ryan L. Boyd | Benjamin Luft | Camilo Ruggero | Neville Ryant | Roman Kotov | H. Andrew Schwartz
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Current speech encoding pipelines often rely on an additional text-based LM to get robust representations of human communication, even though SotA speech-to-text models often have a LM within. This work proposes an approach to improve the LM within an audio model such that the subsequent text-LM is unnecessary. We introduce **WhiSPA** (**Whi**sper with **S**emantic and **P**sychological **A**lignment), which leverages a novel audio training objective: contrastive loss with a language model embedding as a teacher. Using over 500k speech segments from mental health audio interviews, we evaluate the utility of aligning Whisper’s latent space with semantic representations from a text autoencoder (SBERT) and lexically derived embeddings of basic psychological dimensions: emotion and personality. Over self-supervised affective tasks and downstream psychological tasks, WhiSPA surpasses current speech encoders, achieving an average error reduction of 73.4% and 83.8%, respectively. WhiSPA demonstrates that it is not always necessary to run a subsequent text LM on speech-to-text output in order to get a rich psychological representation of human communication.

Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks
Nikita Soni | Pranav Chitale | Khushboo Singh | Niranjan Balasubramanian | H. Andrew Schwartz
Findings of the Association for Computational Linguistics: NAACL 2025

Like most of NLP, models for human-centered NLP tasks—tasks attempting to assess author-level information—predominantly use rep-resentations derived from hidden states of Transformer-based LLMs. However, what component of the LM is used for the representation varies widely. Moreover, there is a need for Human Language Models (HuLMs) that implicitly model the author and provide a user-level hidden state. Here, we systematically evaluate different ways of representing documents and users using different LM and HuLM architectures to predict task outcomes as both dynamically changing states and averaged trait-like user-level attributes of valence, arousal, empathy, and distress. We find that representing documents as an average of the token hidden states performs the best generally. Further, while a user-level hidden state itself is rarely the best representation, we find its inclusion in the model strengthens token or document embeddings used to derive document- and user-level representations resulting in best performances.

The Consistent Lack of Variance of Psychological Factors Expressed by LLMs and Spambots
Vasudha Varadarajan | Salvatore Giorgi | Siddharth Mangalik | Nikita Soni | Dave M. Markowitz | H. Andrew Schwartz
Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect)

In recent years, the proliferation of chatbots like ChatGPT and Claude has led to an increasing volume of AI-generated text. While the text itself is convincingly coherent and human-like, the variety of expressed of human attributes may still be limited. Using theoretical individual differences, the fundamental psychological traits which distinguish people, this study reveals a distinctive characteristic of such content: AI-generations exhibit remarkably limited variation in inferrable psychological traits compared to human-authored texts. We present a review and study across multiple datasets spanning various domains. We find that AI-generated text consistently models the authorship of an “average” human with such little variation that, on aggregate, it is clearly distinguishable from human-written texts using unsupervised methods (i.e., without using ground truth labels). Our results show that (1) fundamental human traits are able to accurately distinguish human- and machine-generated text and (2) current generation capabilities fail to capture a diverse range of human traits

Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
Vishnu Raja | Adithya V Ganesan | Anand Syamkumar | Ritwik Banerjee | H. Schwartz
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

State-of-the-art automatic speech recognition (ASR) models like Whisper perform poorly on atypical speech, such as that produced by individuals with dysarthria. Past works for atypical speech have mostly investigated fully personalized (or idiosyncratic) models, but modeling strategies that can both generalize and handle idiosyncrasy could be more effective for capturing atypical speech. To investigate this, we compare four strategies: (a) *normative* models trained on typical speech (no personalization), (b) *idiosyncratic* models completely personalized to individuals, (c) *dysarthric-normative* models trained on other dysarthric speakers, and (d) *dysarthric-idiosyncratic* models which combine strategies by first modeling normative patterns before adapting to individual speech. In this case study, we find the dysarthric-idiosyncratic model performs better than the idiosyncratic approach while requiring less than half as much personalized data (36.43 WER with 128 train size vs. 36.99 with 256). Further, we found that tuning the speech encoder alone (as opposed to the LM decoder) yielded the best results, reducing word error rate from 71% to 32% on average. Our findings highlight the value of leveraging both normative (cross-speaker) and idiosyncratic (speaker-specific) patterns to improve ASR for underrepresented speech populations. [GitHub: VishnuRaja98/Dysarthric-Speech-Transcription](https://github.com/VishnuRaja98/Dysarthric-Speech-Transcription)

Capturing Author Self Beliefs in Social Media Language
Siddharth Mangalik | Adithya V. Ganesan | Abigail Wheeler | Nicholas Kerry | Jeremy D. W. Clifton | H. Andrew Schwartz | Ryan L. Boyd
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Measuring the prevalence and dimensions of self beliefs is essential for understanding human self-perception and various psychological outcomes. In this paper, we develop a novel task for classifying language that contains explicit or implicit mentions of the author’s self beliefs. We contribute a set of 2,000 human-annotated self beliefs, 100,000 LLM-labeled examples, and 10,000 surveyed self belief paragraphs. We then evaluate several encoder-based classifiers and training routines for this task. Our trained model, SelfAwareNet, achieved an AUC of 0.944, outperforming 0.839 from OpenAI’s state-of-the-art GPT-4o model. Using this model we derive data-driven categories of self beliefs and demonstrate their ability to predict valence, depression, anxiety, and stress. We release the resulting self belief classification model and annotated datasets for use in future research.

Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm
Vasudha Varadarajan | Syeda Mahwish | Xiaoran Liu | Julia Buffolino | Christian C. Luhmann | Ryan L. Boyd | H. Andrew Schwartz
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

While NLP models often seek to capture cognitive states via language, the validity of predicted states is determined by comparing them to annotations created without access the cognitive states of the authors. In behavioral sciences, cognitive states are instead measured via experiments. Here, we introduce an experiment-based framework for evaluating language-based cognitive style models against human behavior. We explore the phenomenon of decision making, and its relationship to the linguistic style of an individual talking about a recent decision they made. The participants then follow a classical decision-making experiment that captures their cognitive style, determined by how preferences change during a decision exercise. We find that language features, intended to capture cognitive style, can predict participants’ decision style with moderate-to-high accuracy (AUC 0.8), demonstrating that cognitive style can be partly captured and revealed by discourse patterns.

Systematic Evaluation of Auto-Encoding and Large Language Model Representations for Capturing Author States and Traits
Khushboo Singh | Vasudha Varadarajan | Adithya V. Ganesan | August Håkan Nilsson | Nikita Soni | Syeda Mahwish | Pranav Chitale | Ryan L. Boyd | Lyle Ungar | Richard N. Rosenthal | H. Andrew Schwartz
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) are increasingly used in human-centered applications, yet their ability to model diverse psychological constructs is not well understood. In this study, we systematically evaluate a range of Transformer-LMs to predict psychological variables across five major dimensions: affect, substance use, mental health, sociodemographics, and personality. Analyses span three temporal levels—short daily text responses about current affect, text aggregated over two-weeks, and user-level text collected over two years—allowing us to examine how each model’s strengths align with the underlying stability of different constructs. The findings show that mental health signals emerge as the most accurately predicted dimensions (r=0.6) across all temporal scales. At the daily scale, smaller models like DeBERTa and HaRT often performed better, whereas, at longer scales or with greater context, larger model like Llama3-8B performed the best. Also, aggregating text over the entire study period yielded stronger correlations for outcomes, such as age and income. Overall, these results suggest the importance of selecting appropriate model architectures and temporal aggregation techniques based on the stability and nature of the target variable.

Linking Language-based Distortion Detection to Mental Health Outcomes
Vasudha Varadarajan | Allison Lahnala | Sujeeth Vankudari | Akshay Raghavan | Scott Feltman | Syeda Mahwish | Camilo Ruggero | Roman Kotov | H. Andrew Schwartz
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)

Recent work has suggested detection of cognitive distortions as an impactful task for NLP in the clinical space, but the connection between language-detected distortions and validated mental health outcomes has been elusive. In this work, we evaluate the co-occurrence of (a) 10 distortions derived from language-based detectors trained over two common distortion datasets with (b) 12 mental health outcomes contained within two new language-to-mental-health datasets: DS4UD and iHiTOP. We find higher rates of distortions for those with greater mental health condition severity (ranging from r = 0.16 for thought disorders to r = 0.46 for depressed mood), and that the specific distortions of should statements and fortune telling were associated with a depressed mood and being emotionally drained, respectively. This suggested that language-based assessments of cognitive distortion could play a significant role in detection and monitoring of mental health conditions.

Residualized Similarity for Faithfully Explainable Authorship Verification
Peter Zeng | Pegah Alipoormolabashi | Jihu Mun | Gourab Dey | Nikita Soni | Niranjan Balasubramanian | Owen Rambow | H. Schwartz
Findings of the Association for Computational Linguistics: EMNLP 2025

Responsible use of Authorship Verification (AV) systems not only requires high accuracy but also interpretable solutions. More importantly, for systems to be used to make decisions with real-world consequences requires the model’s prediction to be explainable using interpretable features that can be traced to the original texts. Neural methods achieve high accuracies, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully – if there is an explanation given for a prediction, it doesn’t represent the reasoning process behind the model’s prediction. In this paper, we introduce Residualized Similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how alike two documents are. The key idea is to use the neural network to predict a similarity residual, i.e. the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can show how and to what degree the final prediction is faithful and interpretable.

2024

Research on psychological risk factors for suicide has developed for decades. However, combining explainable theory with modern data-driven language model approaches is non-trivial. In this study, we propose and evaluate methods for identifying language patterns aligned with theories of suicide risk by combining theory-driven suicidal archetypes with language model-based and relative entropy-based approaches. Archetypes are based on prototypical statements that evince risk of suicidality while relative entropy considers the ratio of how unusual both a risk-familiar and unfamiliar model find the statements. While both approaches independently performed similarly, we find that combining the two significantly improved the performance in the shared task evaluations, yielding our combined system submission with a BERTScore Recall of 0.906. Consistent with the literature, we find that titles are highly informative as suicide risk evidence, despite the brevity. We conclude that a combination of theory- and data-driven methods are needed in the mental health space and can outperform more modern prompt-based methods.

Large Human Language Models: A Need and the Challenges
Nikita Soni | H. Andrew Schwartz | João Sedoc | Niranjan Balasubramanian
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

As research in human-centered NLP advances, there is a growing recognition of the importance of incorporating human and social factors into NLP models. At the same time, our NLP systems have become heavily reliant on LLMs, most of which do not model authors. To build NLP systems that can truly understand human language, we must better integrate human contexts into LLMs. This brings to the fore a range of design considerations and challenges in terms of what human aspects to capture, how to represent them, and what modeling strategies to pursue. To address these, we advocate for three positions toward creating large human language models (LHLMs) using concepts from psychological and behavioral sciences: First, LM training should include the human context. Second, LHLMs should recognize that people are more than their group(s). Third, LHLMs should be able to account for the dynamic and temporally-dependent nature of the human context. We refer to relevant advances and present open challenges that need to be addressed and their possible solutions in realizing these goals.

Proceedings of the 1st Human-Centered Large Language Modeling Workshop
Nikita Soni | Lucie Flek | Ashish Sharma | Diyi Yang | Sara Hooker | H. Andrew Schwartz
Proceedings of the 1st Human-Centered Large Language Modeling Workshop

From Text to Context: Contextualizing Language with Humans, Groups, and Communities for Socially Aware NLP
Adithya V Ganesan | Siddharth Mangalik | Vasudha Varadarajan | Nikita Soni | Swanie Juhng | João Sedoc | H. Andrew Schwartz | Salvatore Giorgi | Ryan L Boyd
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 5: Tutorial Abstracts)

Aimed at the NLP researchers or practitioners who would like to integrate human - individual, group, or societal level factors into their analyses, this tutorial will cover recent techniques and libraries for doing so at each level of analysis. Starting with human-centered techniques that provide benefit to traditional document- or word-level NLP tasks (Garten et al., 2019; Lynn et al., 2017), we undertake a thorough exploration of critical human-level aspects as they pertain to NLP, gradually moving up to higher levels of analysis: individual persons, individual with agent (chat/dialogue), groups of people, and finally communities or societies.

ALBA: Adaptive Language-Based Assessments for Mental Health
Vasudha Varadarajan | Sverker Sikström | Oscar Kjell | H. Andrew Schwartz
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Mental health issues differ widely among individuals, with varied signs and symptoms. Recently, language-based assessments haveshown promise in capturing this diversity, but they require a substantial sample of words per person for accuracy. This work introducesthe task of Adaptive Language-Based Assessment (ALBA), which involves adaptively ordering questions while also scoring an individual’s latent psychological trait using limited language responses to previous questions. To this end, we develop adaptive testing methods under two psychometric measurement theories: Classical Test Theory and Item Response Theory.We empirically evaluate ordering and scoring strategies, organizing into two new methods: a semi-supervised item response theory-basedmethod (ALIRT) and a supervised Actor-Critic model. While we found both methods to improve over non-adaptive baselines, We foundALIRT to be the most accurate and scalable, achieving the highest accuracy with fewer questions (e.g., Pearson r ≈ 0.93 after only 3 questions as compared to typically needing at least 7 questions). In general, adaptive language-based assessments of depression and anxiety were able to utilize a smaller sample of language without compromising validity or large computational costs.

Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?
Nikita Soni | Niranjan Balasubramanian | H. Andrew Schwartz | Dirk Hovy
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Pre-trained language models consider the context of neighboring words and documents but lack any author context of the human generating the text. However, language depends on the author’s states, traits, social, situational, and environmental attributes, collectively referred to as human context (Soni et al., 2024). Human-centered natural language processing requires incorporating human context into language models. Currently, two methods exist: pre-training with 1) group-wise attributes (e.g., over-45-year-olds) or 2) individual traits. Group attributes are simple but coarse — not all 45-year-olds write the same way — while individual traits allow for more personalized representations, but require more complex modeling and data. It is unclear which approach benefits what tasks. We compare pre-training models with human context via 1) group attributes, 2) individual users, and 3) a combined approach on five user- and document-level tasks. Our results show that there is no best approach, but that human-centered language modeling holds avenues for different methods.

Using Daily Language to Understand Drinking: Multi-Level Longitudinal Differential Language Analysis
Matthew Matero | Huy Vu | August Nilsson | Syeda Mahwish | Young Min Cho | James McKay | Johannes Eichstaedt | Richard Rosenthal | Lyle Ungar | H. Andrew Schwartz
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)

Analyses for linking language with psychological factors or behaviors predominately treat linguistic features as a static set, working with a single document per person or aggregating across multiple posts (e.g. on social media) into a single set of features. This limits language to mostly shed light on between-person differences rather than changes in behavior within-person. Here, we collected a novel dataset of daily surveys where participants were asked to describe their experienced well-being and report the number of alcoholic beverages they had within the past 24 hours. Through this data, we first build a multi-level forecasting model that is able to capture within-person change and leverage both the psychological features of the person and daily well-being responses. Then, we propose a longitudinal version of differential language analysis that finds patterns associated with drinking more (e.g. social events) and less (e.g. task-oriented), as well as distinguishing patterns of heavy drinks versus light drinkers.

SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks
Gourab Dey | Adithya V Ganesan | Yash Kumar Lal | Manal Shah | Shreyashee Sinha | Matthew Matero | Salvatore Giorgi | Vivek Kulkarni | H. Andrew Schwartz
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve the many capabilities of large language models (LLMs) such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about the effectiveness of instruction tuning on the social domain where implicit pragmatic cues are often needed to be captured. We explore the use of instruction tuning for social science NLP tasks and introduce Socialite-Llama — an open-source, instruction-tuned Llama. On a suite of 20 social science tasks, Socialite-Llama improves upon the performance of Llama as well as matches or improves upon the performance of a state-of-the-art, multi-task finetuned model on a majority of them. Further, Socialite-Llama also leads to improvement on 5 out of 6 related social tasks as compared to Llama, suggesting instruction tuning can lead to generalized social understanding. All resources including our code, model and dataset can be found through [bit.ly/socialitellama](https://bit.ly/socialitellama/).

2023

Transfer and Active Learning for Dissonance Detection: Addressing the Rare-Class Challenge
Vasudha Varadarajan | Swanie Juhng | Syeda Mahwish | Xiaoran Liu | Jonah Luby | Christian Luhmann | H. Andrew Schwartz
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While transformer-based systems have enabled greater accuracies with fewer training examples, data acquisition obstacles still persist for rare-class tasks – when the class label is very infrequent (e.g. < 5% of samples). Active learning has in general been proposed to alleviate such challenges, but choice of selection strategy, the criteria by which rare-class examples are chosen, has not been systematically evaluated. Further, transformers enable iterative transfer-learning approaches. We propose and investigate transfer- and active learning solutions to the rare class problem of dissonance detection through utilizing models trained on closely related tasks and the evaluation of acquisition strategies, including a proposed probability-of-rare-class (PRC) approach. We perform these experiments for a specific rare-class problem: collecting language samples of cognitive dissonance from social media. We find that PRC is a simple and effective strategy to guide annotations and ultimately improve model accuracy while transfer-learning in a specific order can improve the cold-start performance of the learner but does not benefit iterations of active learning.

Discourse-Level Representations can Improve Prediction of Degree of Anxiety
Swanie Juhng | Matthew Matero | Vasudha Varadarajan | Johannes Eichstaedt | Adithya V Ganesan | H. Andrew Schwartz
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Anxiety disorders are the most common of mental illnesses, but relatively little is known about how to detect them from language. The primary clinical manifestation of anxiety is worry associated cognitive distortions, which are likely expressed at the discourse-level of semantics. Here, we investigate the development of a modern linguistic assessment for degree of anxiety, specifically evaluating the utility of discourse-level information in addition to lexical-level large language model embeddings. We find that a combined lexico-discourse model outperforms models based solely on state-of-the-art contextual embeddings (RoBERTa), with discourse-level representations derived from Sentence-BERT and DiscRE both providing additional predictive power not captured by lexical-level representations. Interpreting the model, we find that discourse patterns of causal explanations, among others, were used significantly more by those scoring high in anxiety, dovetailing with psychological literature.

Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation
Adithya V Ganesan | Yash Kumar Lal | August Nilsson | H. Andrew Schwartz
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Very large language models (LLMs) perform extremely well on a spectrum of NLP tasks in a zero-shot setting. However, little is known about their performance on human-level NLP problems which rely on understanding psychological concepts, such as assessing personality traits. In this work, we investigate the zero-shot ability of GPT-3 to estimate the Big 5 personality traits from users’ social media posts. Through a set of systematic experiments, we find that zero-shot GPT-3 performance is somewhat close to an existing pre-trained SotA for broad classification upon injecting knowledge about the trait in the prompts. However, when prompted to provide fine-grained classification, its performance drops to close to a simple most frequent class (MFC) baseline. We further analyze where GPT-3 performs better, as well as worse, than a pretrained lexical model, illustrating systematic errors that suggest ways to improve LLMs on human-level NLP tasks. The code for this project is available on Github.

2022

WWBP-SQT-lite: Multi-level Models and Difference Embeddings for Moments of Change Identification in Mental Health Forums
Adithya V Ganesan | Vasudha Varadarajan | Juhi Mittal | Shashanka Subrahmanya | Matthew Matero | Nikita Soni | Sharath Chandra Guntuku | Johannes Eichstaedt | H. Andrew Schwartz
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology

Psychological states unfold dynamically; to understand and measure mental health at scale we need to detect and measure these changes from sequences of online posts. We evaluate two approaches to capturing psychological changes in text: the first relies on computing the difference between the embedding of a message with the one that precedes it, the second relies on a “human-aware” multi-level recurrent transformer (HaRT). The mood changes of timeline posts of users were annotated into three classes, ‘ordinary,’ ‘switching’ (positive to negative or vice versa) and ‘escalations’ (increasing in intensity). For classifying these mood changes, the difference-between-embeddings technique – applied to RoBERTa embeddings – showed the highest overall F1 score (0.61) across the three different classes on the test set. The technique particularly outperformed the HaRT transformer (and other baselines) in the detection of switches (F1 = .33) and escalations (F1 = .61).Consistent with the literature, the language use patterns associated with mental-health related constructs in prior work (including depression, stress, anger and anxiety) predicted both mood switches and escalations.

Discourse Relation Embeddings: Representing the Relations between Discourse Segments in Social Media
Youngseo Son | Vasudha Varadarajan | H. Andrew Schwartz
Proceedings of the Workshop on Unimodal and Multimodal Induction of Linguistic Structures (UM-IoS)

Discourse relations are typically modeled as a discrete class that characterizes the relation between segments of text (e.g. causal explanations, expansions). However, such predefined discrete classes limit the universe of potential relationships and their nuanced differences. Adding higher-level semantic structure to contextual word embeddings, we propose representing discourse relations as points in high dimensional continuous space. However, unlike words, discourse relations often have no surface form (relations are in between two segments, often with no word or phrase in that gap) which presents a challenge for existing embedding techniques. We present a novel method for automatically creating discourse relation embeddings (DiscRE), addressing the embedding challenge through a weakly supervised, multitask approach to learn diverse and nuanced relations in social media. Results show DiscRE representations obtain the best performance on Twitter discourse relation classification (macro F1=0.76), social media causality prediction (from F1=0.79 to 0.81), and perform beyond modern sentence and word transformers at traditional discourse relation classification, capturing novel nuanced relations (e.g. relations at the intersection of causal explanations and counterfactuals).

Human Language Modeling
Nikita Soni | Matthew Matero | Niranjan Balasubramanian | H. Andrew Schwartz
Findings of the Association for Computational Linguistics: ACL 2022

Natural language is generated by people, yet traditional language modeling views words or documents as if generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension to the language modeling problem where by a human- level exists to connect sequences of documents (e.g. social media messages) and capture the notion that human language is moderated by changing human states. We introduce, HaRT, a large-scale transformer model for solving HuLM, pre-trained on approximately 100,000 social media users, and demonstrate it’s effectiveness in terms of both language modeling (perplexity) for social media and fine-tuning for 4 downstream tasks spanning document- and user-levels. Results on all tasks meet or surpass the current state-of-the-art.

Evaluating Contextual Embeddings and their Extraction Layers for Depression Assessment
Matthew Matero | Albert Hung | H. Andrew Schwartz
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

Many recent works in natural language processing have demonstrated ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP but little is known empirically on how to best apply them for mental health assessment. Using degree of depression as a case study, we do an empirical analysis on which off-the-shelf language model, individual layers, and combinations of layers seem most promising when applied to human-level NLP tasks. Notably, we find RoBERTa most effective and, despite the standard in past work suggesting the second-to-last or concatenation of the last 4 layers, we find layer 19 (sixth-to last) is at least as good as layer 23 when using 1 layer. Further, when using multiple layers, distributing them across the second half (i.e. Layers 12+), rather than last 4, of the 24 layers yielded the most accurate results.

Detecting Dissonant Stance in Social Media: The Role of Topic Exposure
Vasudha Varadarajan | Nikita Soni | Weixi Wang | Christian Luhmann | H. Andrew Schwartz | Naoya Inoue
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

We address dissonant stance detection, classifying conflicting stance between two input statements. Computational models for traditional stance detection have typically been trained to indicate pro/con for a given target topic (e.g. gun control) and thus do not generalize well to new topics. In this paper, we systematically evaluate the generalizability of dissonant stance detection to situations where examples of the topic have not been seen at all or have only been seen a few times. We show that dissonant stance detection models trained on only 8 topics, none of which are the target topic, can perform as well as those trained only on a target topic. Further, adding non-target topics boosts performance further up to approximately 32 topics where accuracies start to plateau. Taken together, our experiments suggest dissonant stance detection models can generalize to new unanticipated topics, an important attribute for the social scientific study of social media where new topics emerge daily.

2021

On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers
Tianchu Ji | Shraddhan Jain | Michael Ferdman | Peter Milder | H. Andrew Schwartz | Niranjan Balasubramanian
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

MeLT: Message-Level Transformer with Masked Document Representations as Pre-Training for Stance Detection
Matthew Matero | Nikita Soni | Niranjan Balasubramanian | H. Andrew Schwartz
Findings of the Association for Computational Linguistics: EMNLP 2021

Much of natural language processing is focused on leveraging large capacity language models, typically trained over single messages with a task of predicting one or more tokens. However, modeling human language at higher-levels of context (i.e., sequences of messages) is under-explored. In stance detection and other social media tasks where the goal is to predict an attribute of a message, we have contextual data that is loosely semantically connected by authorship. Here, we introduce Message-Level Transformer (MeLT) – a hierarchical message-encoder pre-trained over Twitter and applied to the task of stance prediction. We focus on stance prediction as a task benefiting from knowing the context of the message (i.e., the sequence of previous messages). The model is trained using a variant of masked-language modeling; where instead of predicting tokens, it seeks to generate an entire masked (aggregated) message vector via reconstruction loss. We find that applying this pre-trained masked message-level transformer to the downstream task of stance detection achieves F1 performance of 67%.

Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality
Adithya V Ganesan | Matthew Matero | Aravind Reddy Ravula | Huy Vu | H. Andrew Schwartz
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data pose a significant difficulty which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just 1/12 of the embedding dimensions.

Characterizing Social Spambots by their Human Traits
Salvatore Giorgi | Lyle Ungar | H. Andrew Schwartz
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Understanding Weekly COVID-19 Concerns through Dynamic Content-Specific LDA Topic Modeling
Mohammadzaman Zamani | H. Andrew Schwartz | Johannes Eichstaedt | Sharath Chandra Guntuku | Adithya Virinchipuram Ganesan | Sean Clouston | Salvatore Giorgi
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

The novelty and global scale of the COVID-19 pandemic has lead to rapid societal changes in a short span of time. As government policy and health measures shift, public perceptions and concerns also change, an evolution documented within discourse on social media. We propose a dynamic content-specific LDA topic modeling technique that can help to identify different domains of COVID-specific discourse that can be used to track societal shifts in concerns or views. Our experiments show that these model-derived topics are more coherent than standard LDA topics, and also provide new features that are more helpful in prediction of COVID-19 related outcomes including social mobility and unemployment rate.

Detecting Emerging Symptoms of COVID-19 using Context-based Twitter Embeddings
Roshan Santosh | H. Andrew Schwartz | Johannes Eichstaedt | Lyle Ungar | Sharath Chandra Guntuku
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test if the proposed approach generalizes to the problem of detecting Adverse Drug Reaction (ADR). We find that the approach applied to Twitter data can detect symptom mentions substantially before to their being reported by the Centers for Disease Control (CDC).

Autoregressive Affective Language Forecasting: A Self-Supervised Task
Matthew Matero | H. Andrew Schwartz
Proceedings of the 28th International Conference on Computational Linguistics

Human natural language is mentioned at a specific point in time while human emotions change over time. While much work has established a strong link between language use and emotional states, few have attempted to model emotional language in time. Here, we introduce the task of affective language forecasting – predicting future change in language based on past changes of language, a task with real-world applications such as treating mental health or forecasting trends in consumer confidence. We establish some of the fundamental autoregressive characteristics of the task (necessary history size, static versus dynamic length, varying time-step resolutions) and then build on popular sequence models for words to instead model sequences of language-based emotion in time. Over a novel Twitter dataset of 1,900 users and weekly + daily scores for 6 emotions and 2 additional linguistic attributes, we find a novel dual-sequence GRU model with decayed hidden states achieves best results (r = .66) significantly out-predicting, e.g., a moving averaging based on the past time-steps (r = .49). We make our anonymized dataset as well as task setup and evaluation code available for others to build on.

Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview
Deven Santosh Shah | H. Andrew Schwartz | Dirk Hovy
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

An increasing number of natural language processing papers address the effect of bias on predictions, introducing mitigation techniques at different parts of the standard NLP pipeline (data and models). However, these works have been conducted individually, without a unifying framework to organize efforts within the field. This situation leads to repetitive approaches, and focuses overly on bias symptoms/effects, rather than on their origins, which could limit the development of effective countermeasures. In this paper, we propose a unifying predictive bias framework for NLP. We summarize the NLP literature and suggest general mathematical definitions of predictive bias. We differentiate two consequences of bias: outcome disparities and error disparities, as well as four potential origins of biases: label bias, selection bias, model overamplification, and semantic bias. Our framework serves as an overview of predictive bias in NLP, integrating existing work into a single structure, and providing a conceptual baseline for improved frameworks.

Learning Emotion from 100 Observations: Unexpected Robustness of Deep Learning under Strong Data Limitations
Sven Buechel | João Sedoc | H. Andrew Schwartz | Lyle Ungar
Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media

One of the major downsides of Deep Learning is its supposed need for vast amounts of training data. As such, these techniques appear ill-suited for NLP areas where annotated data is limited, such as less-resourced languages or emotion analysis, with its many nuanced and hard-to-acquire annotation formats. We conduct a questionnaire study indicating that indeed the vast majority of researchers in emotion analysis deems neural models inferior to traditional machine learning when training data is limited. In stark contrast to those survey results, we provide empirical evidence for English, Polish, and Portuguese that commonly used neural architectures can be trained on surprisingly few observations, outperforming n-gram based ridge regression on only 100 data points. Our analysis suggests that high-quality, pre-trained word embeddings are a main factor for achieving those results.

Hierarchical Modeling for User Personality Prediction: The Role of Message-Level Attention
Veronica Lynn | Niranjan Balasubramanian | H. Andrew Schwartz
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Not all documents are equally important. Language processing is increasingly finding use as a supplement for questionnaires to assess psychological attributes of consenting individuals, but most approaches neglect to consider whether all documents of an individual are equally informative. In this paper, we present a novel model that uses message-level attention to learn the relative weight of users’ social media posts for assessing their five factor personality traits. We demonstrate that models with message-level attention outperform those with word-level attention, and ultimately yield state-of-the-art accuracies for all five traits by using both word and message attention in combination with past approaches (an average increase in Pearson r of 2.5%). In addition, examination of the high-signal posts identified by our model provides insight into the relationship between language and personality, helping to inform future work.

2019

Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes
Veronica Lynn | Salvatore Giorgi | Niranjan Balasubramanian | H. Andrew Schwartz
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

NLP naturally puts a primary focus on leveraging document language, occasionally considering user attributes as supplemental. However, as we tackle more social scientific tasks, it is possible user attributes might be of primary importance and the document supplemental. Here, we systematically investigate the predictive power of user-level features alone versus document-level features for document-level tasks. We first show user attributes can sometimes carry more task-related information than the document itself. For example, a tweet-level stance detection model using only 13 user-level attributes (i.e. features that did not depend on the specific tweet) was able to obtain a higher F1 than the top-performing SemEval participant. We then consider multiple tasks and a wider range of user attributes, showing the performance of strong document-only models can often be improved (as in stance, sentiment, and sarcasm) with user attributes, particularly benefiting tasks with stable “trait-like” outcomes (e.g. stance) most relative to frequently changing “state-like” outcomes (e.g. sentiment). These results not only support the growing work on integrating user factors into predictive systems, but that some of our NLP tasks might be better cast primarily as user-level (or human) tasks.

Suicide Risk Assessment with Multi-level Dual-Context Language and BERT
Matthew Matero | Akash Idnani | Youngseo Son | Salvatore Giorgi | Huy Vu | Mohammad Zamani | Parth Limbachiya | Sharath Chandra Guntuku | H. Andrew Schwartz
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

Mental health predictive systems typically model language as if from a single context (e.g. Twitter posts, status updates, or forum posts) and often limited to a single level of analysis (e.g. either the message-level or user-level). Here, we bring these pieces together to explore the use of open-vocabulary (BERT embeddings, topics) and theoretical features (emotional expression lexica, personality) for the task of suicide risk assessment on support forums (the CLPsych-2019 Shared Task). We used dual context based approaches (modeling content from suicide forums separate from other content), built over both traditional ML models as well as a novel dual RNN architecture with user-factor adaptation. We find that while affect from the suicide context distinguishes with no-risk from those with “any-risk”, personality factors from the non-suicide contexts provide distinction of the levels of risk: low, medium, and high risk. Within the shared task, our dual-context approach (listed as SBU-HLAB in the official results) achieved state-of-the-art performance predicting suicide risk using a combination of suicide-context and non-suicide posts (Task B), achieving an F1 score of 0.50 over hidden test set labels.

2018

Identifying Locus of Control in Social Media Language
Masoud Rouhizadeh | Kokil Jaidka | Laura Smith | H. Andrew Schwartz | Anneke Buffone | Lyle Ungar
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Individuals express their locus of control, or “control”, in their language when they identify whether or not they are in control of their circumstances. Although control is a core concept underlying rhetorical style, it is not clear whether control is expressed by how or by what authors write. We explore the roles of syntax and semantics in expressing users’ sense of control –i.e. being “controlled by” or “in control of” their circumstances– in a corpus of annotated Facebook posts. We present rich insights into these linguistic aspects and find that while the language signaling control is easy to identify, it is more challenging to label it is internally or externally controlled, with lexical features outperforming syntactic features at the task. Our findings could have important implications for studying self-expression in social media.

Causal Explanation Analysis on Social Media
Youngseo Son | Nipun Bayas | H. Andrew Schwartz
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Understanding causal explanations - reasons given for happenings in one’s life - has been found to be an important psychological factor linked to physical and mental health. Causal explanations are often studied through manual identification of phrases over limited samples of personal writing. Automatic identification of causal explanations in social media, while challenging in relying on contextual and sequential cues, offers a larger-scale alternative to expensive manual ratings and opens the door for new applications (e.g. studying prevailing beliefs about causes, such as climate change). Here, we explore automating causal explanation analysis, building on discourse parsing, and presenting two novel subtasks: causality detection (determining whether a causal explanation exists at all) and causal explanation identification (identifying the specific phrase that is the explanation). We achieve strong accuracies for both tasks but find different approaches best: an SVM for causality prediction (F1 = 0.791) and a hierarchy of Bidirectional LSTMs for causal explanation identification (F1 = 0.853). Finally, we explore applications of our complete pipeline (F1 = 0.868), showing demographic differences in mentions of causal explanation and that the association between a word and sentiment can change when it is used within a causal explanation.

CLPsych 2018 Shared Task: Predicting Current and Future Psychological Health from Childhood Essays
Veronica Lynn | Alissa Goodman | Kate Niederhoffer | Kate Loveys | Philip Resnik | H. Andrew Schwartz
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic

We describe the shared task for the CLPsych 2018 workshop, which focused on predicting current and future psychological health from an essay authored in childhood. Language-based predictions of a person’s current health have the potential to supplement traditional psychological assessment such as questionnaires, improving intake risk measurement and monitoring. Predictions of future psychological health can aid with both early detection and the development of preventative care. Research into the mental health trajectory of people, beginning from their childhood, has thus far been an area of little work within the NLP community. This shared task represents one of the first attempts to evaluate the use of early language to predict future health; this has the potential to support a wide variety of clinical health care tasks, from early assessment of lifetime risk for mental health problems, to optimal timing for targeted interventions aimed at both prevention and treatment.

Predicting Human Trustfulness from Facebook Language
Mohammadzaman Zamani | Anneke Buffone | H. Andrew Schwartz
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic

Trustfulness — one’s general tendency to have confidence in unknown people or situations — predicts many important real-world outcomes such as mental health and likelihood to cooperate with others such as clinicians. While data-driven measures of interpersonal trust have previously been introduced, here, we develop the first language-based assessment of the personality trait of trustfulness by fitting one’s language to an accepted questionnaire-based trust score. Further, using trustfulness as a type of case study, we explore the role of questionnaire size as well as word count in developing language-based predictive models of users’ psychological traits. We find that leveraging a longer questionnaire can yield greater test set accuracy, while, for training, we find it beneficial to include users who took smaller questionnaires which offers more observations for training. Similarly, after noting a decrease in individual prediction error as word count increased, we found a word count-weighted training scheme was helpful when there were very few users in the first place.

Residualized Factor Adaptation for Community Social Media Prediction Tasks
Mohammadzaman Zamani | H. Andrew Schwartz | Veronica Lynn | Salvatore Giorgi | Niranjan Balasubramanian
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Predictive models over social media language have shown promise in capturing community outcomes, but approaches thus far largely neglect the socio-demographic context (e.g. age, education rates, race) of the community from which the language originates. For example, it may be inaccurate to assume people in Mobile, Alabama, where the population is relatively older, will use words the same way as those from San Francisco, where the median age is younger with a higher rate of college education. In this paper, we present residualized factor adaptation, a novel approach to community prediction tasks which both (a) effectively integrates community attributes, as well as (b) adapts linguistic features to community attributes (factors). We use eleven demographic and socioeconomic attributes, and evaluate our approach over five different community-level predictive tasks, spanning health (heart disease mortality, percent fair/poor health), psychology (life satisfaction), and economics (percent housing price increase, foreclosure rate). Our evaluation shows that residualized factor adaptation significantly improves 4 out of 5 community-level outcome predictions over prior state-of-the-art for incorporating socio-demographic contexts.

The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions
Salvatore Giorgi | Daniel Preoţiuc-Pietro | Anneke Buffone | Daniel Rieman | Lyle Ungar | H. Andrew Schwartz
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Nowcasting based on social media text promises to provide unobtrusive and near real-time predictions of community-level outcomes. These outcomes are typically regarding people, but the data is often aggregated without regard to users in the Twitter populations of each community. This paper describes a simple yet effective method for building community-level models using Twitter language aggregated by user. Results on four different U.S. county-level tasks, spanning demographic, health, and psychological outcomes show large and consistent improvements in prediction accuracies (e.g. from Pearson r=.73 to .82 for median income prediction or r=.37 to .47 for life satisfaction prediction) over the standard approach of aggregating all tweets. We make our aggregated and anonymized community-level data, derived from 37 billion tweets – over 1 billion of which were mapped to counties, available for research.

2017

Human Centered NLP with User-Factor Adaptation
Veronica Lynn | Youngseo Son | Vivek Kulkarni | Niranjan Balasubramanian | H. Andrew Schwartz
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We pose the general task of user-factor adaptation – adapting supervised learning models to real-valued user factors inferred from a background of their language, reflecting the idea that a piece of text should be understood within the context of the user that wrote it. We introduce a continuous adaptation technique, suited for real-valued user factors that are common in social science and bringing us closer to personalized NLP, adapting to each user uniquely. We apply this technique with known user factors including age, gender, and personality traits, as well as latent factors, evaluating over five tasks: POS tagging, PP-attachment, sentiment analysis, sarcasm detection, and stance detection. Adaptation provides statistically significant benefits for 3 of the 5 tasks: up to +1.2 points for PP-attachment, +3.4 points for sarcasm, and +3.0 points for stance.

On the Distribution of Lexical Features at Multiple Levels of Analysis
Fatemeh Almodaresi | Lyle Ungar | Vivek Kulkarni | Mohsen Zakeri | Salvatore Giorgi | H. Andrew Schwartz
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Natural language processing has increasingly moved from modeling documents and words toward studying the people behind the language. This move to working with data at the user or community level has presented the field with different characteristics of linguistic data. In this paper, we empirically characterize various lexical distributions at different levels of analysis, showing that, while most features are decidedly sparse and non-normal at the message-level (as with traditional NLP), they follow the central limit theorem to become much more Log-normal or even Normal at the user- and county-levels. Finally, we demonstrate that modeling lexical features for the correct level of analysis leads to marked improvements in common social scientific prediction tasks.

Using Twitter Language to Predict the Real Estate Market
Mohammadzaman Zamani | H. Andrew Schwartz
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We explore whether social media can provide a window into community real estate -foreclosure rates and price changes- beyond that of traditional economic and demographic variables. We find language use in Twitter not only predicts real estate outcomes as well as traditional variables across counties, but that including Twitter language in traditional models leads to a significant improvement (e.g. from Pearson r = :50 to r = :59 for price changes). We overcome the challenge of the relative sparsity and noise in Twitter language variables by showing that training on the residual error of the traditional models leads to more accurate overall assessments. Finally, we discover that it is Twitter language related to business (e.g. ‘company’, ‘marketing’) and technology (e.g. ‘technology’, ‘internet’), among others, that yield predictive power over economics.

Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions
Daniel Rieman | Kokil Jaidka | H. Andrew Schwartz | Lyle Ungar
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Several studies have demonstrated how language models of user attributes, such as personality, can be built by using the Facebook language of social media users in conjunction with their responses to psychology questionnaires. It is challenging to apply these models to make general predictions about attributes of communities, such as personality distributions across US counties, because it requires 1. the potentially inavailability of the original training data because of privacy and ethical regulations, 2. adapting Facebook language models to Twitter language without retraining the model, and 3. adapting from users to county-level collections of tweets. We propose a two-step algorithm, Target Side Domain Adaptation (TSDA) for such domain adaptation when no labeled Twitter/county data is available. TSDA corrects for the different word distributions between Facebook and Twitter and for the varying word distributions across counties by adjusting target side word frequencies; no changes to the trained model are made. In the case of predicting the Big Five county-level personality traits, TSDA outperforms a state-of-the-art domain adaptation method, gives county-level predictions that have fewer extreme outliers, higher year-to-year stability, and higher correlation with county-level outcomes.

DLATK: Differential Language Analysis ToolKit
H. Andrew Schwartz | Salvatore Giorgi | Maarten Sap | Patrick Crutchley | Lyle Ungar | Johannes Eichstaedt
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present Differential Language Analysis Toolkit (DLATK), an open-source python package and command-line tool developed for conducting social-scientific language analyses. While DLATK provides standard NLP pipeline steps such as tokenization or SVM-classification, its novel strengths lie in analyses useful for psychological, health, and social science: (1) incorporation of extra-linguistic structured information, (2) specified levels and units of analysis (e.g. document, user, community), (3) statistical metrics for continuous outcomes, and (4) robust, proven, and accurate pipelines for social-scientific prediction problems. DLATK integrates multiple popular packages (SKLearn, Mallet), enables interactive usage (Jupyter Notebooks), and generally follows object oriented principles to make it easy to tie in additional libraries or storage technologies.

Recognizing Counterfactual Thinking in Social Media Texts
Youngseo Son | Anneke Buffone | Joe Raso | Allegra Larche | Anthony Janocko | Kevin Zembroski | H Andrew Schwartz | Lyle Ungar
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Counterfactual statements, describing events that did not occur and their consequents, have been studied in areas including problem-solving, affect management, and behavior regulation. People with more counterfactual thinking tend to perceive life events as more personally meaningful. Nevertheless, counterfactuals have not been studied in computational linguistics. We create a counterfactual tweet dataset and explore approaches for detecting counterfactuals using rule-based and supervised statistical approaches. A combined rule-based and statistical approach yielded the best results (F1 = 0.77) outperforming either approach used alone.

Assessing Objective Recommendation Quality through Political Forecasting
H. Andrew Schwartz | Masoud Rouhizadeh | Michael Bishop | Philip Tetlock | Barbara Mellers | Lyle Ungar
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Recommendations are often rated for their subjective quality, but few researchers have studied comment quality in terms of objective utility. We explore recommendation quality assessment with respect to both subjective (i.e. users’ ratings) and objective (i.e., did it influence? did it improve decisions?) metrics in a massive online geopolitical forecasting system, ultimately comparing linguistic characteristics of each quality metric. Using a variety of features, we predict all types of quality with better accuracy than the simple yet strong baseline of comment length. Looking at the most predictive content illustrates rater biases; for example, forecasters are subjectively biased in favor of comments mentioning business transactions or dealings as well as material things, even though such comments do not indeed prove any more useful objectively. Additionally, more complex sentence constructions, as evidenced by subordinate conjunctions, are characteristic of comments leading to objective improvements in forecasting.

2016

Using Syntactic and Semantic Context to Explore Psychodemographic Differences in Self-reference
Masoud Rouhizadeh | Lyle Ungar | Anneke Buffone | H Andrew Schwartz
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Does ‘well-being’ translate on Twitter?
Laura Smith | Salvatore Giorgi | Rishi Solanki | Johannes Eichstaedt | H. Andrew Schwartz | Muhammad Abdul-Mageed | Anneke Buffone | Lyle Ungar
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Modelling Valence and Arousal in Facebook posts
Daniel Preoţiuc-Pietro | H. Andrew Schwartz | Gregory Park | Johannes Eichstaedt | Margaret Kern | Lyle Ungar | Elisabeth Shulman
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

The Clinical Panel: Leveraging Psychological Expertise During NLP Research
Glen Coppersmith | Kristy Hollingshead | H. Andrew Schwartz | Molly Ireland | Rebecca Resnik | Kate Loveys | April Foreman | Loring Ingraham
Proceedings of the First Workshop on NLP and Computational Social Science

2015

The role of personality, age, and gender in tweeting about mental illness
Daniel Preoţiuc-Pietro | Johannes Eichstaedt | Gregory Park | Maarten Sap | Laura Smith | Victoria Tobolsky | H. Andrew Schwartz | Lyle Ungar
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

Extracting Human Temporal Orientation from Facebook Language
H. Andrew Schwartz | Gregory Park | Maarten Sap | Evan Weingarten | Johannes Eichstaedt | Margaret Kern | David Stillwell | Michal Kosinski | Jonah Berger | Martin Seligman | Lyle Ungar
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task
Daniel Preoţiuc-Pietro | Maarten Sap | H. Andrew Schwartz | Lyle Ungar
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

2014

Towards Assessing Changes in Degree of Depression through Facebook
H. Andrew Schwartz | Johannes Eichstaedt | Margaret L. Kern | Gregory Park | Maarten Sap | David Stillwell | Michal Kosinski | Lyle Ungar
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

Developing Age and Gender Predictive Lexica over Social Media
Maarten Sap | Gregory Park | Johannes Eichstaedt | Margaret Kern | David Stillwell | Michal Kosinski | Lyle Ungar | Hansen Andrew Schwartz
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach
Hansen Andrew Schwartz | Johannes Eichstaedt | Eduardo Blanco | Lukasz Dziurzynski | Margaret L. Kern | Stephanie Ramones | Martin Seligman | Lyle Ungar
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

Improving Supervised Sense Disambiguation with Web-Scale Selectors
H. Andrew Schwartz | Fernando Gomez | Lyle Ungar
Proceedings of COLING 2012

New Insights from Coarse Word Sense Disambiguation in the Crowd
Adam Kapelner | Krishna Kaliannan | H. Andrew Schwartz | Lyle Ungar | Dean Foster
Proceedings of COLING 2012: Posters

Penn: Using Word Similarities to better Estimate Sentence Similarity
Sneha Jha | Hansen A. Schwartz | Lyle Ungar
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2010

UCF-WS: Domain Word Sense Disambiguation Using Web Selectors
Hansen A. Schwartz | Fernando Gomez
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Acquiring Applicable Common Sense Knowledge from the Web
Hansen A. Schwartz | Fernando Gomez
Proceedings of the Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics

Using Web Selectors for the Disambiguation of All Words
Hansen A. Schwartz | Fernando Gomez
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

2008

Acquiring Knowledge from the Web to be used as Selectors for Noun Sense Disambiguation
Hansen A. Schwartz | Fernando Gomez
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Co-authors

Niranjan Balasubramanian 11

Adithya V. Ganesan 11

Matthew Matero 10

Anneke Buffone 6

Syeda Mahwish 6

Fernando Gomez 5

Margaret Kern 5

Veronica Lynn 5

Sharath Chandra Guntuku 4

Siddharth Mangalik 4

August Håkan Nilsson 4

Daniel Preoţiuc-Pietro 4

Mohammadzaman Zamani 4

Michal Kosinski 3

Vivek Kulkarni 3

Masoud Rouhizadeh 3

David Stillwell 3

Pranav Chitale 2

Scott Feltman 2

Allison Lahnala 2

Yash Kumar Lal 2

Christian Luhmann 2

Akshay Raghavan 2

Daniel Rieman 2

Camilo Ruggero 2

Martin Seligman 2

Sverker Sikström 2

Khushboo Singh 2

Muhammad Abdul-Mageed 1

Pegah Alipoormolabashi 1

Fatemeh Almodaresi 1

Ritwik Banerjee 1

Michael Bishop 1

Eduardo Blanco 1

Ana-Maria Bucur 1

Julia Buffolino 1

Rebecca Astrid Böhme 1

Young Min Cho 1

Jeremy D. W. Clifton 1

Sean Clouston 1

Glen Coppersmith 1

Patrick Crutchley 1

Lukasz Dziurzynski 1

Michael Ferdman 1

April Foreman 1

Alissa Goodman 1

Kristy Hollingshead 1

Loring Ingraham 1

Molly Ireland 1

Shraddhan Jain 1

Anthony Janocko 1

Krishna Kaliannan 1

Adam Kapelner 1

Nicholas Kerry 1

Kevin Lanning 1

Allegra Larche 1

Parth Limbachiya 1

Benjamin Luft 1

Christian C. Luhmann 1

Dave M. Markowitz 1

Barbara Mellers 1

Mariam Marlen Mirström 1

Kate Niederhoffer 1

Stephanie Ramones 1

Aravind Reddy Ravula 1

Philip Resnik 1

Rebecca Resnik 1

Whitney Ringwald 1

Richard N. Rosenthal 1

Richard Rosenthal 1

Neville Ryant 1

Roshan Santhosh 1

Deven Santosh Shah 1

Ashish Sharma 1

Elisabeth Shulman 1

Shreyashee Sinha 1

Rishi Solanki 1

Shashanka Subrahmanya 1

Anand Syamkumar 1

Philip Tetlock 1

Victoria Tobolsky 1

Isabella Vallejo 1

Sujeeth Vankudari 1

Adithya Virinchipuram Ganesan 1

Evan Weingarten 1

Charles Welch 1

Abigail Wheeler 1

Mohsen Zakeri 1

Mohammad Zamani 1

Kevin Zembroski 1

Venues