Derek Ruths


2024

Story Morals: Surfacing value-driven narrative schemas using large language models
David G Hobson | Haiqi Zhou | Derek Ruths | Andrew Piper
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Stories are not only designed to entertain but encode lessons reflecting their authors’ beliefs about the world. In this paper, we propose a new task of narrative schema labelling based on the concept of “story morals” to identify the values and lessons conveyed in stories. Using large language models (LLMs) such as GPT-4, we develop methods to automatically extract and validate story morals across a diverse set of narrative genres, including folktales, novels, movies and TV, personal stories from social media, and the news. Our approach involves a multi-step prompting sequence to derive morals and validate them through both automated metrics and human assessments. The findings suggest that LLMs can effectively approximate human story moral interpretations and offer a new avenue for computational narrative understanding. By clustering the extracted morals on a sample dataset of folktales from around the world, we highlight the commonalities and distinctiveness of narrative values, providing preliminary insights into the distribution of values across cultures. This work opens up new possibilities for studying narrative schemas and their role in shaping human beliefs and behaviors.
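
The multi-step prompting sequence described above can be illustrated with a minimal sketch, assuming the OpenAI Python SDK; the prompts and the two-step decomposition here are hypothetical stand-ins, not the paper's actual pipeline.

# A two-step moral-extraction sketch: summarize first, then distill a moral.
# Prompts are illustrative only; they are not reproduced from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send one single-turn prompt to GPT-4 and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def extract_moral(story_text: str) -> str:
    # Step 1: condense the narrative.
    summary = ask(f"Summarize the plot of this story in three sentences:\n{story_text}")
    # Step 2: distill a value-driven lesson from the summary.
    return ask(f"In one sentence, state the moral of this story:\n{summary}")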

Large Scale Narrative Messaging around Climate Change: A Cross-Cultural Comparison
Haiqi Zhou | David Hobson | Derek Ruths | Andrew Piper
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

In this study, we explore the use of Large Language Models (LLMs) such as GPT-4 to extract and analyze the latent narrative messaging in climate change-related news articles from North American and Chinese media. By defining “narrative messaging” as the intrinsic moral or lesson of a story, we apply our model to a dataset of approximately 15,000 news articles in English and Mandarin, categorized by climate-related topics and ideological groupings. Our findings reveal distinct differences in the narrative values emphasized by different cultural and ideological contexts: North American sources often focus on individualistic and crisis-driven themes, while Chinese sources emphasize developmental and cooperative narratives. This work demonstrates the potential of LLMs in understanding and influencing climate communication, offering new insights into the collective belief systems that shape public discourse on climate change across different cultures.

The Social Lives of Literary Characters: Combining citizen science and language models to understand narrative social networks
Andrew Piper | Michael Xu | Derek Ruths
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

Characters and their interactions are central to the fabric of narratives, playing a crucial role in developing readers’ social cognition. In this paper, we introduce a novel annotation framework that distinguishes between five types of character interactions, including bilateral and unilateral classifications. Leveraging the crowd-sourcing framework of citizen science, we collect a large dataset of manual annotations (N=13,395). Using this data, we explore how genre and audience factors influence social network structures in a sample of contemporary books. Our findings demonstrate that fictional narratives tend to favor more embodied interactions and exhibit denser and less modular social networks. Our work not only enhances the understanding of narrative social networks but also showcases the potential of integrating citizen science with NLP methodologies for large-scale narrative analysis.
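
The two network-level claims above (denser, less modular) correspond to standard graph statistics. A minimal sketch with networkx, run on an invented character graph rather than the paper's data:

# Density and modularity of a toy character-interaction network.
import networkx as nx
from networkx.algorithms import community

G = nx.Graph()
G.add_edges_from([
    ("Elizabeth", "Darcy"), ("Elizabeth", "Jane"),
    ("Jane", "Bingley"), ("Darcy", "Bingley"),
])

density = nx.density(G)  # fraction of possible character pairs that interact
parts = community.greedy_modularity_communities(G)
modularity = community.modularity(G, parts)  # how cleanly the cast splits into groups
print(f"density={density:.2f}, modularity={modularity:.2f}")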

Multi-Target User Stance Discovery on Reddit
Benjamin Steel | Derek Ruths
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

We consider how to credibly and reliably assess the opinions of individuals using their social media posts. To this end, this paper makes three contributions. First, we assemble a workflow and approach to applying modern natural language processing (NLP) methods to multi-target user stance detection in the wild. Second, we establish why the multi-target modeling of user stance is qualitatively more complicated than uni-target user stance detection. Finally, we validate our method by showing how multi-dimensional measurement of user opinions not only reproduces known opinion polling results, but also enables the study of opinion dynamics at high levels of temporal and semantic resolution.

2022

Enriching Abusive Language Detection with Community Context
Haji Mohammad Saleem | Jana Kurrek | Derek Ruths
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Uses of pejorative expressions can be benign or actively empowering. When models for abuse detection misclassify these expressions as derogatory, they inadvertently censor productive conversations held by marginalized groups. One way to engage with non-dominant perspectives is to add context around conversations. Previous research has leveraged user- and thread-level features, but it often neglects the spaces within which productive conversations take place. Our paper highlights how community context can improve classification outcomes in abusive language detection. We make two main contributions to this end. First, we demonstrate that online communities cluster by the nature of their support towards victims of abuse. Second, we establish how community context improves accuracy and reduces the false positive rates of state-of-the-art abusive language classifiers. These findings suggest a promising direction for context-aware models in abusive language research.

2021

“Are you kidding me?”: Detecting Unpalatable Questions on Reddit
Sunyam Bagga | Andrew Piper | Derek Ruths
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Abusive language in online discourse negatively affects a large number of social media users. Many computational methods have been proposed to address this issue of online abuse. The existing work, however, tends to focus on detecting the more explicit forms of abuse, leaving the subtler forms largely untouched. Our work addresses this gap by making three core contributions. First, inspired by the theory of impoliteness, we propose a novel task of detecting a subtler form of abuse, namely unpalatable questions. Second, we publish a context-aware dataset for the task using data from a diverse set of Reddit communities. Third, we implement a wide array of learning models and also investigate the benefits of incorporating conversational context into computational models. Our results show that modeling subtle abuse is feasible but difficult due to the language involved being highly nuanced and context-sensitive. We hope that future research in the field will address such subtle forms of abuse since their harm currently passes unnoticed through existing detection systems.

2020

Towards a Comprehensive Taxonomy and Large-Scale Annotated Corpus for Online Slur Usage
Jana Kurrek | Haji Mohammad Saleem | Derek Ruths
Proceedings of the Fourth Workshop on Online Abuse and Harms

Abusive language classifiers have been shown to exhibit bias against women and racial minorities. Since these models are trained on data that is collected using keywords, they tend to exhibit a high sensitivity towards pejoratives. As a result, comments written by victims of abuse are frequently labelled as hateful, even if they discuss or reclaim slurs. Any attempt to address bias in keyword-based corpora requires a better understanding of pejorative language, as well as an equitable representation of targeted users in data collection. We make two main contributions to this end. First, we provide an annotation guide that outlines 4 main categories of online slur usage, which we further divide into a total of 12 sub-categories. Second, we present a publicly available corpus based on our taxonomy, with 39.8k human-annotated comments extracted from Reddit. This corpus was annotated by a diverse cohort of coders, with Shannon equitability indices of 0.90, 0.92, and 0.87 across sexuality, ethnicity, and gender. Taken together, our taxonomy and corpus allow researchers to evaluate classifiers on a wider range of speech containing slurs.
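
The Shannon equitability index used above to describe the coder cohort is a standard evenness measure: the Shannon entropy of the group proportions divided by its maximum, ln(k) for k groups. A worked example with invented proportions, not the paper's actual cohort:

# Shannon equitability: entropy of group proportions over ln(k).
# A value of 1.0 means the cohort is perfectly evenly distributed.
import math

def shannon_equitability(counts):
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in proportions)
    return entropy / math.log(len(counts))

# Invented example: coders split 40/35/25 across three groups.
print(round(shannon_equitability([40, 35, 25]), 2))  # 0.98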

2018

Sentiment Analysis: It’s Complicated!
Kian Kenyon-Dean | Eisha Ahmed | Scott Fujimoto | Jeremy Georges-Filteau | Christopher Glasz | Barleen Kaur | Auguste Lalande | Shruti Bhanderi | Robert Belfer | Nirmal Kanagasabai | Roman Sarrazingendron | Rohit Verma | Derek Ruths
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Sentiment analysis is used as a proxy to measure human emotion, where the objective is to categorize text according to some predefined notion of sentiment. Sentiment analysis datasets are typically constructed with gold-standard sentiment labels, assigned based on the results of manual annotations. When working with such annotations, it is common for dataset constructors to discard “noisy” or “controversial” data where there is significant disagreement on the proper label. In datasets constructed for the purpose of Twitter sentiment analysis (TSA), these controversial examples can compose over 30% of the originally annotated data. We argue that the removal of such data is a problematic trend because, when performing real-time sentiment classification of short text, an automated system cannot know a priori which samples would fall into this category of disputed sentiment. We therefore propose the notion of a “complicated” class of sentiment to categorize such text, and argue that its inclusion in the short-text sentiment analysis framework will improve the quality of automated sentiment analysis systems as they are implemented in real-world settings. We motivate this argument by building and analyzing a new publicly available TSA dataset of over 7,000 tweets annotated with 5x coverage, named MTSA. Our analysis of classifier performance over our dataset offers insights into sentiment analysis dataset and model design, how current techniques would perform in the real world, and how researchers should handle difficult data.
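
The aggregation logic the abstract argues for, keeping disputed tweets but giving them their own class, can be sketched as follows; the agreement threshold here is a hypothetical choice, not the paper's exact rule.

# Collapse 5x annotations into one label, keeping disputed tweets as
# a separate "complicated" class instead of discarding them.
from collections import Counter

def aggregate_label(annotations, min_agreement=3):
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes >= min_agreement else "complicated"

print(aggregate_label(["positive"] * 4 + ["negative"]))  # positive
print(aggregate_label(["positive", "positive", "negative",
                       "negative", "neutral"]))          # complicated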

An Attribution Relations Corpus for Political News
Edward Newell | Drew Margolin | Derek Ruths
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Hierarchical Neural Attention-based Text Classifier
Koustuv Sinha | Yue Dong | Jackie Chi Kit Cheung | Derek Ruths
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Deep neural networks have been displaying superior performance over traditional supervised classifiers in text classification. They learn to extract useful features automatically when a sufficient amount of data is presented. However, along with the growth in the number of documents comes the increase in the number of categories, which often results in poor performance of multiclass classifiers. In this work, we use external knowledge in the form of topic category taxonomies to aid the classification by introducing a deep hierarchical neural attention-based classifier. Our model performs better than, or comparably to, state-of-the-art hierarchical models at significantly lower computational cost while maintaining high interpretability.
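
The general idea, attention pooling over tokens with a fine-grained head conditioned on the coarse taxonomy level, can be sketched in PyTorch. This is a minimal illustration of hierarchical attention-based classification under assumed dimensions, not the authors' architecture.

# Minimal hierarchical classifier: attention-pooled document vector,
# parent-category head, child head conditioned on the parent distribution.
import torch
import torch.nn as nn

class HierarchicalAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, n_parents, n_children):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.Linear(embed_dim, 1)  # scores each token
        self.parent_head = nn.Linear(embed_dim, n_parents)
        self.child_head = nn.Linear(embed_dim + n_parents, n_children)

    def forward(self, token_ids):
        x = self.embed(token_ids)                     # (batch, seq, dim)
        weights = torch.softmax(self.attn(x), dim=1)  # attention over tokens
        doc = (weights * x).sum(dim=1)                # pooled document vector
        parent_logits = self.parent_head(doc)
        parent_probs = torch.softmax(parent_logits, dim=-1)
        child_logits = self.child_head(torch.cat([doc, parent_probs], dim=-1))
        return parent_logits, child_logits

model = HierarchicalAttentionClassifier(vocab_size=10_000, embed_dim=64,
                                        n_parents=5, n_children=40)
parent_logits, child_logits = model(torch.randint(0, 10_000, (2, 30)))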

2017

Assessing the Verifiability of Attributions in News Text
Edward Newell | Ariane Schang | Drew Margolin | Derek Ruths
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

When reporting the news, journalists rely on the statements of stakeholders, experts, and officials. The attribution of such a statement is verifiable if its fidelity to the source can be confirmed or denied. In this paper, we develop a new NLP task: determining the verifiability of an attribution based on linguistic cues. We operationalize the notion of verifiability as a score between 0 and 1 using human judgments in a comparison-based approach. Using crowdsourcing, we create a dataset of verifiability-scored attributions, and demonstrate a model that achieves an RMSE of 0.057 and a Spearman’s rank correlation of 0.95 with human-generated scores. We discuss the application of this technique to the analysis of mass media.
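
The two evaluation figures quoted above are standard fit metrics between model and human scores; a sketch of how they are computed, with invented scores rather than the paper's data:

# RMSE and Spearman's rank correlation between model scores and
# human-generated verifiability scores (toy numbers for illustration).
import numpy as np
from scipy.stats import spearmanr

human = np.array([0.10, 0.35, 0.50, 0.80, 0.95])
model = np.array([0.12, 0.30, 0.55, 0.78, 0.96])

rmse = np.sqrt(np.mean((model - human) ** 2))
rho, _ = spearmanr(model, human)
print(f"RMSE={rmse:.3f}, Spearman rho={rho:.2f}")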

Vectors for Counterspeech on Twitter
Lucas Wright | Derek Ruths | Kelly P Dillon | Haji Mohammad Saleem | Susan Benesch
Proceedings of the First Workshop on Abusive Language Online

A study of conversations on Twitter found that some arguments between strangers led to favorable change in discourse and even in attitudes. The authors propose that such exchanges can be usefully distinguished according to whether individuals or groups take part on each side, since the opportunity for a constructive exchange of views seems to vary accordingly.

2016

Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
Hardik Vala | Stefan Dimitrov | David Jurgens | Andrew Piper | Derek Ruths
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Characters form the focus of various studies of literary works, including social network analysis, archetype induction, and plot comparison. The recent rise in the computational modelling of literary works has produced a proportional rise in the demand for character-annotated literary corpora. However, automatically identifying characters is an open problem and there is low availability of literary texts with manually labelled characters. To address the latter problem, this work presents three contributions: (1) a comprehensive scheme for manually resolving mentions to characters in texts; (2) a novel collaborative annotation tool, CHARLES (CHAracter Resolution Label-Entry System), for character annotation and similar cross-document tagging tasks; and (3) the character annotations resulting from a pilot study on the novel Pride and Prejudice, demonstrating that the scheme and tool facilitate the efficient production of high-quality annotations. We expect this work to motivate the further production of annotated literary corpora to help meet the demand of the community.

The More Antecedents, the Merrier: Resolving Multi-Antecedent Anaphors
Hardik Vala | Andrew Piper | Derek Ruths
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On The Difficulty of Detecting Characters in Literary Texts
Hardik Vala | David Jurgens | Andrew Piper | Derek Ruths
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Twitter Users #CodeSwitch Hashtags! #MoltoImportante #wow
David Jurgens | Stefan Dimitrov | Derek Ruths
Proceedings of the First Workshop on Computational Approaches to Code Switching

2013

Gender Inference of Twitter Users in Non-English Contexts
Morgane Ciot | Morgan Sonderegger | Derek Ruths
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing