Joshua Garland

2024

2023

pdf bib abs
How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts
Tharindu Kumarage | Paras Sheth | Raha Moraffah | Joshua Garland | Huan Liu
Findings of the Association for Computational Linguistics: EMNLP 2023

In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer the question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach suggests a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing “human-like” text that can mislead the detectors. The novel universal evasive prompt is achieved in two steps: First, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; and then, we leverage the transferability of soft prompts to transfer the learned evasive soft prompt from one PLM to another. Employing multiple PLMs in various writing tasks, we conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in their evasion of state-of-the-art detectors.

pdf bib
J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News
Tharindu Kumarage | Amrita Bhattacharjee | Djordje Padejski | Kristy Roschke | Dan Gillmor | Scott Ruston | Huan Liu | Joshua Garland
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2020

pdf bib abs
Countering hate on social media: Large scale classification of hate and counter speech
Joshua Garland | Keyan Ghazi-Zahedi | Jean-Gabriel Young | Laurent Hébert-Dufresne | Mirta Galesic
Proceedings of the Fourth Workshop on Online Abuse and Harms

Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech where citizens actively engage with hate speech to restore civil non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is a lack of large labeled data sets for training automated classifiers to identify counter speech. Here we use a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech. We use an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieves macro F1 scores on out of sample balanced test sets ranging from 0.76 to 0.97—accuracy in line and even exceeding the state of the art. We then use the classifier to discover hate and counter speech in more than 135,000 fully-resolved Twitter conversations occurring from 2013 to 2018 and study their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.

Co-authors

Venues

Fix data