Zeerak Talat - ACL Anthology

Zeerak Talat

Also published as: Zeerak Waseem

2025

Exploring the Limitations of Detecting Machine-Generated Text
Jad Doughman | Osama Mohammed Afzal | Hawau Olamide Toyin | Shady Shehata | Preslav Nakov | Zeerak Talat
Proceedings of the 31st International Conference on Computational Linguistics

Recent improvements in the quality of the generations by large language models have spurred research into identifying machine-generated text. Such work often presents high-performing detectors. However, humans and machines can produce text in different styles and domains, yet the performance impact of such variation on machine-generated text detection systems remains unclear. In this paper, we audit the classification performance for detecting machine-generated text by evaluating on texts with varying writing styles. We find that classifiers are highly sensitive to stylistic changes and differences in text complexity, and in some cases degrade entirely to random classifiers. We further find that detection systems are particularly susceptible to misclassifying easy-to-read texts while they have high performance for complex texts, leading to concerns about the reliability of detection systems. We recommend that future work attends to stylistic factors and reading difficulty levels of human-written and machine-generated text.

Pathways to Radicalisation: On Research for Online Radicalisation in Natural Language Processing and Machine Learning
Zeerak Talat | Michael Sejr Schlichtkrull | Pranava Madhyastha | Christine De Kock
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)

Online communities play an integral part in communication across the globe, and some online communities are known for extremist content. As a field of surveillance technologies, NLP and other ML fields hold particular promise for monitoring extremist communities that may turn violent. Such communities make use of a wide variety of modalities of communication, including textual posts on specialised fora, memes, videos, and podcasts. Furthermore, such communities undergo rapid linguistic evolution, thus presenting a challenge to machine learning technologies that quickly diverge from the data that are used. In this position paper, we argue that radicalisation is a nascent area for which machine learning is particularly apt. However, in addressing radicalisation, it is important that research avoids falling into the temptation of focusing on prediction. We argue that such communities present a particular avenue for addressing key concerns with machine learning technologies: (1) temporal misalignment of models and (2) aligning and linking content across modalities.

Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
Masahiro Kaneko | Zeerak Talat | Timothy Baldwin
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Iterative jailbreak methods, which repeatedly rewrite and input prompts into large language models (LLMs) to induce harmful outputs and use the model’s previous responses to guide each new iteration, have been found to be a highly effective attack strategy. Despite the effectiveness of this strategy against LLMs and their safety mechanisms, existing defenses do not proactively disrupt this dynamic trial-and-error cycle. In this study, we propose a novel framework that dynamically updates its defense strategy through online learning in response to each new prompt from iterative jailbreak methods. Leveraging the distinctions between harmful jailbreak-generated prompts and typical harmless prompts, we introduce a reinforcement learning-based approach that optimizes prompts to ensure appropriate responses for harmless tasks while explicitly rejecting harmful prompts. Additionally, to curb overfitting to the narrow band of partial input rewrites explored during an attack, we introduce Past-Direction Gradient Damping (PDGD). Experiments conducted on three LLMs show that our approach significantly outperforms five existing defense methods against five iterative jailbreak methods. Moreover, our results indicate that our prompt optimization strategy simultaneously enhances response quality for harmless tasks.

The Only Way is Ethics: A Guide to Ethical Research with Large Language Models
Eddie L. Ungless | Nikolas Vitsakis | Zeerak Talat | James Garforth | Bjorn Ross | Arno Onken | Atoosa Kasirzadeh | Alexandra Birch
Proceedings of the 31st International Conference on Computational Linguistics

There is a significant body of work looking at the ethical considerations of large language models (LLMs): critiquing tools to measure performance and harms; proposing toolkits to aid in ideation; discussing the risks to workers; considering legislation around privacy and security etc. As yet there is no work that integrates these resources into a single practical guide that focuses on LLMs; we attempt this ambitious goal. We introduce LLM Ethics Whitepaper, which we provide as an open and living resource for NLP practitioners, and those tasked with evaluating the ethical implications of others’ work. Our goal is to translate ethics literature into concrete recommendations for computer scientists. LLM Ethics Whitepaper distils a thorough literature review into clear Do’s and Don’ts, which we present also in this paper. We likewise identify useful toolkits to support ethical work. We refer the interested reader to the full LLM Ethics Whitepaper, which provides a succinct discussion of ethical considerations at each stage in a project lifecycle, as well as citations for the hundreds of papers from which we drew our recommendations. The present paper can be thought of as a pocket guide to conducting ethical research with LLMs.

SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell | Giuseppe Attanasio | Ioana Baldini | Miruna Clinciu | Jordan Clive | Pieter Delobelle | Manan Dey | Sil Hamilton | Timm Dill | Jad Doughman | Ritam Dutt | Avijit Ghosh | Jessica Zosa Forde | Carolin Holtermann | Lucie-Aimée Kaffee | Tanmay Laud | Anne Lauscher | Roberto L Lopez-Davila | Maraim Masoud | Nikita Nangia | Anaelia Ovalle | Giada Pistilli | Dragomir Radev | Beatrice Savoldi | Vipul Raheja | Jeremy Qin | Esther Ploeger | Arjun Subramonian | Kaustubh Dhole | Kaiser Sun | Amirbek Djanibekov | Jonibek Mansurov | Kayo Yin | Emilio Villa Cueva | Sagnik Mukherjee | Jerry Huang | Xudong Shen | Jay Gala | Hamdan Al-Ali | Tair Djanibekov | Nurdaulet Mukhituly | Shangrui Nie | Shanya Sharma | Karolina Stanczak | Eliza Szczechla | Tiago Timponi Torrent | Deepak Tunuguntla | Marcelo Viridiano | Oskar Van Der Wal | Adina Yakefu | Aurélie Névéol | Mike Zhang | Sydney Zink | Zeerak Talat
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset SHADES to help address this issue, designed for examining culturally-specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 regions around the world and 16 languages, spanning multiple identity categories subject to discrimination worldwide. We demonstrate its utility in a series of exploratory evaluations for both “base” and “instruction-tuned” language models. Our results suggest that stereotypes are consistently reflected across models and languages, with some languages and models indicating much stronger stereotype biases than others.

Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Agostina Calabrese | Christine de Kock | Debora Nozza | Flor Miriam Plaza-del-Arco | Zeerak Talat | Francielle Vargas
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)

2024

The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels
Eve Fleisig | Su Lin Blodgett | Dan Klein | Zeerak Talat
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement–some challenged by perspectivist approaches, and some that remain to be addressed–as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.

Subjective Isms? On the Danger of Conflating Hate and Offence in Abusive Language Detection
Amanda Cercas Curry | Gavin Abercrombie | Zeerak Talat
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

Natural language processing research has begun to embrace the notion of annotator subjectivity, motivated by variations in labelling. This approach understands each annotator’s view as valid, which can be highly suitable for tasks that embed subjectivity, e.g., sentiment analysis. However, this construction may be inappropriate for tasks such as hate speech detection, as it affords equal validity to all positions on e.g., sexism or racism. We argue that the conflation of hate and offence can invalidate findings on hate speech, and call for future work to be situated in theory, disentangling hate from its orthogonal concept, offence.

Classist Tools: Social Class Correlates with Performance in NLP
Amanda Cercas Curry | Giuseppe Attanasio | Zeerak Talat | Dirk Hovy
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The field of sociolinguistics has studied factors affecting language use for the last century. Labov (1964) and Bernstein (1960) showed that socioeconomic class strongly influences our accents, syntax and lexicon. However, despite growing concerns surrounding fairness and bias in Natural Language Processing (NLP), there is a dearth of studies delving into the effects that class may have on NLP systems. We show empirically that NLP systems’ performance is affected by speakers’ socioeconomic status (SES), potentially disadvantaging less-privileged socioeconomic groups. We annotate a corpus of 95K utterances from movies with social class, ethnicity and geographical language variety and measure the performance of NLP systems on three tasks: language modelling, automatic speech recognition, and grammar error correction. We find significant performance disparities that can be attributed to socioeconomic status as well as ethnicity and geographical differences. With NLP technologies becoming ever more ubiquitous and quotidian, they must accommodate all language varieties to avoid disadvantaging already marginalised groups. We argue for the inclusion of socioeconomic class in future language technologies.

Impoverished Language Technology: The Lack of (Social) Class in NLP
Amanda Cercas Curry | Zeerak Talat | Dirk Hovy
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Since Labov’s foundational 1964 work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception. Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these factors have been investigated in the context of NLP technology. While age and gender are well covered, Labov’s initial target, socio-economic class, is largely absent. We survey the existing Natural Language Processing (NLP) literature and find that only 20 papers even mention socio-economic status. However, the majority of those papers do not engage with class beyond collecting information on annotator demographics. Given this research lacuna, we provide a definition of class that can be operationalised by NLP researchers, and argue for including socio-economic class in future language technologies.

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto | Tilman Beck | Zeerak Talat | Iryna Gurevych | Timothy Baldwin
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Improving the capabilities of multilingual language models in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages. In this paper, we relax the reliance on texts in low-resource languages by using multilingual lexicons in pretraining to enhance multilingual capabilities. Specifically, we focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets. We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets, and large language models like GPT-3.5, BLOOMZ, and XGLM. These findings are observable from unseen low-resource languages to code-mixed scenarios involving high-resource languages.

Understanding “Democratization” in NLP and ML Research
Arjun Subramonian | Vagrant Gautam | Dietrich Klakow | Zeerak Talat
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recent improvements in natural language processing (NLP) and machine learning (ML) and increased mainstream adoption have led to researchers frequently discussing the “democratization” of artificial intelligence. In this paper, we seek to clarify how democratization is understood in NLP and ML publications, through large-scale mixed-methods analyses of papers using the keyword “democra*” published in NLP and adjacent venues. We find that democratization is most frequently used to convey (ease of) access to or use of technologies, without meaningfully engaging with theories of democratization, while research using other invocations of “democra*” tends to be grounded in theories of deliberation and debate. Based on our findings, we call for researchers to enrich their use of the term democratization with appropriate theory, towards democratic technologies beyond superficial access.

Metrics for What, Metrics for Whom: Assessing Actionability of Bias Evaluation Metrics in NLP
Pieter Delobelle | Giuseppe Attanasio | Debora Nozza | Su Lin Blodgett | Zeerak Talat
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

This paper introduces the concept of actionability in the context of bias measures in natural language processing (NLP). We define actionability as the degree to which a measure’s results enable informed action and propose a set of desiderata for assessing it. Building on existing frameworks such as measurement modeling, we argue that actionability is a crucial aspect of bias measures that has been largely overlooked in the literature. We conduct a comprehensive review of 146 papers proposing bias measures in NLP, examining whether and how they provide the information required for actionable results. Our findings reveal that many key elements of actionability, including a measure’s intended use and reliability assessment, are often unclear or entirely absent. This study highlights a significant gap in the current approach to developing and reporting bias measures in NLP. We argue that this lack of clarity may impede the effective implementation and utilization of these measures. To address this issue, we offer recommendations for more comprehensive and actionable metric development and reporting practices in NLP bias research.

Contemporary large-scale data collection efforts have prioritized the amount of data collected to improve large language models (LLM). This quantitative approach has resulted in concerns for the rights of data subjects represented in data collections. This concern is exacerbated by a lack of documentation and analysis tools, making it difficult to interrogate these collections. Mindful of these pitfalls, we present a methodology for documentation-first, human-centered data collection. We apply this approach in an effort to train a multilingual LLM. We identify a geographically diverse set of target language groups (Arabic varieties, Basque, Chinese varieties, Catalan, English, French, Indic languages, Indonesian, Niger-Congo languages, Portuguese, Spanish, and Vietnamese, as well as programming languages) for which to collect metadata on potential data sources. We structure this effort by developing an online catalogue in English as a tool for gathering metadata through public hackathons. We present our tool and analyses of the resulting resource metadata, including distributions over languages, regions, and resource types, and discuss our lessons learned.

Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)
Yi-Ling Chung | Zeerak Talat | Debora Nozza | Flor Miriam Plaza-del-Arco | Paul Röttger | Aida Mostafazadeh Davani | Agostina Calabrese
Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024)

2023

A Federated Approach for Hate Speech Detection
Jay Gala | Deep Gandhi | Jash Mehta | Zeerak Talat
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Hate speech detection has been the subject of high research attention, due to the scale of content created on social media. In spite of the attention and the sensitive nature of the task, privacy preservation in hate speech detection has remained under-studied. The majority of research has focused on centralised machine learning infrastructures which risk leaking data. In this paper, we show that using federated machine learning can help address the privacy concerns that are inherent to hate speech detection while obtaining up to 6.81% improvement in terms of F1-score.

Mirages. On Anthropomorphism in Dialogue Systems
Gavin Abercrombie | Amanda Cercas Curry | Tanvi Dinkar | Verena Rieser | Zeerak Talat
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Automated dialogue or conversational systems are anthropomorphised by developers and personified by users. While a degree of anthropomorphism is inevitable, conscious and unconscious design choices can guide users to personify them to varying degrees. Encouraging users to relate to automated systems as if they were human can lead to transparency and trust issues, and high risk scenarios caused by over-reliance on their outputs. As a result, natural language processing researchers have investigated the factors that induce personification and developed resources to mitigate such effects. However, these efforts are fragmented, and many aspects of anthropomorphism have yet to be explored. In this paper, we discuss the linguistic factors that contribute to the anthropomorphism of dialogue systems and the harms that can arise thereof, including reinforcing gender stereotypes and conceptions of acceptable language. We recommend that future efforts towards developing dialogue systems take particular care in their design, development, release, and description; and attend to the many linguistic cues that can elicit personification by users.

Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing
Lucie-Aimée Kaffee | Arnav Arora | Zeerak Talat | Isabelle Augenstein
Findings of the Association for Computational Linguistics: EMNLP 2023

Dual use, the intentional, harmful reuse of technology and scientific artefacts, is an ill-defined problem within the context of Natural Language Processing (NLP). As large language models (LLMs) have advanced in their capabilities and become more accessible, the risk of their intentional misuse becomes more prevalent. To prevent such intentional malicious use, it is necessary for NLP researchers and practitioners to understand and mitigate the risks of their research. Hence, we present an NLP-specific definition of dual use informed by researchers and practitioners in the field. Further, we propose a checklist focusing on dual-use in NLP, that can be integrated into existing conference ethics-frameworks. The definition and checklist are created based on a survey of NLP researchers and practitioners.

2022

A Federated Approach to Predicting Emojis in Hindi Tweets
Deep Gandhi | Jash Mehta | Nirali Parekh | Karan Waghela | Lynette D’Mello | Zeerak Talat
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

The use of emojis affords a visual modality to, often private, textual communication. The task of predicting emojis however provides a challenge for machine learning as emoji use tends to cluster into the frequently used and the rarely used emojis. Much of the machine learning research on emoji use has focused on high resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches. However, traditional machine learning approaches for private communication can introduce privacy concerns, as these approaches require all data to be transmitted to a central storage. In this paper, we seek to address the dual concerns of emphasising high resource languages for emoji prediction and risking the privacy of people’s data. We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains comparable scores with more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.

Directions for NLP Practices Applied to Online Hate Speech Detection
Paula Fortuna | Monica Dominguez | Leo Wanner | Zeerak Talat
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Addressing hate speech in online spaces has been conceptualized as a classification task that uses Natural Language Processing (NLP) techniques. Through this conceptualization, the hate speech detection task has relied on common conventions and practices from NLP. For instance, inter-annotator agreement is conceptualized as a way to measure dataset quality and certain metrics and benchmarks are used to assure model generalization. However, hate speech is a deeply complex and situated concept that eludes such static and disembodied practices. In this position paper, we critically reflect on these methodologies for hate speech detection. We argue that many conventions in NLP are poorly suited for the problem and encourage researchers to develop methods that are more appropriate for the task.

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models
Paul Röttger | Haitham Seelawi | Debora Nozza | Zeerak Talat | Bertie Vidgen
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, recent research has thus introduced functional tests for hate speech detection models. However, these tests currently only exist for English-language content, which means that they cannot support the development of more effective models in other languages spoken by billions across the world. To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models. MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset. To illustrate MHC’s utility, we train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.

On the Machine Learning of Ethical Judgments from Natural Language
Zeerak Talat | Hagen Blix | Josef Valvoda | Maya Indira Ganesh | Ryan Cotterell | Adina Williams
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Ethics is one of the longest standing intellectual endeavors of humanity. In recent years, the fields of AI and NLP have attempted to address issues of harmful outcomes in machine learning systems that are made to interface with humans. One recent approach in this vein is the construction of NLP morality models that can take in arbitrary text and output a moral judgment about the situation described. In this work, we offer a critique of such NLP methods for automating ethical decision-making. Through an audit of recent work on computational approaches for predicting morality, we examine the broader issues that arise from such efforts. We conclude with a discussion of how machine ethics could usefully proceed in NLP, by focusing on current and near-future uses of technology, in a way that centers transparency and democratic values and allows for straightforward accountability.

Evaluating bias, fairness, and social impact in monolingual language models is a difficult task. This challenge is further compounded when language modeling occurs in a multilingual context. Considering the implication of evaluation biases for large multilingual language models, we situate the discussion of bias evaluation within a wider context of social scientific research with computational work. We highlight three dimensions of developing multilingual bias evaluation frameworks: (1) increasing transparency through documentation, (2) expanding targets of bias beyond gender, and (3) addressing cultural differences that exist between languages. We further discuss the power dynamics and consequences of training large language models and recommend that researchers remain cognizant of the ramifications of developing such technologies.

Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
Kanika Narang | Aida Mostafazadeh Davani | Lambert Mathias | Bertie Vidgen | Zeerak Talat
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

2021

Findings of the WOAH 5 Shared Task on Fine Grained Hateful Memes Detection
Lambert Mathias | Shaoliang Nie | Aida Mostafazadeh Davani | Douwe Kiela | Vinodkumar Prabhakaran | Bertie Vidgen | Zeerak Waseem
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

We present the results and main findings of the shared task at WOAH 5 on hateful memes detection. The task includes two subtasks relating to distinct challenges in the fine-grained detection of hateful memes: (1) the protected category attacked by the meme and (2) the attack type. Three teams submitted system description papers. This shared task builds on the hateful memes detection task created by Facebook AI Research in 2020.

HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger | Bertie Vidgen | Dong Nguyen | Zeerak Waseem | Helen Margetts | Janet Pierrehumbert
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model performance due to increasingly well-evidenced systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, we introduce HateCheck, a suite of functional tests for hate speech detection models. We specify 29 model functionalities motivated by a review of previous research and a series of interviews with civil society stakeholders. We craft test cases for each functionality and validate their quality through a structured annotation process. To illustrate HateCheck’s utility, we test near-state-of-the-art transformer models as well as two popular commercial models, revealing critical model weaknesses.

“Hold on honey, men at work”: A semi-supervised approach to detecting sexism in sitcoms
Smriti Singh | Tanvi Anand | Arijit Ghosh Chowdhury | Zeerak Waseem
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

Television shows play an important role in propagating societal norms. Owing to the popularity of the situational comedy (sitcom) genre, it contributes significantly to the overall development of society. In an effort to analyze the content of television shows belonging to this genre, we present a dataset of dialogue turns from popular sitcoms annotated for the presence of sexist remarks. We train a text classification model to detect sexism using domain adaptive learning. We apply the model to our dataset to analyze the evolution of sexist content over the years. We propose a domain-specific semi-supervised architecture for the aforementioned detection of sexism. Through extensive experiments, we show that our model often yields better classification performance over generic deep learning based sentence classification that does not employ domain-specific training. We find that while sexism decreases over time on average, the proportion of sexist dialogue for the most sexist sitcom actually increases. A quantitative analysis along with a detailed error analysis presents the case for our proposed methodology.

Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)
Aida Mostafazadeh Davani | Douwe Kiela | Mathias Lambert | Bertie Vidgen | Vinodkumar Prabhakaran | Zeerak Waseem
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

A Survey of Race, Racism, and Anti-Racism in NLP
Anjalie Field | Su Lin Blodgett | Zeerak Waseem | Yulia Tsvetkov
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Despite inextricable ties between race and language, little work has considered race in NLP research and development. In this work, we survey 79 papers from the ACL Anthology that mention race. These papers reveal various types of race-related bias in all stages of NLP model development, highlighting the need for proactive consideration of how NLP systems can uphold racial hierarchies. However, persistent gaps in research on race and NLP remain: race has been siloed as a niche topic and remains ignored in many NLP tasks; most work operationalizes race as a fixed single-dimensional variable with a ground-truth label, which risks reinforcing differences produced by historical racism; and the voices of historically marginalized people are nearly absent in NLP literature. By identifying where and how NLP literature has and has not considered race, especially in comparison to related fields, our work calls for inclusion and racial justice in NLP research practices.

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen | Tristan Thrush | Zeerak Waseem | Douwe Kiela
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a human-and-model-in-the-loop process for dynamically generating datasets and training better performing and more robust hate detection models. We provide a new dataset of 40,000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation. It includes 15,000 challenging perturbations and each hateful entry has fine-grained labels for the type and target of hate. Hateful entries make up 54% of the dataset, which is substantially higher than comparable datasets. We show that model performance is substantially improved using this approach. Models trained on later rounds of data collection perform better on test sets and are harder for annotators to trick. They also have better performance on HateCheck, a suite of functional tests for online hate detection. We provide the code, dataset and annotation guidelines for other researchers to use.

We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.

2020

Proceedings of the Fourth Workshop on Online Abuse and Harms
Seyi Akiwowo | Bertie Vidgen | Vinodkumar Prabhakaran | Zeerak Waseem
Proceedings of the Fourth Workshop on Online Abuse and Harms

Online Abuse and Human Rights: WOAH Satellite Session at RightsCon 2020
Vinodkumar Prabhakaran | Zeerak Waseem | Seyi Akiwowo | Bertie Vidgen
Proceedings of the Fourth Workshop on Online Abuse and Harms

In 2020, the Workshop on Online Abuse and Harms (WOAH) held a satellite panel at RightsCon 2020, an international human rights conference. Our aim was to bridge the gap between human rights scholarship and Natural Language Processing (NLP) research communities in tackling online abuse. We report on the discussions that took place, and present an analysis of four key issues which emerged: Problems in tackling online abuse, Solutions, Meta concerns and the Ecosystem of content moderation and research. We argue there is a pressing need for NLP research communities to engage with human rights perspectives, and identify four key ways in which NLP research into online abuse could immediately be enhanced to create better and more ethical solutions.

Detecting East Asian Prejudice on Social Media
Bertie Vidgen | Scott A. Hale | Ella Guest | Helen Margetts | David Broniatowski | Zeerak Waseem | Austin Botelho | Matthew Hall | Rebekah Tromble
Proceedings of the Fourth Workshop on Online Abuse and Harms

During COVID-19, concerns have heightened about the spread of aggressive and hateful language online, especially hostility directed against East Asia and East Asian people. We report on a new dataset and the creation of a machine learning classifier that categorizes social media posts from Twitter into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice, and a neutral class. The classifier achieves a macro-F1 score of 0.83. We then conduct an in-depth ground-up error analysis and show that the model struggles with edge cases and ambiguous content. We provide the 20,000-tweet training dataset (annotated by experienced analysts), which also contains several secondary categories and additional flags. We also provide the 40,000 original annotations (before adjudication), the full codebook, annotations for COVID-19 relevance and East Asian relevance and stance for 1,000 hashtags, and the final model.

2019

Proceedings of the Third Workshop on Abusive Language Online
Sarah T. Roberts | Joel Tetreault | Vinodkumar Prabhakaran | Zeerak Waseem
Proceedings of the Third Workshop on Abusive Language Online

Proceedings of the 2019 Workshop on Widening NLP
Amittai Axelrod | Diyi Yang | Rossana Cunha | Samira Shaikh | Zeerak Waseem
Proceedings of the 2019 Workshop on Widening NLP

2018

Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
Darja Fišer | Ruihong Huang | Vinodkumar Prabhakaran | Rob Voigt | Zeerak Waseem | Jacqueline Wernimont
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

2017

Understanding Abuse: A Typology of Abusive Language Detection Subtasks
Zeerak Waseem | Thomas Davidson | Dana Warmsley | Ingmar Weber
Proceedings of the First Workshop on Abusive Language Online

As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse we propose a typology that captures central similarities and differences between subtasks and discuss the implications of this for data annotation and feature construction. We emphasize the practical actions that can be taken by researchers to best approach their abusive language detection subtask of interest.

Proceedings of the First Workshop on Abusive Language Online
Zeerak Waseem | Wendy Hui Kyong Chung | Dirk Hovy | Joel Tetreault
Proceedings of the First Workshop on Abusive Language Online

2016

Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
Zeerak Waseem | Dirk Hovy
Proceedings of the NAACL Student Research Workshop

Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter
Zeerak Waseem
Proceedings of the First Workshop on NLP and Computational Social Science

Co-authors

Giuseppe Attanasio 3

Su Lin Blodgett 3

Maraim Masoud 3

Paul Röttger 3

Arjun Subramonian 3

Gavin Abercrombie 2

Timothy Baldwin 2

Stella Biderman 2

Agostina Calabrese 2

Miruna Clinciu 2

Pieter Delobelle 2

Lucie-Aimée Kaffee 2

Helen Margetts 2

Lambert Mathias 2

Margaret Mitchell 2

Aurelie Neveol 2

Flor Miriam Plaza-del-Arco 2

Dragomir Radev 2

Shanya Sharma 2

Joel Tetreault 2

Tristan Thrush 2

Deepak Tunuguntla 2

Oskar Van Der Wal 2

Adina Williams 2

Christine de Kock 2

Osama Mohammed Afzal 1

Alham Fikri Aji 1

Hamdan Al-Ali 1

Zaid Alyafeai 1

Isabelle Augenstein 1

Amittai Axelrod 1

Ioana Baldini 1

Alexandra Birch 1

Austin Botelho 1

David Broniatowski 1

Arijit Ghosh Chowdhury 1

Yi-Ling Chung 1

Wendy Hui Kyong Chung 1

Ryan Cotterell 1

Rossana Cunha 1

Thomas Davidson 1

Francesco De Toni 1

Kaustubh Dhole 1

Amirbek Djanibekov 1

Mónica Domínguez 1

Gérard Dupont 1

Lynette D’Mello 1

Chris Chinenye Emezue 1

Anjalie Field 1

Jessica Zosa Forde 1

Paula Fortuna 1

Maya Indira Ganesh 1

James Garforth 1

Vagrant Gautam 1

Atticus Geiger 1

Iryna Gurevych 1

Scott A. Hale 1

Carolin Holtermann 1

Ruihong Huang 1

Yacine Jernite 1

Masahiro Kaneko 1

Atoosa Kasirzadeh 1

Divyansh Kaushik 1

Nurulaqilla Khamis 1

Dietrich Klakow 1

Mathias Lambert 1

Anne Lauscher 1

Shayne Longpre 1

Roberto L Lopez-Davila 1

Sasha Luccioni 1

Pranava Swaroop Madhyastha 1

Jonibek Mansurov 1

Angelina McMillan-Major 1

Sagnik Mukherjee 1

Nurdaulet Mukhituly 1

Preslav Nakov 1

Nikita Nangia 1

Kanika Narang 1

Shaoliang Nie 1

Pedro Ortiz Suarez 1

Anaelia Ovalle 1

Nirali Parekh 1

Janet Pierrehumbert 1

Giada Pistilli 1

Esther Ploeger 1

Christopher Potts 1

Grusha Prasad 1

Sebastian Riedel 1

Verena Rieser 1

Pratik Ringshia 1

Sarah T. Roberts 1

Beatrice Savoldi 1

Michael Schlichtkrull 1

Haitham Seelawi 1

Samira Shaikh 1

Shady Shehata 1

Amanpreet Singh 1

Karolina Stanczak 1

Pontus Stenetorp 1

Eliza Szczechla 1

Tair Djanibekov 1

Tiago Timponi Torrent 1

Hawau Olamide Toyin 1

Rebekah Tromble 1

Yulia Tsvetkov 1

Eddie L. Ungless 1

Josef Valvoda 1

Francielle Vargas 1

Emilio Villa-Cueva 1

Marcelo Viridiano 1

Nikolas Vitsakis 1

Karan Waghela 1

Dana Warmsley 1

Jacqueline Wernimont 1

Daniel van Strien 1

Venues