Mark Díaz

Also published as: Mark Diaz

2024

This research introduces STAR, a sociotechnical framework that improves on current best practices for red teaming safety of large language models. STAR makes two key contributions: it enhances steerability by generating parameterised instructions for human red teamers, leading to improved coverage of the risk surface. Parameterised instructions also provide more detailed insights into model failures at no increased cost. Second, STAR improves signal quality by matching demographics to assess harms for specific groups, resulting in more sensitive annotations. STAR further employs a novel step of arbitration to leverage diverse viewpoints and improve label reliability, treating disagreement not as noise but as a valuable contribution to signal quality.

pdf bib abs

GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran | Christopher Homan | Lora Aroyo | Aida Mostafazadeh Davani | Alicia Parrish | Alex Taylor | Mark Diaz | Ding Wang | Gregory Serapio-García
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Human annotation plays a core role in machine learning — annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement learning, to cite a few avenues. However, the fact that many of these human annotations are inherently subjective is often overlooked. Recent work has demonstrated that ignoring rater subjectivity (typically resulting in rater disagreement) is problematic within specific tasks and for specific subgroups. Generalizable methods to harness rater disagreement and thus understand the socio-cultural leanings of subjective tasks remain elusive. In this paper, we propose GRASP, a comprehensive disagreement analysis framework to measure group association in perspectives among different rater subgroups, and demonstrate its utility in assessing the extent of systematic disagreements in two datasets: (1) safety annotations of human-chatbot conversations, and (2) offensiveness annotations of social media posts, both annotated by diverse rater pools across different socio-demographic axes. Our framework (based on disagreement metrics) reveals specific rater groups that have significantly different perspectives than others on certain tasks, and helps identify demographic axes that are crucial to consider in specific task contexts.

pdf bib abs

D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation
Aida Davani | Mark Díaz | Dylan Baker | Vinodkumar Prabhakaran
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

While human annotations play a crucial role in language technologies, annotator subjectivity has long been overlooked in data collection. Recent studies that critically examine this issue are often focused on Western contexts, and solely document differences across age, gender, or racial groups. Consequently, NLP research on subjectivity have failed to consider that individuals within demographic groups may hold diverse values, which influence their perceptions beyond group norms. To effectively incorporate these considerations into NLP pipelines, we need datasets with extensive parallel annotations from a variety of social and cultural groups.In this paper we introduce the D3CODE dataset: a large-scale cross-cultural dataset of parallel annotations for offensive language in over 4.5K English sentences annotated by a pool of more than 4k annotators, balanced across gender and age, from across 21 countries, representing eight geo-cultural regions. The dataset captures annotators’ moral values along six moral foundations: care, equality, proportionality, authority, loyalty, and purity. Our analyses reveal substantial regional variations in annotators’ perceptions that are shaped by individual moral values, providing crucial insights for developing pluralistic, culturally sensitive NLP models.

pdf bib abs

Intersectionality in AI Safety: Using Multilevel Models to Understand Diverse Perceptions of Safety in Conversational AI
Christopher Homan | Gregory Serapio-Garcia | Lora Aroyo | Mark Diaz | Alicia Parrish | Vinodkumar Prabhakaran | Alex Taylor | Ding Wang
Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024

State-of-the-art conversational AI exhibits a level of sophistication that promises to have profound impacts on many aspects of daily life, including how people seek information, create content, and find emotional support. It has also shown a propensity for bias, offensive language, and false information. Consequently, understanding and moderating safety risks posed by interacting with AI chatbots is a critical technical and social challenge. Safety annotation is an intrinsically subjective task, where many factors—often intersecting—determine why people may express different opinions on whether a conversation is safe. We apply Bayesian multilevel models to surface factors that best predict rater behavior to a dataset of 101,286 annotations of conversations between humans and an AI chatbot, stratified by rater gender, age, race/ethnicity, and education level. We show that intersectional effects involving these factors play significant roles in validating safety in conversational AI data. For example, race/ethnicity and gender show strong intersectional effects, particularly among South Asian and East Asian women. We also find that conversational degree of harm impacts raters of all race/ethnicity groups, but that Indigenous and South Asian raters are particularly sensitive. Finally, we discover that the effect of education is uniquely intersectional for Indigenous raters. Our results underscore the utility of multilevel frameworks for uncovering underrepresented social perspectives.

pdf bib abs

How people interpret content is deeply influenced by their socio-cultural backgrounds and lived experiences. This is especially crucial when evaluating AI systems for safety, where accounting for such diversity in interpretations and potential impacts on human users will make them both more successful and inclusive. While recent work has demonstrated the importance of diversity in human ratings that underlie AI pipelines, effective and efficient ways to incorporate diverse perspectives in human data annotation pipelines is still largely elusive. In this paper, we discuss the primary challenges faced in incorporating diversity into model evaluations, and propose a practical diversity-aware annotation approach. Using an existing dataset with highly parallel safety annotations, we take as a test case a policy that prioritizes recall of safety issues, and demonstrate that our diversity-aware approach can efficiently obtain a higher recall of safety issues flagged by minoritized rater groups without hurting overall precision.

2023

pdf bib abs

Relationality and Offensive Speech: A Research Agenda
Razvan Amironesei | Mark Diaz
The 7th Workshop on Online Abuse and Harms (WOAH)

We draw from the framework of relationality as a pathway for modeling social relations to address gaps in text classification, generally, and offensive language classification, specifically. We use minoritized language, such as queer speech, to motivate a need for understanding and modeling social relations–both among individuals and among their social communities. We then point to socio-ethical style as a research area for inferring and measuring social relations as well as propose additional questions to structure future research on operationalizing social context.

2022

pdf bib abs

Accounting for Offensive Speech as a Practice of Resistance
Mark Diaz | Razvan Amironesei | Laura Weidinger | Iason Gabriel
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Tasks such as toxicity detection, hate speech detection, and online harassment detection have been developed for identifying interactions involving offensive speech. In this work we articulate the need for a relational understanding of offensiveness to help distinguish denotative offensive speech from offensive speech serving as a mechanism through which marginalized communities resist oppressive social norms. Using examples from the queer community, we argue that evaluations of offensive speech must focus on the impacts of language use. We call this the cynic perspective– or a characteristic of language with roots in Cynic philosophy that pertains to employing offensive speech as a practice of resistance. We also explore the degree to which NLP systems may encounter limits to modeling relational context.

pdf bib abs

Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations
Aida Mostafazadeh Davani | Mark Díaz | Vinodkumar Prabhakaran
Transactions of the Association for Computational Linguistics, Volume 10

Majority voting and averaging are common approaches used to resolve annotator disagreements and derive single ground truth labels from multiple annotations. However, annotators may systematically disagree with one another, often reflecting their individual biases and values, especially in the case of subjective tasks such as detecting affect, aggression, and hate speech. Annotator disagreements may capture important nuances in such tasks that are often ignored while aggregating annotations to a single ground truth. In order to address this, we investigate the efficacy of multi-annotator models. In particular, our multi-task based approach treats predicting each annotators’ judgements as separate subtasks, while sharing a common learned representation of the task. We show that this approach yields same or better performance than aggregating labels in the data prior to training across seven different binary classification tasks. Our approach also provides a way to estimate uncertainty in predictions, which we demonstrate better correlate with annotation disagreements than traditional methods. Being able to model uncertainty is especially useful in deployment scenarios where knowing when not to make a prediction is important.

2021

pdf bib abs

On Releasing Annotator-Level Labels and Information in Datasets
Vinodkumar Prabhakaran | Aida Mostafazadeh Davani | Mark Diaz
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

A common practice in building NLP datasets, especially using crowd-sourced annotations, involves obtaining multiple annotator judgements on the same data instances, which are then flattened to produce a single “ground truth” label or score, through majority voting, averaging, or adjudication. While these approaches may be appropriate in certain annotation tasks, such aggregations overlook the socially constructed nature of human perceptions that annotations for relatively more subjective tasks are meant to capture. In particular, systematic disagreements between annotators owing to their socio-cultural backgrounds and/or lived experiences are often obfuscated through such aggregations. In this paper, we empirically demonstrate that label aggregation may introduce representational biases of individual and group perspectives. Based on this finding, we propose a set of recommendations for increased utility and transparency of datasets for downstream use cases.

Mark Díaz

2024

2023

2022

2021

Co-authors

Venues