Ethics Sheets for AI Tasks

Several high-profile events, such as the mass testing of emotion recognition systems on vulnerable sub-populations and the use of question answering systems to make moral judgments, have highlighted how technology will often lead to more adverse outcomes for those who are already marginalized. At issue here are not just individual systems and datasets, but also the AI tasks themselves. In this position paper, I make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. I also present a template for ethics sheets with 50 ethical considerations, using the task of emotion recognition as a running example. Ethics sheets are a mechanism to engage with and document ethical considerations before building datasets and systems. Similar to survey articles, a small number of carefully created ethics sheets can serve numerous researchers and developers.


The Case
Good design helps everyone. As Artificial Intelligence (AI), Machine Learning (ML), and Natural Language Processing (NLP) systems become more ubiquitous, their broad societal impacts are receiving more scrutiny than ever before. However, several high-profile instances, such as the use of recidivism prediction systems biased against people from black neighborhoods, face-recognition systems that perform poorly for people with dark skin tones (Buolamwini and Gebru, 2018), machine translation systems that are biased against some genders (Prates et al., 2019), and mass testing of emotion recognition systems on certain subpopulations, have highlighted how technology is often at odds with the very people it is meant to help, and how it will often lead to more adverse outcomes for those who are already marginalized. This raises uncomfortable questions for us AI researchers and developers: What role do we play in this? What are the hidden assumptions in our research? What are the unsaid implications of our choices? Are we perpetuating and amplifying inequities, or are we striking at the barriers to opportunity?
The answers are often complex and multifaceted. While many AI systems have clear benefits, we are increasingly seeing real-world AI systems causing harm, and growing criticisms of published research that often feeds into real-world systems: criticisms of physiognomy, racism, bias, discrimination, perpetuating stereotypes, decimating native cultures, and more. See Arcas, Mitchell, and Todorov (2017) and Ongweso Jr (2020) for recent examples. There have also been legitimate criticisms of thoughtlessness in machine learning (e.g., is automating this task, this way, really going to help people?) and a seemingly callous disregard for the variability and complexity of human behavior (McQuillan, 2018; Fletcher-Watson et al., 2018; Birhane, 2021).
In the sub-sections below, I describe some recent efforts by the AI community to encourage responsible research, the limitations of those efforts, and the need for thinking about ethical considerations at the level of AI tasks. The next section presents a new proposal to that end: to create ethics sheets for AI tasks. This is followed by an example ethics sheet for automatic emotion recognition and a summarizing discussion.

Recent Innovations to Encourage Responsible Research
So how are we addressing these new challenges in AI/ML/NLP research? For individual datasets, it is recommended to create datasheets or data statements (Gebru et al., 2021; Bender and Friedman, 2018), which list key details of the dataset, such as its composition and intended uses, and are meant to encourage appropriate use of the data. For individual systems, it is recommended to create model cards (Mitchell et al., 2019), which document key details of the model, such as its intended use and performance across groups. However, these documents are not standardized in any way. They tend to focus on the most poignant ethical considerations, as opposed to capturing the wide array of relevant ethical considerations. They are often presented as position papers, not as reference documents that are easy to explore and jump to the issues of interest. Additionally, ethical considerations also apply at the level of AI tasks.

Ethics at the Level of AI Tasks
I define AI task simply to mean some task that we may want to automate using AI techniques.
An AI system is a particular AI model built to do the task. Individual systems have their own unique sets of ethical considerations (depending on the choices made in how the system is created). However, some ethical considerations apply not at the level of individual systems, but at the level of the task. For example, consider the task of detecting personality traits from one's history of utterances. Even before we consider individual systems that execute this task, we ought to consider questions such as:
• What are the societal implications of automating personality trait detection?
• How can such a system be used/misused?
• What are the privacy implications?
• Is there enough credible scientific basis for personality trait identification that we should attempt to do this?
• Which theory of personality traits should such automation rely on? What are the implications of that choice? and so on.
Currently, AI conferences and journals do not have a dedicated place where one can discuss such questions that apply to the tasks being automated. Apart from ethical considerations that apply directly to the task, we know that there are ethical considerations latent in the choices we make in dataset creation, model development, and evaluation. Poor choices have manifested in controversies for a number of AI tasks, including:
• Text Generation: 'Dangerous' AI offers to write fake news, BBC.
• Image Generation: 'Deepfakes' a political problem already hitting EU, EU Observer.
• Automatic emotion / sentiment recognition from text: Examining Race and Gender Bias in Sentiment Analysis Systems (Kiritchenko and Mohammad, 2018); How to Make a Racist AI Without Really Trying, Robyn Speer.
• Automatic emotion recognition from faces: Emotional Entanglement: China's emotion recognition market and its implications for human rights, Article19.
Once we read the relevant literature (especially work that engages various stakeholders) and develop some AI systems, it is not hard to begin to identify some of the ethical considerations for various NLP, ML, and AI tasks; but that takes time. Meanwhile, tens of thousands of new researchers are joining our ranks. And even those of us who have been here a while can benefit from a careful compilation of ethical considerations.

Proposal: Ethics Sheets for AI Tasks
If one wants to do work on an AI task, then right at the beginning it is useful to have a carefully compiled document that substantively engages with the ethical issues relevant to that task, going beyond individual systems and datasets, and drawing on knowledge from a body of relevant past work and engagement with various stakeholders.
Similarly, if one conceptualizes a new AI task, then right at the beginning it will be useful to develop such a source of information. Therefore, I propose that we create such documents, which I will refer to as Ethics Sheets for AI Tasks. Simply put: an ethics sheet for an AI task is a semi-standardized document that aggregates and organizes a wide variety of ethical considerations relevant for that task. It:
• Fleshes out assumptions hidden in how the task is framed, and in the choices often made regarding the data, method, and evaluation.
• Presents ethical considerations unique or especially relevant to the task.
• Presents how common ethical considerations manifest in the task.
• Presents relevant dimensions and choice points, along with tradeoffs for various stakeholders.
• Lists common harm mitigation strategies.
• Communicates societal implications of AI systems to researchers, developers, and the broader society in an accessible way.
Ethics sheets may sometimes suggest that certain applications in certain contexts are good or bad ideas, but largely they are meant to discuss the various considerations to be taken into account when deciding how to build or use a particular system, whether to build or use a particular system, what is more appropriate for a given context, and so on. A sheet should flesh out the various considerations that apply at the task level. It should also flesh out ethical considerations of common theories, methodologies, resources, and practices used in building AI systems for the task. A good ethics sheet should make us question some of the assumptions that often go unsaid.
One key motivation for developing ethics sheets is to encourage more thoughtfulness in our work:
• Why should we automate this task?
• What is the degree to which human behavior relevant to this task is inherently ambiguous and unpredictable?
• What are the theoretical foundations at the heart of this task?
• What are the social and cultural forces at play that motivate choices in task design, data, methodology, and evaluation? (Science is not immune to these forces; there is no 'view from nowhere'.)
• How is the automation of the task going to impact various groups of people?
• How can the automated systems be abused?
• Is this technology helping everyone, or only those with power and advantage?
Thinking about these questions is important if we want to break away from the current paradigm of building things that are divisive (by working well for some and poorly for others) and instead move to building systems that treat human diversity and variability as a feature (not a hurdle), dismantle barriers to opportunity, and bring diverse groups of people together. Thus, questions such as these can be very useful in determining what is included in ethics sheets.

Ethics sheets serve a variety of stakeholders, including:
• People whose data is used
• People on whose data AI systems are applied
• Society at large
Owing to differences in backgrounds and needs, it is better to create versions of the ethics sheet tailored to stakeholders, for example:
• one sheet for society at large (without jargon, and with a focus on how system behaviour can impact them and how they can contribute or push back);
• one sheet for researchers, developers, and the motivated non-technical reader (with perhaps a greater emphasis on system-building choices and their implications).

Notes: The set of ethical considerations for a task is not a static list; it needs to be continuously or periodically revisited and updated. No one person or institution can claim to be the authority or provide the authoritative ethics sheets for a task. Ethics sheets can be developed iteratively and organically through input from multiple individuals, teams of researchers and practitioners, and scholarly organizations such as workshops and conferences. The ethics sheet is not a silver bullet, but rather another tool in our toolkit for responsible research. The goal is to raise awareness of ethical considerations so that we think of new and better approaches for responsible research, not to provide a list of easy solutions that "solve ethics".

Components of an Ethics Sheet
Below are some sections that I think are central. However, every task is different and may warrant additional sections.

Preface: Present why and how the sheet came to be written; the process followed; who worked on it, along with their professional or lived experience relevant to the subject matter; challenges faced in writing the sheet; changes made, if a revision of an earlier sheet; and version number, date published, and contact information.

Introduce, Define, Set Scope: Introduce the task and some common manifestations of the task. Define relevant terminology. Set the scope of the ethics sheet (e.g., maybe you are creating a sheet for speech input, but not textual input).

Motivations and Benefits: Provide an overview of common benefits and motivations of the task.

Ethical Considerations: This is the star of the show. Aggregate and organize the ethical considerations associated with the AI task. Present the trade-offs associated with choices. Present harm mitigation strategies. Cite relevant literature. The organization of ethical considerations should be based on the primary target audience. For example, ethics sheets primarily for researchers and developers may benefit from sub-sections on Task Design, Data, Method, and Evaluation. Task design may benefit from sub-sections on theoretical foundations and 'why automate this task'. Evaluation will benefit from sub-sections that go beyond quantitative metrics.

Other: Include anything else that helps with the goals of the ethics sheet.
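The semi-standardized structure described above can be sketched as a simple checklist. The section names below follow the components listed in this section; the field names, the draft contents, and the completeness check are hypothetical illustrations, not a prescribed schema.

```python
# A minimal, hypothetical sketch of the semi-standardized structure of an
# ethics sheet. The required section names mirror the components described
# above; everything else is an illustrative assumption.

REQUIRED_SECTIONS = [
    "preface",                   # why/how the sheet was written, versioning, contact
    "introduction_and_scope",    # task definition, terminology, scope of the sheet
    "motivations_and_benefits",  # common benefits and motivations of the task
    "ethical_considerations",    # considerations, trade-offs, harm mitigation, citations
]

def missing_sections(sheet: dict) -> list:
    """Return the required sections that are absent or empty in a draft sheet."""
    return [s for s in REQUIRED_SECTIONS if not sheet.get(s)]

# A hypothetical draft sheet, still under construction.
draft = {
    "preface": "Version 1.0. Process, authors, and contact information.",
    "introduction_and_scope": "Covers AER from written text; excludes speech.",
    "ethical_considerations": {
        "task_design": ["Why automate this task?"],
        "data": [],
        "method": [],
        "evaluation": [],
    },
}

print(missing_sections(draft))  # → ['motivations_and_benefits']
```

Such a checklist is only a scaffold: the substance of an ethics sheet lies in the prose of each section, not in ticking boxes.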

Benefits of Ethics Sheets
Ethics sheets for AI tasks address a number of concerns raised in the first section of this paper. Specifically, their main benefits can be summarized as follows:
1. Encourages more thoughtfulness regarding why to automate, how to automate, and how to judge success in AI research and development.
2. Fleshes out assumptions in how the task is commonly framed, and in the choices often made regarding data, method, and evaluation.
3. Presents the trade-offs of relevant choices so that stakeholders can make informed decisions appropriate for their context. Ethical considerations often involve a cost-benefit analysis; where we draw the lines may differ depending on our cultural and societal norms.
4. Identifies points of agreement and disagreement; includes multiple points of view.
5. Moves us towards consensus and standards.
6. Helps us better navigate research and implementation choices.
7. Helps in developing better datasheets and model cards.
8. Has citations and pointers; acts as a jumping-off point for further reading.
9. Helps stakeholders challenge assumptions made by researchers and developers.
10. Helps all stakeholders develop harm mitigation strategies.
11. Standardized sections and a familiar look and feel make the compilation and communication of ethical considerations easier.
12. Can play a vital role in engaging the various stakeholders of an AI task with each other.
13. Multiple ethics sheets can be created for the same task to reflect multiple perspectives, viewpoints, and what is considered important to different groups of people at different times.
14. Acts as a great introductory document for an AI task (complements survey articles and task-description papers for shared tasks).

Discussion and FAQ
The idea of ethics sheets raises several important questions that are worthy of discussion.
Q1. Should we create ethics sheets for a handful of AI Tasks (more prone to being misused, say) or do we need ethics sheets for all AI tasks?
A. To me, the answer is clear: we need to write ethics sheets for every task. This follows from the idea that we need to think about ethical considerations proactively, and not as a reaction to harms that we observe after system deployment. Different AI tasks may be more or less prone to controversy, but all AI tasks impact people in some way, and thus have ethical considerations. Sometimes even small and seemingly innocuous choices can have far-reaching implications. Sometimes a thoughtful consideration can make a small but notable difference that improves someone's life. Ethics sheets for AI tasks provide the means for us as a collective to set down, in writing, what we think are the ethical considerations and societal implications of AI tasks. For some tasks, this document can be short and straightforward, indicating minimal risk; that document, and the process that led to it, is still useful. We cannot know whether a task poses minimal risk without some amount of investigation. Having a written document allows others to challenge its assumptions, and periodically revising the document builds on our knowledge. On the other hand, we cannot predict everything and anticipate every harm, so ethics sheets will always be incomplete and require revisions. We need not let that stop us from creating a document that will be useful to others.

Q2. Who should create ethics sheets?
A. There are two questions here: 1. Who should take the lead in developing ethics sheets (who should take on more of the burden)? 2. Whose voices should be included when developing ethics sheets? For 1, anyone or any group can take the lead. Researchers who are working on the task (or proposing a new task) are well positioned to write the ethics sheet, as they are familiar with the intricacies of the task and are likely thinking about the ethical implications already. However, experienced researchers may also have blind spots. New researchers, especially those from the social sciences, psychology, linguistics, etc., can bring vital new insights. For 2, the voices of all stakeholders should be included (especially of those impacted by the technology).
Ethics sheets can be developed iteratively through input from multiple individuals and teams of stakeholders. They can be developed through community efforts in workshops and conferences. One can also imagine a meta-sheet that summarizes or compiles information from multiple ethics sheets for a task. Not everyone needs to create an ethics sheet, but it is important to include voices from a diverse set of people (research backgrounds, locations of work, etc.) in an ethics sheet.
Q3. Should ethics sheets be built *only* through organized community efforts and by a joint consortium of all stakeholders? Should we only have authoritative ethics sheets and not a plethora of ethics sheets for the same task?
A. IMHO, no and no. While building ethics sheets through organized community efforts is fantastic, we should not limit that to be the only avenue. There are several reasons for this: such community efforts take tremendous resources, organization, and fortitude; they can benefit considerably from early and focused ethics sheets developed at a smaller scale; community efforts also face significant challenges in how to incorporate everyone's opinions; and community efforts have a tendency to include only agreed-upon, non-controversial ideas that do not threaten existing power structures.
Also, in some ways, ethics sheets are akin to survey papers. Their scope is not individual pieces of work, but a body of literature. One can argue that survey articles should be community efforts or that they be created by all stakeholders. However, we also value the expertise of individual or small groups of researchers to create survey articles. We agree that it is their perspective and does not speak for the whole community. A similar affordance could be given to creators of ethics sheets.
So, IMHO, it is better to have a multitude of ethics sheets reflecting the diversity in viewpoints and what is valued by different groups of people. We should be wary of the world where we have single authoritative ethics sheets per task and no dissenting voices. I would even encourage people to build their own personal ethics sheets (building on existing ethics sheets where available) even if they cannot extensively engage all stakeholders. After all, thinking about ethical considerations should be a natural part of one's work.
Q4. How can we incentivize researchers to create Ethics Sheets? Could this be a publication? Should conferences have specific tracks for these?
A. Good ethics sheets are useful to researchers. So I expect they will be widely appreciated, especially by those new to an AI task. They are also useful to those who create the sheet. I created an ethics sheet for emotion recognition because I do research on emotions and language, and I wanted to organize my thoughts around relevant ethical considerations. Our conferences are starting to accept more papers that make contributions outside of computational research (even if much is still to be desired). So my hope is that good ethics sheets will be accepted even without a special track. That said, clear signals from conferences and journals that such contributions are valued (perhaps by creating dedicated tracks) are important.
Traditionally, work on identifying and discussing ethical considerations has often been under-valued compared to improving on accuracy metrics and computational methods for mitigating bias/issues. Therefore, just as many conferences now have a Resource and Evaluation track or a Survey Paper track, I propose we create dedicated conference tracks (with appropriate reviewing forms) for identifying and discussing ethical considerations and societal impacts of AI. The papers in this track may also provide avenues for responsible AI research and system deployment using ideas from various other fields and participatory research. Notably though, this will be a home for non-computational ethics work. Ethics sheets can be one of many paper types submitted there.
Shared-task proposals can be encouraged to develop or point to relevant ethics sheets. Also, one can cite ethics sheets for accepted norms in a field and for information on relevant ethical considerations. So creators of ethics sheets can get credit.
Q5. When should we be creating Ethics Sheets for AI Tasks? Normally, we learn about ethical issues because/after they have been deployed.
A. While we cannot foresee all consequences of our creations, it would be fair to say AI researchers have not done enough to anticipate the negative consequences of the systems we have created and deployed. Additionally, with great work over the last few years highlighting the ethical implications of AI systems, we are better placed to anticipate issues for the future. Therefore, for existing tasks, we should create ethics sheets now, revisit them periodically, and update them as necessary.
For new proposed tasks, the authors should create ethics sheets along with the paper introducing the task; as the task gets more buy-in from the research community, others can also create ethics sheets for it; and we revisit the sheets periodically and update them as necessary.
Q6. Who defines a "task" for ethics sheets? AI tasks can be defined at a high/general level (e.g., automatic emotion recognition) or fine/specific level (e.g., detecting sentiment in book reviews).
A. We can let community interest and expertise guide what task definitions are used (similar to topics of survey papers). It is great to have multiple overlapping ethics sheets that cover AI tasks at overlapping levels of specificity. There is no "objective" or "correct" ethics sheet or survey article. There is no one "correct" scope or task definition for ethics sheets. It is useful to have multiple ethics sheets for the same or overlapping tasks, just as it is useful to have multiple survey articles for the same area of research-they provide different perspectives.
Q7. Should the sheets depend on the kind of data or modality involved?
A. Yes, one can create focused ethics sheets as appropriate. In the example AER sheet, I specify in the "Scope and Modalities" section that the sheet focuses primarily on AER from language (text); however, many of the considerations apply to other modalities as well and the sheet also addresses ethical considerations that apply to AER in general (regardless of data/modality).
Q8. Should we think about research systems differently from deployed systems that directly impact people?
A. I think that is a fair point. Deployed systems have a much higher bar in terms of balancing many ethical considerations. It is common for research systems to focus on certain restricted dimensions (say, accuracy on certain test sets) while ignoring certain other dimensions. However, research systems are often picked up by developers and deployed. So research systems should make their dimensions of focus clear to the reader/user. They should also discuss the suitability of deploying such a system, intended uses, and ethical issues that may arise if one deploys the system.
Q9. Why should academic researchers care about the ethics of system deployment? Isn't this the responsibility of those who deploy systems?
A. Academic research feeds commercial research and development. We need to communicate the ethical considerations of what we create. Also, we are often not in positions of conflict of interest: we do not have to worry about losing our jobs for raising concerns.
Q10. Will these sheets be valid for only a certain time period? Is there a time dimension to these ethics sheets?
A. Yes, the sheets are only "valid" as long as people think they are useful. If the sheets no longer reflect the values we hold, or if things change, we need to create revisions and new ethics sheets.

Example: Ethics Sheet for Automatic Emotion Recognition
Emotions play a central role in our lives. Automatic Emotion Recognition (AER), or "giving emotional abilities to computers" as Dr. Rosalind Picard described it in her seminal book Affective Computing (Picard, 2000), is a sweeping interdisciplinary area of study exploring many foundational research questions and many commercial applications. However, some of the recent commercial and governmental uses of emotion recognition have garnered considerable criticism. Even putting aside high-profile controversies, emotion recognition impacts people and thus entails ethical considerations (big and small).
Since I work at the intersection of language and emotions, I created an example ethics sheet for automatic emotion recognition: (a) for my own benefit, to organize my thoughts on responsible AER, and (b) to present a proof of concept for the idea of ethics sheets for AI tasks. The full sheet is available online. I summarize key details below. Note that many of the ethical considerations listed here apply broadly to natural language tasks in general. Thus, the sheet can serve as a useful template for building ethics sheets for other tasks.

Preface
The preface is an opportunity to frame the discussion. In the AER sheet, I present a short set of rapid-fire questions centered on whether it is ethical to do automatic emotion recognition; how automatic recognition can mean many things and be deployed in many contexts; how emotions are particularly personal, private, and complex; and how the ethics sheet can help in more responsible AER research, system development, and deployment. The sheet invites feedback and provides contact information. It also lists the primary motivation for this AER ethics sheet and the target audience.
Primary Motivation: To create a go-to point for a carefully compiled critical engagement with the ethical issues relevant to emotion recognition; going beyond individual systems and datasets and drawing on knowledge from a body of past work.
Target audience: The primary audience for this sheet are researchers, engineers, developers, and educators from NLP, ML, AI, data science, public health, psychology, digital humanities, and other fields that build, make use of, or teach about AER technologies and emotion resources; however, much of the discussion should be accessible to all stakeholders of AER, including policy and decision makers and those who are impacted by AER. After more community input, I hope we can also create a version of this sheet where non-technical stakeholders are the primary audience.

Modalities and Scope
Modalities: Work on AER has made use of a number of modalities (sources of input), including facial expressions, gait (how one is walking, body language), body velocity, skin conductance, blood conductance, blood flow, respiration, gestures, force of touch, infrared emanations, haptic (sensors of force) and proprioceptive (position and movement of the body) data, behavioral data, speech, and language (esp. written text, emoticons, emojis). All of these modalities come with benefits, potential harms, and ethical considerations.

Scope: This particular ethics sheet focuses on AER from written text and AER in Natural Language Processing (NLP), but many of the considerations apply broadly to various modalities and to AER in Computer Vision as well.

Task
Emotion recognition is a broad umbrella term used to refer to a number of related tasks, such as inferring the emotions a speaker is trying to convey, inferring patterns of a speaker's emotions over longer periods of time, tracking the impact of health interventions on one's well-being, and inferring a speaker's attitudes/sentiment towards a target product, movie, person, idea, policy, or entity. Each of these framings has its own ethical considerations and may be more or less appropriate for a given context. For example, framing the task as determining a person's mental state is especially problematic due to concerns about privacy and reliability.

Applications
The sheet presents a sample of existing applications of AER in public health, commerce, government policy, art and literature, research (social sciences, neuroscience, psychology), and intelligence. Note that the listing of applications in the ethics sheet is not an endorsement. Note also that all of the benefits come with potential harms and ethical considerations. The use of AER for intelligence and education is especially controversial and laced with ethical considerations.

Ethical Considerations
The usual approach to building an AER system is to design the task (identify the emotions to capture, the process to be automated, etc.), compile appropriate data (and label some of it for emotions), apply ML methods to capture patterns of language use and emotional expression from the data, and evaluate the models by examining their predictions on a held-out test set. There are ethical considerations associated with each step of this development process. Below are 50 considerations grouped by the associated development stage: Task Design, Data, Method, Impact, and Privacy & Social Groups. I present only a high-level summary for each category below. (See the sheet for descriptions of each consideration.)
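The stage-wise grouping described above can be sketched as a simple lookup table. The stage names follow this summary; the example entries (paraphrased from questions raised earlier in the paper) and the helper function are hypothetical illustrations, not the sheet's actual contents.

```python
# Hypothetical sketch: organizing ethical considerations by the development
# stage they attach to, per the grouping described above. Entries are
# illustrative paraphrases, not the 50 considerations from the sheet.

CONSIDERATIONS_BY_STAGE = {
    "task_design": [
        "Is it possible, and is it ethical, to infer one's internal mental state?",
        "Which theory of emotion does the task framing rely on?",
    ],
    "data": [
        "Who is represented in the data, and who labeled it?",
    ],
    "method": [
        "Does the method treat people as a homogeneous group?",
    ],
    "impact": [
        "How can deployed systems be misused or abused?",
    ],
    "privacy_and_social_groups": [
        "Individual and group privacy; disaggregation and intersectionality.",
    ],
}

def considerations_for(stage: str) -> list:
    """Look up the considerations recorded for a development stage (empty if unknown)."""
    return CONSIDERATIONS_BY_STAGE.get(stage, [])

print(len(considerations_for("task_design")))  # → 2
```

A structure like this makes it easy for a reader to jump to the considerations relevant to the stage they are currently working on, which is one of the navigability goals of ethics sheets.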

TASK DESIGN
Summary: This section discusses various ethical considerations associated with the choices involved in the framing of the emotion task and the implications of automating the chosen task. Some important considerations include whether it is even possible to determine one's internal mental state, and whether it is ethical to do so.

PRIVACY AND SOCIAL GROUPS
Summary: The privacy section discusses both individual and group privacy. The idea of group privacy becomes especially important in the context of soft biometrics determined through AER that are not intended to identify individuals, but rather to identify groups of people with similar characteristics. The subsection on social groups discusses the need for work that does not treat people as a homogeneous group (ignoring group differences and implicitly favoring the majority group) but rather values disaggregation and explores intersectionality, while minimizing the reification and essentialization of social constructs.

Concluding Thoughts
In this paper, I discussed how ethical considerations apply not just at the level of individual models and datasets, but also at the level of AI tasks. I presented a new form of documenting ethical considerations at the level of AI tasks, which I call Ethics Sheets for AI Tasks: a document dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. I listed various benefits of such ethics sheets and discussed practical considerations with regard to who creates ethics sheets, how, and when. I also provided an example, proof-of-concept ethics sheet for automatic emotion recognition. Ethics sheets have the potential to engage the various stakeholders of AI tasks towards responsible development.

15. https://medium.com/@nlpscholar/ethics-aer-tips-5cebadf1273c