Guiding Principles for Participatory Design-inspired Natural Language Processing

We introduce 9 guiding principles to integrate Participatory Design (PD) methods in the development of Natural Language Processing (NLP) systems. The adoption of PD methods by NLP will help to alleviate issues concerning the development of more democratic, fairer, less-biased technologies to process natural language data. This short paper is the outcome of an ongoing dialogue between designers and NLP experts and adopts a non-standard format following previous work by Traum (2000); Bender (2013); Abzianidze and Bos (2019). Every section is a guiding principle. While principles 1–3 illustrate assumptions and methods that inform community-based PD practices, we used two fictional design scenarios (Encinas and Blythe, 2018), which build on top of situations familiar to the authors, to elicit the identification of the other 6. Principles 4–6 describe the impact of PD methods on the design of NLP systems, targeting two critical aspects: data collection & annotation, and deployment & evaluation. Finally, principles 7–9 guide a new reflexivity of NLP research with respect to its context, actors and participants, and aims. We hope this guide will offer inspiration and a road-map to develop a new generation of PD-inspired NLP.

1 The principles are guided by the authors' experience, primarily focused on Europe (with the exception of one of them). However, we would defend the applicability of most of them to a wider range of contexts, with the situated effort of appropriation and transformation that is an integral part of PD.

PD is about consensus and conflict
As a form of system design performed with and by people (Briefs et al., 1983), PD entails a process of mutual learning among participants, among design researchers, and between design researchers and participants (Simonsen and Robertson, 2012). Traditionally, that means adopting a variety of research and design methods, from workshops (Ehn et al., 1996) to participant observations (Blomberg and Karasti, 2012a), passing through cards or games (Vaajakallio and Mattelmäki, 2014), to include scenarios (Bødker, 2000), prototypes (Kannabiran and Bødker, 2020), and many others. The appropriate combination of methods and activities is determined, in a situated way, beginning with the involvement of different social groups (Bratteteig et al., 2012).
Historically, PD questions who is involved in the design process, from various communities (DiSalvo et al., 2012) to specific socio-economic actors (Teli, 2015), and how they are involved. As a consequence, the design process can and should reflect on the visions for social transformation that the participants can develop (Huybrechts et al., 2020; Helgason et al., 2020), by translating those visions into alternatives to existing technologies (Korsgaard et al., 2016).

Design is an inherently disordered and unfinished process
Being based on nurturing relations between professional technology designers and members of the various social groups they interact with, PD methods and practices acknowledge that designing digital technologies with non-professionals does not follow a linear model (Callon, 2004; Cibin et al., 2020). Even when formalized (Bratteteig et al., 2012), the design process is disordered and unfinished. This character is well represented by the expressions use-before-use and design-after-design (Ehn, 2008; Fry, 2017).

PD principles
1. PD is about consensus and conflict
• PD entails a process of mutual learning between researchers and community
• PD adopts a variety of research and design methods (workshops, participant observation, cards, ...)
2. Design is an inherently disordered and unfinished process
• Use-before-use: a tool's use is envisioned before the tool is actually implemented
• Design-after-design: a tool's design isn't exhausted with delivery, but will be modified by the users' appropriation, use, and feedback
3. Communities are often not completely determined a priori

Use-before-use addresses the common practice of building an image of how people will use a product before use actually takes place. The methods employed to favor people's determination of use-before-use (e.g., workshops, design games, fictional scenarios, and prototyping) can become part of forms of participation washing (Sloan et al., 2020), that is, the use of methods belonging to PD in processes in which participants do not have a significant influence on the outcome. When done properly, the keys to the PD process are the articulation of transformative visions (Huybrechts et al., 2020), the ethnographic approach to design (Blomberg and Karasti, 2012b), and the reflexive discussion on the position of designers, communities, and institutions (Lyle et al., 2018; Teli et al., 2020).
Design-after-design addresses the possibility of people's manipulation of "finished" products. Design-after-design needs to be investigated and favored through concepts like infrastructuring (Karasti, 2014) or by looking at the connections between specific digital artifacts and wider artifact ecologies (Bødker and Klokmose, 2012).

Communities are often not determined a priori
The last 20 years have seen a change in the subjects involved in PD, with the notion of community becoming one of the most relevant to describe the participants in PD projects (Dittrich et al., 2002; DiSalvo et al., 2012; Light and Miskelly, 2019). The notion of community is complex and multifaceted. Long-standing criteria such as sharing a place, an interest, or a condition have proven to be limited (Mosconi et al., 2017; Thinyane et al., 2018; Cibin et al., 2019; Teli et al., 2020). This paper defines a community as the presence of dense social relations and of at least one element (be it geography, interests, specific conditions, or structural position in society in terms of power) tying together its members. Each of these dimensions represents a challenge to current practices of design and realization of NLP systems.
Although the definition of community recalls an idea of a unitary whole, the ensemble of the participants in a project is not always completely determined a priori but can be formed within and through the design process (Le Dantec and DiSalvo, 2013), which current sampling methods in NLP mostly fail to capture.
A consolidated tendency is to look at PD practices in terms of empowerment of marginalized groups (Ertner et al., 2010; Racadio et al., 2014). Their adoption and integration in the NLP pipeline can help to address underexposure of both language varieties and linguistic phenomena.
Mario is a scholar in Human-Computer Interaction and technology design. He works on a project to support the development of community radio stations by rural and isolated communities. One of the communities involved belongs to a village of about 600 inhabitants located between a river delta and the Black Sea in Romania. The inhabitants are mainly descendants of a group of Ukrainian Cossacks who immigrated there in the 18th century. In addition to speaking Romanian the residents speak a Ukrainian dialect. Together with a Romanian NGO specialising in human rights and media democracy, Mario works to involve the inhabitants as volunteers to run the radio station and create content for the programs. However, the Romanian broadcasting license obliges stations to transmit 24 hours a day, and the volunteers struggle to create enough content. Mario proposes to use a new and advanced natural language generation system, GPT-3, to generate content. Besides the fact that the machine does not "speak" the community's dialect and requires English translations, GPT-3 produces output with prejudices and negative stereotypes against the community.

Data and communities are not separate things
As we saw in the first three points, communities represent the core element of PD. One might expect that communities have a prominent role in the development of NLP systems. Indeed, communities are the producers of the oil that runs NLP research: language data. We observe, however, that this is not the case. Searching for the term "community" in the ACL Anthology 2 returns 100 papers. However, by manually inspecting each of them, we discovered that only 9 present some sort of engagement with a community of speakers (Garcia et al., 2008; Levin, 2009; Bird et al., 2014; Everson et al., 2019; Kempton, 2017; Susarla and Challa, 2019; Conforti et al., 2020; Griscom, 2020; Le Ferrand et al., 2020). These works target endangered languages and propose technological solutions to an array of problems (e.g., archiving, documenting, or tooling). None of them presents an active and direct involvement of the communities in the design process of the suggested NLP solution. As pointed out by Bird (2020), people's agency is absent and language is seen as data to be dug.
2 Accessed on April 30th, 2021
Compliance with PD methods requires NLP to become more aware of the relationship between language data and the speakers who first produced it. In this context, we advocate for a shift of paradigm, from language as data to language as people.
Mario's story exemplifies the danger of forgetting the link between NLP training data and its underlying producers: by not asking himself whether the language varieties behind GPT-3 are representative of the community he is trying to help, he ends up hurting it. The application of PD methods is a viable solution to overcome part of this predicament. The next principles will address two key steps of the development of NLP systems: data collection & annotation, and evaluation & deployment.

Community involvement is not scraping
The training of current SOTA language models (LMs) is based on large amounts of written text crawled from the Web, with little or no documentation (Bender et al., 2021). However, the attempt to calibrate a tool to the needs of a specific community demands concrete social interactions. This requires the development of ethical engagement practices based on respect, equity, and reciprocity to gain the trust of the gatekeepers of the community (Le Dantec and Fox, 2015; Hirmer et al., 2021; Bird, 2020). Gaining the trust of communities is fundamental, especially when dealing with small groups of people.
In that case, all information is sensitive and often considered a currency that can be devalued once made public (Giglitto, 2017). Innovative, flexible and transparent approaches to data collection and annotation should be put in practice. In line with PD methods, the way this is done cannot be reduced to a check-list valid for each and every community: context-specificity, which affects participation practices, cannot be avoided (Sloan et al., 2020). Documenting, describing, explaining, and showing how the data a community makes available is processed by and used to create an NLP system is an essential step. It is up to the NLP researchers to gain trust by describing as best as they can the purpose of the work and the risks and benefits for the community. Additional advantages of designing NLP systems around the needs of a community are the possibilities of challenging existing power dynamics and of reducing the risks of dual use. In this context, initiatives such as the Feminist.AI 3 collective and Indigenous data sovereignty practices (Kukutai and Taylor, 2016; Walter and Suina, 2019) are positive and innovative examples.
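As one possible illustration of the documentation step described above, the following minimal sketch records the stated purpose, risks, benefits, and each processing step applied to community-provided data, and renders them as a plain-language summary that can be shared back with the community. The class and field names (`CommunityDataRecord`, `ProcessingStep`, `consent_scope`) are hypothetical and introduced here for illustration only; they do not correspond to any standard or to a method prescribed by the authors, and any real format would have to be negotiated with the community itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingStep:
    """One thing done to the data, described in plain language."""
    description: str
    purpose: str


@dataclass
class CommunityDataRecord:
    """Hypothetical record of what a community was told and what was done."""
    community: str
    purpose: str
    risks: List[str]
    benefits: List[str]
    consent_scope: str  # what the community agreed to, in their own terms
    steps: List[ProcessingStep] = field(default_factory=list)

    def log_step(self, description: str, purpose: str) -> None:
        """Append a processing step so the history stays complete."""
        self.steps.append(ProcessingStep(description, purpose))

    def summary(self) -> str:
        """Render the record as plain text that can be read aloud or printed."""
        lines = [
            f"Community: {self.community}",
            f"Purpose of the work: {self.purpose}",
            f"Agreed use of the data: {self.consent_scope}",
            "Risks: " + "; ".join(self.risks),
            "Benefits: " + "; ".join(self.benefits),
            "What has been done with the data so far:",
        ]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"  {number}. {step.description} (why: {step.purpose})")
        return "\n".join(lines)


if __name__ == "__main__":
    record = CommunityDataRecord(
        community="village radio volunteers",
        purpose="support content creation for the community radio",
        risks=["recordings may reveal personal stories"],
        benefits=["tools adapted to the local language variety"],
        consent_scope="transcripts used only within this project",
    )
    record.log_step("removed speaker names from transcripts", "protect privacy")
    print(record.summary())
```

A plain-text summary is used here rather than a web dashboard because, as the text notes for other tooling, the appropriate channel depends on the community's circumstances.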

Never stop designing
Mario's scenario is a good example of a bottleneck in the deployment of NLP systems: in most cases, they will not fit the needs of a community and adapting them is a challenging task.
The adoption of Machine Learning techniques for developing NLP systems rests on a vision in which statistical generalizations can be learned and applied to broader contexts (Sloan et al., 2020). Datasets are assumed to be good samples of language phenomena, but are actually deeply context-bound at different levels (e.g., time period, medium, population sample, among others). It is known that NLP tools struggle with tail phenomena (Ettinger et al., 2017) and are subject to bias (Bender and Friedman, 2018). Solutions are varied and focused on areas such as Domain Adaptation and Transfer Learning (Blitzer et al., 2006; Daumé III, 2007; Ma et al., 2014; Ganin and Lempitsky, 2015; Wu and Huang, 2016; Ruder and Plank, 2017; Ramponi and Plank, 2020) or de-biasing (Gonen and Goldberg, 2019; Paul Panenghat et al., 2020; Liang et al., 2020; Zhou et al., 2021).
A PD-aware NLP tool should foresee this community adaptation feature at its design stage. This requires overcoming technical (i.e., access to or manipulation of the code) and resource (financial and human) predicaments, as well as avoiding predatory practices of user involvement (i.e., recognizing participation as labor). Having access to continuous and updated feedback from a community is paramount for ensuring that tool adaptation effectively addresses their evolving needs. In this context, researchers should put in place appropriate socio-technical solutions considering the peculiarities of the community (e.g., developing an API to report bugs might not be appropriate in areas with limited internet connection). This open-ended evaluation process challenges the existing industrial paradigm based on the idea of scaling.
3 https://share.hek.ch/en/participatory-ai-how-to-make-better-ai/
Katie is a PhD candidate in Interaction Design working on a project on compliance with labor norms. She engages relatively small trade unions in understanding how the unions can communicate widely and effectively to the public, and to the large population of prospective new members. She has collected a variety of information through interviews and workshops. During these activities, she has encountered two main challenges for her research: (i) she collected a large amount of textual data about labor conditions and used off-the-shelf NLP tools to run sentiment analysis on it; however, the tools provide predictions only in an aggregated, uninterpretable form, which prevents Katie from providing the unions with specific insights. She has also applied for funding to improve the tools' interpretability, but her request has been conditionally accepted subject to changes in her research topic; (ii) although she is mindful of her role as a researcher, Katie has faced frictions when engaging with the unions, as some of their members feel overly exposed when sharing their experiences.

Text is a means rather than an end
Introducing PD methods in the design of NLP tools promotes and embraces a philosophical perspective on the interactions between humans and machines, and of Artificial Intelligence in general, as a problem-solving tool rather than as an adaptive mechanism mimicking human abilities (Winograd, 1997; Auernhammer, 2020). By contrast, current trends in NLP are more oriented towards a rationalist perspective, attempting to develop intelligent systems that understand language (Bender and Koller, 2020).
This follows a logic of automation that attempts to ultimately remove human intervention (Crawford, 2021), reinforcing a vision of language as data. Language, however, is not a uniform entity: it adapts to the context where it is used. NLP systems have the potential to support the flow of meanings between contexts, but in order to do so, and to act as means rather than ends (Auger et al., 2017; Hanna et al., 2017), they must contend with the structural solidity of the categories on which their algorithms are built (Bender et al., 2021). The tools Katie uses are unable to offer insightful information to her respondents because the output is uninterpretable (i.e., why has a message been labeled in a given way?). To see NLP technologies aligned with participatory methods and tasks demands a shift in the conceptualization of the outputs, or products, of NLP systems. The linguistic output of NLP systems should be material that triggers iterations or refinements to serve people's needs rather than imitate people's production of language.
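To make the contrast with aggregated, uninterpretable predictions concrete, the sketch below shows one very simple way an output can carry its own explanation: each message is labeled individually and returned together with the words that produced the label, so that respondents can inspect, contest, and refine it. The word list `TOY_LEXICON` and the function `explain_sentiment` are hypothetical and purely illustrative; this is not the tool used in Katie's scenario, nor a recommendation of lexicon-based methods, whose fixed categories are themselves open to the critique made above.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: in a participatory setting, the entries and
# their polarity would be proposed and revised by the participants.
TOY_LEXICON: Dict[str, int] = {
    "unsafe": -1,
    "unpaid": -1,
    "exhausting": -1,
    "fair": 1,
    "safe": 1,
    "supportive": 1,
}


def explain_sentiment(message: str, lexicon: Dict[str, int] = TOY_LEXICON) -> dict:
    """Label one message and return the words that led to the label."""
    tokens = re.findall(r"[a-z']+", message.lower())
    # Evidence: every lexicon word found in the message, with its polarity.
    evidence: List[Tuple[str, int]] = [(t, lexicon[t]) for t in tokens if t in lexicon]
    score = sum(weight for _, weight in evidence)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"message": message, "label": label, "score": score, "evidence": evidence}


if __name__ == "__main__":
    for text in [
        "Shifts are exhausting and overtime is unpaid",
        "The new representative is supportive",
    ]:
        result = explain_sentiment(text)
        print(result["label"], result["evidence"], "<-", result["message"])
```

Because the evidence is returned alongside the label, a respondent who disagrees (for example, because a word carries a different meaning in their workplace) can point to the exact entry to change, which turns the output into material for further iteration rather than a final verdict.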

The thin red line between consent and intrusion
Katie's scenario highlights how common it is to take for granted that the community always wants to be helped, thereby authorizing researchers to use any tool. A community's refusal to collaborate is a risk that must be accepted, even if it prevents or interrupts the development of a proposed technical solution.
Importantly, the community's consent can be considered authentic only if it was preceded by appropriate communication. When introducing a technology or a tool to a community, researchers must avoid two unethical approaches. On the one hand, using terminology with which a community is not familiar might confuse more than explain, thus potentially resulting in uninformed consent (Tekola et al., 2009). Note, however, that researchers might also find themselves in the opposite situation. When approaching (small) communities, researchers can be misled by what is called a deficit model (Irwin and Wynne, 1996), i.e., taking for granted that the community with whom one is going to collaborate lacks knowledge regarding science and technology. However, people are constantly immersed in an ecology of technologies (Bødker and Klokmose, 2012) and practical knowledge to which they refer when called upon to understand something new.
To avoid misunderstandings, one must offer transparent information about the actions that will be carried out, making use of metaphors and comparisons with existing artifacts, even if the complexity of the technological architecture represents a communication challenge (Bratteteig and Verne, 2018). One should also always keep in mind that this dialogue can steer people's attention in the wrong direction.

The need to combine research goals, funding, and concrete socio-political dynamics
All the cases observed highlight how a community-based collaboration between NLP and PD is an issue where multiple dimensions continuously interact. In addition, Katie's scenario introduces a further challenge: the need to obtain external funding to conduct her research, and the interests (and requests) of the funding providers/agencies. These dynamics must take into account the goals of the researchers/designers and of the communities involved, which cannot be completely overturned by the funders. It is evident that in this context the role of the designer/researcher becomes more and more that of an intermediary capable of translating and holding together the interests of the different stakeholders involved, without risking being co-opted and involved only in a token way (Cibin et al., 2020; Teli et al., 2020).

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021