Contextualizing Language Models for Norms Diverging from Social Majority



Introduction
Social norms, whether explicitly codified or just widely agreed upon, to a large degree govern the interaction of humans. In that sense they allow for assessing and making sense of everyday situations. Thus, the successful deployment of AI systems in social settings, e.g., conversational agents or decision-making systems, will also depend on the ability of such systems to adequately reflect existing social norms (Bicchieri, 2005). Recently, studies on transformer-based language models (LMs) have shown that there indeed seems to be a 'moral dimension' to LMs, as they show high accuracy in related downstream tasks such as moral reasoning and action classification (Forbes et al., 2020; Emelin et al., 2021; Schramowski et al., 2022). Arguably, this notion of morality can be attributed to the LMs' pre-training corpora containing social majority biases that are also exhibited by the benchmarks used later, which are often obtained through general crowd-sourcing tasks. Thus, throughout this paper we understand the acquisition of social norms by LMs in a descriptive, rather than an explicitly prescriptive, fashion.
While it is notoriously difficult to effectively remove all bias from AI systems, using known biases to fulfill some specific and clearly defined goals has been investigated, see, e.g., Hendrycks et al. (2021b) and Ammanabrolu et al. (2022). But what if desirable norms in some social setting do not adhere to, or even blatantly deviate from, norms agreed upon by the social majority, e.g., specific norms of social subgroups? Is a completely new pre-training needed to override generally accepted norms in a seemingly consistent normative system in an LM's moral dimension? Since the costs of building huge and necessarily well-curated pre-training corpora for each social subgroup are clearly prohibitive, this option is of a rather theoretical nature. Could simpler techniques like fine-tuning then effectively create sufficient awareness in language models to allow for successful downstream tasks?
In this paper we investigate how well general-purpose language models can 'tune in' to norms deviating from majority society. Building on deontic logic for norm inversion, we perform extensive experiments that allow us to remove arbitrary norms with respect to benchmark corpora or even impose contrasting norms during fine-tuning. On the technical level, we show how to construct the necessary datasets for fine-tuning such that models can achieve a high degree of accuracy. The actual norm acquisition thus always remains strongly descriptive, since we impose no explicit mechanisms to guarantee that the LM will always adhere to explicitly altered norms.
Due to the problems of deriving adequate document sets for individual social subgroups in the real world, however, within the scope of this paper we perform only synthetic experiments on frequently used real-world datasets (in particular Social Chemistry, Forbes et al. 2020, and Moral Stories, Emelin et al. 2021). Although this is a clear limitation of the work presented here, the paper's basic techniques and insights promise to allow for generalization. We will point out and critically assess limitations and possible problems for generalization in all parts of this work.
This paper is organized as follows: In Section 2 we revisit related work and take a closer look at typical datasets and downstream tasks in the field of Moral AI. As the goal of this paper is to investigate the stability of pre-trained language models in the face of conflicting norms, we take a closer look at the task of moral action classification in Section 3. We provide a detailed description of our dataset design and creation in Section 4 as a basis for later experiments. This also includes the inversion of arbitrary norms following the rules provided by deontic logic. Section 5 then presents the actual experimental investigation of our hypothesis that fine-tuning may be a suitable remedy for the task of reflecting norms differing from social majority in downstream tasks. After a discussion in Section 6, we close with our conclusions in Section 7.

Related Work
There is a growing body of work concerning the development of AI/machines that behave ethically and/or are aligned with human values. For example, Prabhumoye et al. (2021) investigate potential applications of deontological ethics in the context of NLP and, similarly to Hooker and Kim (2018), study the first principles of generalization and autonomy. Other works have prioritized aligning artificial agents with shared human values (Soares, 2018). Value alignment has been approached from numerous angles, including preference learning (Gabriel, 2020; Christiano et al., 2017), imitation (Ho and Ermon, 2016) and inverse reinforcement learning (Nahian et al., 2020; Hadfield-Menell et al., 2016). Additionally, several approaches concerning controllable text generation have been proposed to steer model generation towards specific attributes (Dathathri et al., 2020; Keskar et al., 2019). However, our experimental setup focuses on classification tasks instead of generation. Similarly, Kulkarni et al. (2021) incorporate speaker context into language model pre-training objectives. Here, we do not consider pre-training, but rather only investigate fine-tuning.
Several datasets of normative knowledge have been published to assess to which extent current models are able to represent specific morality or ethical rules. One important aspect of the benchmarks is their degree of implicitness of normativity. In this regard, implicit datasets usually contain examples of right and wrong behavior with according labels (Hendrycks et al., 2021a,b; Nahian et al., 2020; Lourie et al., 2021), whereas others rely on explicitly stating the social rules at play (Emelin et al., 2021; Forbes et al., 2020; Jiang et al., 2021b). Forbes et al. (2020) introduce Social Chemistry 101, a large collection of so-called rules-of-thumb (RoT) associated with a rich structure of human annotations. According to the authors, these RoTs were designed to represent social norms and moral judgment as experienced by crowd-workers.
With Moral Stories, Emelin et al. (2021) propose self-contained branching narratives consisting of norms, context, moral and immoral actions and their expected consequences as written by crowd-workers. The authors suggest that the RoBERTa (Liu et al., 2019) model exhibits a normativity bias due to pre-training, but they do not follow up with investigations.
Other datasets, e.g., ETHICS (Hendrycks et al., 2021a) or Scruples (Lourie et al., 2021), also present resources containing normative information, but only through examples, whereas explicit mentions of the appropriate norms and rules are required for our purposes. Although conceivable, we leave adaptation of full paragraphs or stories to reflect contrary values or norms for future work, since current language models have been shown to lack the capabilities of dealing with the various nuances of negation and contradiction (Jiang et al., 2021a). In this paper, we focus on the action-classification task introduced by Emelin et al. and propose a controlled approach for negation grounded in deontic logic.
With COMMONSENSE NORM BANK, Jiang et al. (2021b) compile several benchmarks into a large collection of moral judgment Q&A tasks. One aspect of their work is similar to ours, as they also derive augmented norms from the Moral Stories dataset. However, their focus is on deriving equivalent norms through morality-preserving transformations, whereas we explicitly opt for the derivation of opposite norms.
Finally, perhaps most similar is the work of Arora et al. (2022), who investigate to which degree pre-trained language models reflect cross-cultural values according to external value surveys. Their experiments employ probing techniques and provide evidence of normativity biases, but only weak alignment to the surveys. In contrast, we aim to analyze to which extent the models are able to reflect norms explicitly deviating from majority-imposed bias.

Moral action classification
Several works have provided evidence of pre-trained language models (PLMs) achieving notable results in approximating human decision-making in social contexts. To assess to which extent the models are able to generalize, researchers frequently turn to analyses of previously unseen situations. However, in many cases, these "test" sets stem from the same population that curated the data used for priming the models in the first place (e.g., gathered from crowd-workers with high agreement levels). This connection becomes even more apparent in the case of language models utilizing pre-training, which have been suspected to contain a normativity bias (Jiang et al., 2021b; Emelin et al., 2021). Recently, Arora et al. (2022) found such bias in their study of cross-cultural value alignment of PLMs. In light of these findings, we aim to test model generalizability from a broader perspective. If PLMs do contain a bias towards a specific set of norms, then to which extent are they adaptable to new norms? In this paper, we focus on the case of norms explicitly contrary to what has been argued to be picked up during pre-training. The motivation is to reflect the inherent nature of social subgroups, which usually oppose certain norms imposed by majority society. However, it is not yet clear what opposites, or inversions, of norms are. We adopt deontic logic, which formalizes relations between norms, such as contrary or contradictory. Deontic logic requires norms to be directly expressed, which rules out many of the published benchmarks as potential bases. To the best of our knowledge, Moral Stories is the only benchmark so far incorporating norms, actions and corresponding labels of adherence or violation. Therefore, of the many proposed tasks to assess normative knowledge in language models, we consider action classification for its explicitness and clear-cut semantics.
Thus, we build on and extend definitions and data from Moral Stories (Emelin et al., 2021) and Social Chemistry 101 (Forbes et al., 2020). More specifically, we define the action classification task following Emelin et al. (2021): We focus on the setting of actions grounded by corresponding norms. Although Moral Stories also provides context as well as consequences for grounding, we omit them in this paper due to the increased complexity. For the remainder of the paper we understand moral action classification as the (norm, action)-scenario, where the task is to decide for any such pair whether the action is deemed moral or immoral with respect to the norm. Models are evaluated by accuracy. For further details, see Emelin et al. (2021).
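The (norm, action)-scenario and its accuracy metric can be sketched in a few lines of Python. This is a minimal illustration only: the `Example` record, the integer label encoding, the two toy actions and the `always_moral` predictor are assumptions made for the sketch and are not taken from Moral Stories.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Example:
    """One (norm, action) pair; label 1 = moral, 0 = immoral w.r.t. the norm."""
    norm: str
    action: str
    label: int


def accuracy(examples: Iterable[Example],
             predict: Callable[[str, str], int]) -> float:
    """Fraction of pairs for which the predicted label equals the gold label."""
    examples = list(examples)
    if not examples:
        raise ValueError("accuracy is undefined on an empty dataset")
    correct = sum(predict(ex.norm, ex.action) == ex.label for ex in examples)
    return correct / len(examples)


# Hypothetical toy pairs (the actions are invented for illustration).
examples = [
    Example("It's rude to laugh at others", "X laughs at a colleague's mistake", 0),
    Example("It's rude to laugh at others", "X helps the colleague fix the mistake", 1),
]


def always_moral(norm: str, action: str) -> int:
    """Trivial stand-in for a classifier: predicts 'moral' for every pair."""
    return 1


print(accuracy(examples, always_moral))
```

Because every norm in this setting comes with one moral and one immoral action, a predictor that ignores its input scores 50% on such balanced data.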

Dataset design and creation
In the following subsections we briefly introduce deontic logic as the theoretical foundation of our norm inversion procedure. Then we show how the Moral Stories and Social Chemistry 101 datasets relate to the theoretical considerations, and lastly we present and evaluate two automatically derived sets of opposing norms, namely anti-ms and optional-ms. Note that we use deontic logic only as a means to derive new norms by inversion. We explicitly do not use it for logical inference, as this would require the underlying datasets to be consistent. But, as already pointed out by their authors, neither Moral Stories nor Social Chemistry 101 is designed to be free of contradictions. For example, "You shouldn't let animals suffer." and "You should not kill animals.", both from Moral Stories, could be mutually exclusive under certain circumstances.

Deontic Logic
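Deontic logic is a field in philosophical logic that is most concerned with inferring what follows from what in terms of obligation, permission and their related concepts (McNamara and Van De Putte, 2022). It is of special interest for our work, as it provides a logical framework for normative statuses and their connections. Here, we adopt the Standard Deontic Logic (SDL) (von Wright, 1951b; Prior and Prior, 1955). Several different, though equivalent, options exist to define operators such as OBp (it is obligatory that p is the case) or IMp (it is impermissible that p is the case). In the so-called Traditional Definitional Scheme, OB is chosen as a primitive and the remaining operators (permissible PE, impermissible IM, omissible OM, optional OP and non-optional NO) are defined as shown in Equation 1.

\[
\begin{aligned}
\mathrm{PE}\,p &\Leftrightarrow \neg\mathrm{OB}\,\neg p \\
\mathrm{IM}\,p &\Leftrightarrow \mathrm{OB}\,\neg p \\
\mathrm{OM}\,p &\Leftrightarrow \neg\mathrm{OB}\,p \\
\mathrm{OP}\,p &\Leftrightarrow \neg\mathrm{OB}\,p \wedge \neg\mathrm{OB}\,\neg p \\
\mathrm{NO}\,p &\Leftrightarrow \mathrm{OB}\,p \vee \mathrm{OB}\,\neg p
\end{aligned}
\tag{1}
\]

For example, stating that p is impermissible can be expressed as "¬p ought to be the case" or, more formally, as OB¬p.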

Moral Stories with a twist
How can we reason about Moral Stories in terms of deontic logic? First, we aim to map the provided norms to the six SDL operators. Ideally, due to the subjective nature of the topic, such a classification should be based on human judgments. The Moral Stories dataset itself does not provide any such means; however, Social Chemistry 101 (Forbes et al., 2020), the benchmark it was derived from, does. According to Forbes et al., the crowd-workers were instructed to classify the moral judgment of the rules-of-thumb into "very bad", "bad", "expected/OK", "good", and "very good" (we refer to the action-moral-judgment column here). See Table 1 for examples. We interpret all negatively judged statements as elements of the impermissible and their positive counterparts as elements of the obligatory category. Further, although not present in the original Moral Stories, we map the neutral statements ("expected/ok") to the optional SDL operator.
Next, we turn to applying operator equivalences to derive new statements. For practical purposes, implementable counterparts in the natural language domain are needed for the logical transformations of SDL. We only consider the operators needed and, since only OB and IM occur in the dataset, we thus focus on these. Furthermore, of the many possible transformations in SDL, we restrict ourselves to only those reflecting negation. As per the definitions, we then obtain the omissible and permissible operators as opposites, as shown in Equation 2:

\[
\begin{aligned}
\neg\mathrm{OB}\,p &\Leftrightarrow \mathrm{OM}\,p \\
\neg\mathrm{IM}\,p &\Leftrightarrow \mathrm{PE}\,p
\end{aligned}
\tag{2}
\]

Negation in natural language. The ability to negate or contradict statements is a unique property of human language (Horn and Wansing, 2020). In contrast to logical negation, humans often deal with varying shades of semantic opposition (Jiang et al., 2021a). For example, consider the statement "You should not eat meat", which might be a valid norm in the subgroup of vegetarians. At least two opposites to this statement are conceivable in terms of semantic negation, ranging from the rather obligatory "You should eat meat" to a more optional perspective reflecting that eating meat is permissible but not obligatory. Both statements are contrary to the original, but not contradictory, since neither fully exhausts the remaining domain. Deontic logic behaves similarly in this regard and provides well-defined domains for contrary and contradictory statements (von Wright, 1951a). Here, as shown in Equation 2, the contradictory opposite to "You should not eat meat" (impermissible) is eating meat as a permissible action, which, by definition, covers both the obligatory and optional cases. In the context of deontic logic, the correct approach to deriving opposites of norms is thus to generate permissible statements from impermissible ones and to use the obligatory norms to derive omissible statements. Natural language, however, does not necessarily provide equally clear-cut definitions for permissible and omissible notions as SDL does.
Here, we decide to decompose both operators into disjunctions of less ambiguous operators. For example, Equation 3 shows the equivalence of the omissible operator to a disjunction of impermissible and optional.

\[
\mathrm{OM}\,p \Leftrightarrow \mathrm{IM}\,p \vee \mathrm{OP}\,p
\tag{3}
\]

The permissible case can be approached analogously and we arrive at:

\[
\mathrm{PE}\,p \Leftrightarrow \mathrm{OB}\,p \vee \mathrm{OP}\,p
\tag{4}
\]

We derive two datasets of opposing norms from Moral Stories based on the appropriate decomposition. Since both disjunctions consist of a non-optional (OB or IM) and an optional operator, we refer to the two datasets as non-optional-ms or anti-ms and optional-ms, respectively. But how do the logical transformations relate to the domain of natural language? In the following, we carefully retrace the steps presented above on the textual representation of norms and show how to assign appropriate labels for corresponding actions.

Table 2
norm | dataset | rating
It's wrong to ignore someone after a one-night stand. | Moral Stories |
It is ambitious to ignore someone after a one-night stand. | anti-ms | 0/3
It is acceptable to ignore someone after a one-night stand. | optional-ms | 3/3
Not tipping your server is rude. | Moral Stories |
You should not tip your server. | anti-ms | 3/3
It is okay to tip your server. | optional-ms | 3/3
You shouldn't stop communicating with a friend. | Moral Stories |
It's bad to communicate with a friend. | anti-ms | 1/3
It's ok to stop communicating with a friend. | optional-ms | 3/3
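The decomposition of the omissible and permissible operators described above can be checked mechanically with a small Python sketch. It assumes the threefold classification of SDL, in which every proposition is exactly one of obligatory, optional or impermissible; the `Status` enum and the helper names are hypothetical and serve only this illustration.

```python
from enum import Enum
from typing import Dict, Set


class Status(Enum):
    """Assumed threefold classification: exactly one status holds for any p."""
    OB = "obligatory"
    OP = "optional"
    IM = "impermissible"


def is_permissible(s: Status) -> bool:
    """PE p holds exactly when p is not impermissible."""
    return s is not Status.IM


def is_omissible(s: Status) -> bool:
    """OM p holds exactly when p is not obligatory."""
    return s is not Status.OB


def opposite_statuses(s: Status) -> Set[Status]:
    """Statuses covered by the contradictory opposite of a non-optional norm."""
    if s is Status.OP:
        raise ValueError("only OB and IM norms occur in Moral Stories")
    return {t for t in Status if t is not s}


def target_datasets(s: Status) -> Dict[str, Status]:
    """Route each disjunct of the opposite to the dataset that represents it."""
    return {("optional-ms" if t is Status.OP else "anti-ms"): t
            for t in opposite_statuses(s)}


# Exhaustive check of both decompositions over the three statuses.
for s in Status:
    assert is_permissible(s) == (s in {Status.OB, Status.OP})  # PE = OB or OP
    assert is_omissible(s) == (s in {Status.IM, Status.OP})    # OM = IM or OP

print(target_datasets(Status.IM))
```

Under this assumption, the opposite of an impermissible norm splits into an obligatory variant (anti-ms) and an optional variant (optional-ms), and symmetrically for obligatory norms.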
optional-ms. The case of optional statements is quite different from the impermissible or obligatory norms, mainly due to the non-existence of norm-divergent behavior. Continuing the earlier example, the norm "It is okay to eat meat" would be violated neither by eating meat nor by eating salad. Note that "It is okay not to eat meat" is an equivalent statement, which, in SDL, immediately follows from OPp ⇔ OP¬p. However, recent works have shown that language models, especially PLMs, perform much worse on negated concepts than on the affirmative versions (Kassner and Schütze, 2020). To minimize the effect on our dataset, we represent norms in optional-ms without the added negation. Finally, given a norm from Moral Stories with its respective normative and norm-divergent actions, we consider both actions to be normative with respect to the norm's optional counterpart.

Table 3: Labels of moral and immoral actions on an original norm from Moral Stories and two variants from optional-ms and anti-ms. Note that the terms "moral" and "immoral action" are always interpreted from the Moral Stories perspective. Thus, in the last example, we consider a formerly moral action to be immoral.
anti-ms. The non-optional cases cover negation in a symmetrical fashion, since obligatory and impermissible are mutually contrary here. Still, there are multiple options for carrying out the negation in the text domain. For example, the corresponding obligatory statement to the impermissible "It's rude to laugh at others" could be expressed as either "It's not rude to laugh at others" or "It's rude not to laugh at others". Here, we opt for the first version in order not to complicate the task unnecessarily. We simplify negated judgments ("It's not rude") whenever possible (e.g., "It's nice") to specifically rule out any optional characteristics. Lastly, the labels for non-optionally negated assessments are derived as opposites to the originals. That is, formerly normative actions are considered non-normative and vice versa. Table 3 shows an example of the label derivation.
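The label derivation for the two derived datasets can be summarized as a short Python sketch. The function names and the string labels are assumptions for illustration; the norms reuse the running meat example, and the two actions are paraphrased stand-ins rather than dataset entries.

```python
from typing import Dict, List, Tuple


def derive_labels(variant: str) -> Dict[str, str]:
    """Labels for the actions that Moral Stories calls 'moral' and 'immoral'."""
    if variant == "original-ms":
        return {"moral_action": "moral", "immoral_action": "immoral"}
    if variant == "optional-ms":
        # An optional norm cannot be violated: both actions are normative.
        return {"moral_action": "moral", "immoral_action": "moral"}
    if variant == "anti-ms":
        # Non-optional negation flips the labels of the original norm.
        return {"moral_action": "immoral", "immoral_action": "moral"}
    raise ValueError(f"unknown variant: {variant}")


def build_examples(norms: Dict[str, str], moral_action: str,
                   immoral_action: str) -> List[Tuple[str, str, str]]:
    """Create (norm, action, label) rows for every variant of one norm."""
    rows = []
    for variant, norm in norms.items():
        labels = derive_labels(variant)
        rows.append((norm, moral_action, labels["moral_action"]))
        rows.append((norm, immoral_action, labels["immoral_action"]))
    return rows


# Running example; the action wordings are illustrative assumptions.
norms = {
    "original-ms": "You should not eat meat",
    "anti-ms": "You should eat meat",
    "optional-ms": "It is okay to eat meat",
}
rows = build_examples(norms, "X eats a salad", "X eats a steak")
for row in rows:
    print(row)
```

Note that "moral action" and "immoral action" keep their Moral Stories meaning as column names; only the assigned labels change between variants.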
Generating coherent norms. For either dataset, the general idea is to adapt a norm from Moral Stories in a way that reflects the semantics of the corresponding operator. To this end, we utilize the plethora of examples in the Social Chemistry 101 corpus. Note that we filter out entries of low agreement and only consider the categories social-norms and morality-ethics. We extract ∼110k triples of moral judgment ("It is rude"), associated action ("laughing at others") and the resulting rule-of-thumb ("It is rude to laugh at others"). Next, we fine-tune a text-to-text language model to predict norms from judgment and action parts. The goal is to later replace judgments according to a specific operator and to apply the model to create grammatically sound sentences.
For training, we split the data into 80%/10%/10% train, validation and test sets and perform a hyperparameter grid search on two encoder-decoder models, T5 (Raffel et al., 2020) and BART (Lewis et al., 2020). Refer to Appendix A.1 for details. Finally, the best performing model as shown in Table 4 is used to generate our two contrary datasets.
Since the opposing norms should not always show the same linguistic representation (rude-nice, good-bad, etc.), we sample the expressions to be used for a concrete norm from a pool of human-written positive/negative samples collected a priori from the Social Chemistry 101 dataset. In particular, we sample from about 500 unique linguistic expressions for obligatory norms and from about 1000 expressions for impermissible statements. This results in a wide variety of linguistically different expressions present in the data. Effectively, we allow for 500k different conversions from obligatory to impermissible norms and vice versa, although random sampling does of course not select all of them. Moreover, the norms in the parent corpora are not necessarily represented in a unique form themselves. For instance, we observed multiple statements regarding the action of "stealing something" phrased in different ways (theft, robbery, etc.), which, taken together with the random sampling, accounts for even more variety.
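The sampling step and the resulting number of possible conversions can be illustrated as follows. This is a sketch under stated assumptions: the two tiny pools are hypothetical stand-ins for the roughly 500 and 1000 expressions mentioned above, and the function names are invented for the example.

```python
import random

# Hypothetical stand-ins for the expression pools described in the text.
OBLIGATORY_POOL = ["It's nice", "It's good", "You should"]
IMPERMISSIBLE_POOL = ["It's rude", "It's bad", "It's wrong", "You shouldn't"]


def sample_opposite_judgment(polarity: str, rng: random.Random) -> str:
    """Draw a judgment expression of the opposite polarity for anti-ms."""
    if polarity == "obligatory":
        return rng.choice(IMPERMISSIBLE_POOL)
    if polarity == "impermissible":
        return rng.choice(OBLIGATORY_POOL)
    raise ValueError(f"unknown polarity: {polarity}")


def count_conversions(n_obligatory: int, n_impermissible: int) -> int:
    """Number of distinct (obligatory, impermissible) expression pairings."""
    return n_obligatory * n_impermissible


rng = random.Random(0)  # fixed seed so repeated runs draw the same expressions
print(sample_opposite_judgment("impermissible", rng))
print(count_conversions(500, 1000))
```

With pool sizes of about 500 and 1000, the product gives the 500k possible pairings quoted above; random sampling covers only a subset of them.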

We ran an ablation experiment investigating whether the sampled judgment expressions might introduce any cues for models to exploit on the later classification task. Consider two settings: first, models are given only the action to decide between the moral and immoral classes, and second, models have access to judgment+action (omitting the behavior description part of the norm). Neither BERT nor RoBERTa showed statistically significant differences in accuracy between the two settings. Hence, we can safely argue that the judgments in anti-ms indeed cannot be readily exploited.
Quality. In our evaluation we first apply automatic metrics to find the best working settings and then perform a qualitative analysis of the generated samples on the leading approach. We report BLEU-4 (Papineni et al., 2002) and ROUGE-L (Lin, 2004) metrics in Table 4. While the metrics might seem unusually high, it has to be stressed that the task difficulty is considerably lower than for usual text generation problems with a more open-ended task. In our case, much of the needed output is already contained in the input data and only minor morphological transformations need to be carried out, e.g., verb inflection ("laughing", "to laugh"), which pre-trained language models have been shown to perform well on (Cotterell et al., 2018). As a baseline we include simple concatenation of the two input parts.
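To make the concatenation baseline concrete, the following Python sketch joins the two input parts without any morphological change. The function name and the exact joining convention (single space, trailing period) are assumptions; the paper only states that the two parts are concatenated.

```python
def concat_baseline(judgment: str, action: str) -> str:
    """Baseline norm generation: join judgment and action, no inflection."""
    return f"{judgment.rstrip('. ')} {action.strip()}."


generated = concat_baseline("It is rude", "laughing at others")
reference = "It is rude to laugh at others."

print(generated)
print(reference)
```

The baseline output differs from the reference only in the verb form ("laughing" versus "to laugh"), which is the kind of small transformation the fine-tuned text-to-text model has to learn and which explains the high overlap scores.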
For the qualitative evaluation we asked annotators to judge the correctness of a random sample of 200 generated norms. They had to assess whether the generated norms express the opposite judgment of the original norm and whether the generated sentences were grammatically correct. We trained graduate students to annotate arbitrary norms by providing categories for positive, neutral, and negative normative judgments and showed how this reflects on the possible counterparts. A generated counterpart would only be annotated as correct if the respective action was still the same, the judgment had been inverted and the sentence was grammatically correct. In any other case, a generated norm was to be annotated as incorrect. We set up three crowdsourcing tasks for each norm of the random sample and recorded the majority decision and the annotator agreement for each norm. In summary, about 95% of our generations were rated as correct, with all three raters positively agreeing in almost 90% of assessments. Multiple examples of correct and incorrect generations are shown in Table 2. Appendix A.1 provides further details on rater agreement.
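The two summary figures of the annotation study, the majority decision and the share of unanimous positive ratings, can be computed as in the following sketch. The four rating triples are invented toy data, not the study's ratings, and the function name is hypothetical.

```python
from typing import List, Tuple

Rating = Tuple[bool, bool, bool]  # one vote per rater, True = "correct"


def summarize_ratings(ratings: List[Rating]) -> Tuple[float, float]:
    """Return (share rated correct by majority, share rated correct by all)."""
    if not ratings:
        raise ValueError("no ratings given")
    majority_correct = sum(sum(votes) >= 2 for votes in ratings)
    unanimous_correct = sum(all(votes) for votes in ratings)
    return majority_correct / len(ratings), unanimous_correct / len(ratings)


# Toy data: ratings of 3/3, 2/3, 1/3 and 3/3, in the notation of Table 2.
toy_ratings = [
    (True, True, True),
    (True, True, False),
    (False, False, True),
    (True, True, True),
]
majority_share, unanimous_share = summarize_ratings(toy_ratings)
print(majority_share, unanimous_share)
```

A rating such as "1/3" in Table 2 thus counts as incorrect under the majority decision, while "3/3" contributes to both figures.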

Experiments
We conduct several experiments based on the original Moral Stories (original-ms), anti-ms and optional-ms datasets over variations of the moral action classification task. We include seven models in our studies: DistilBERT (66M) (Sanh et al., 2019) as a rather small model, BERT (110M & 336M) (Devlin et al., 2019), since they are among the most used, RoBERTa (355M) (Liu et al., 2019) to ensure comparability with Moral Stories, ALBERT (223M) (Lan et al., 2020) for its exceptional performance on the ETHICS benchmark and lastly, GPT-Neo (1.3B & 2.7B) (Black et al., 2021) as representatives of larger transformer models.

Transfer learning
In our first setting we investigate whether pre-trained language models transfer well from one dataset to the others. To this end, we adopt the following procedure: Each model is fine-tuned separately on the three datasets. After fine-tuning, the best model configuration per dataset is loaded and tested against all others; see Table 5 for results. Note that optional-ms does not contain samples of norm-diverging behavior and therefore only contains a single label, serving as an extreme case.
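The evaluation protocol, fine-tuning on each dataset and testing on all three, amounts to filling a train-by-test grid. The Python sketch below shows the loop; `fine_tune` and `evaluate` are hypothetical callables, and a majority-label predictor stands in for real fine-tuning so that the example runs without any model or library.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str, int]          # (norm, action, label), 1 = moral
Model = Callable[[str, str], int]


def transfer_grid(datasets: Dict[str, Dict[str, List[Example]]],
                  fine_tune: Callable[[List[Example]], Model],
                  evaluate: Callable[[Model, List[Example]], float],
                  ) -> Dict[Tuple[str, str], float]:
    """Fine-tune once per dataset, then test that model on every dataset."""
    grid = {}
    for train_name, train_split in datasets.items():
        model = fine_tune(train_split["train"])
        for test_name, test_split in datasets.items():
            grid[(train_name, test_name)] = evaluate(model, test_split["test"])
    return grid


def majority_fine_tune(train: List[Example]) -> Model:
    """Stand-in for fine-tuning: always predict the most frequent train label."""
    majority = Counter(label for _, _, label in train).most_common(1)[0][0]
    return lambda norm, action: majority


def accuracy(model: Model, test: List[Example]) -> float:
    return sum(model(n, a) == y for n, a, y in test) / len(test)


def toy(norm: str, moral_label: int, immoral_label: int) -> Dict[str, List[Example]]:
    rows = [(norm, "moral action", moral_label),
            (norm, "immoral action", immoral_label)]
    return {"train": rows, "test": rows}


datasets = {
    "original-ms": toy("original norm", 1, 0),
    "anti-ms": toy("inverted norm", 0, 1),
    "optional-ms": toy("optional norm", 1, 1),
}
grid = transfer_grid(datasets, majority_fine_tune, accuracy)
print({k: v for k, v in grid.items() if k[0] == "optional-ms"})
```

Even this trivial stand-in illustrates the extreme case of optional-ms: a model that only ever sees the single label "moral" scores 100% on optional-ms but 50% on the two balanced datasets.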
First, the achieved accuracy of RoBERTa on Moral Stories effectively reproduces the original paper (Emelin et al., 2021) and ALBERT sets a new state-of-the-art accuracy of 94.3%. Second, larger models do not automatically perform better, which is in contrast to findings of other studies (Kaplan et al., 2020). Even the largest model (2.7B) is outperformed by models a tenth its size. The amount of pre-training data also does not seem to have a major influence on the scores for moral reasoning. For example, both the best (ALBERT) and the worst performing model (DistilBERT) rely on the same corpora. However, it is unclear whether this is due to insufficient fine-tuning, the models' architectures, or other differences. More work is needed to explore these discrepancies.
Concerning optional-ms as fine-tuning corpus, we found all hyper-parameter configurations across all models to produce the same outcome. Perfect accuracy on actual optional norms is expected, though, since only a single class needs to be considered. Hence, optional norms alone are inadequate as a fine-tuning corpus. On the other hand, neither the original nor the anti-ms dataset allows models to correctly infer optional norms. It seems that fine-tuning does not transfer more general reasoning capabilities from optional to non-optional norms or vice versa. Interestingly, when comparing original-ms to anti-ms, the picture is quite different. Here, models do not collapse to random guessing (50%), but perform even worse. It appears that fine-tuning causes models to adapt to the presented norms beyond the task and that some aspects are internalized.
Emelin et al. suggest a normativity bias attributed to pre-training. Arora et al. (2022) further corroborate these findings via probing tasks. However, as soon as fine-tuning is involved, such bias does not seem to significantly favor datasets of similarly biased norms. For example, models fine-tuned on anti-ms were found to perform comparably to those trained on Moral Stories, with the largest difference being 1.2% (ALBERT & GPT-Neo).
To further analyze the effect of pre-training, we conducted additional experiments where models have access to more than one of the datasets. Due to their unsatisfactory performance, GPT-Neo variants are not included hereafter.

Conflicting Moral Stories
So far, models were only given access to a single dataset at a time. This restriction appears reasonable from the perspective of dataset consistency, since all subsets are in some sense opposing the others. In this setting we explicitly study the ability of LMs to pick up the various notions of contrary norms during fine-tuning. Consequently, models were trained on the union of Moral Stories, anti-ms and optional-ms. We refer to this set as conflicting-ms.
Training on conflicting-ms results in a loss in peak accuracy, e.g., DistilBERT suffers a loss of 8% on original Moral Stories. However, the models were able to outperform random guessing in all instances. When models were not initialized via pre-trained weights, but randomly, none of the considered settings led to meaningful representations. Rather, the models seem to simply predict the majority label ("moral").
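A small Python sketch shows what the union looks like for a single norm and why "moral" becomes the majority label in conflicting-ms. The rows reuse the running meat example with illustrative action wordings; they are assumptions for this sketch, not entries of the actual datasets.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, str]  # (norm, action, label)

original_ms: List[Row] = [
    ("You should not eat meat", "X eats a salad", "moral"),
    ("You should not eat meat", "X eats a steak", "immoral"),
]
anti_ms: List[Row] = [
    ("You should eat meat", "X eats a salad", "immoral"),
    ("You should eat meat", "X eats a steak", "moral"),
]
optional_ms: List[Row] = [
    ("It is okay to eat meat", "X eats a salad", "moral"),
    ("It is okay to eat meat", "X eats a steak", "moral"),
]

# conflicting-ms is the plain union of the three variants.
conflicting_ms = original_ms + anti_ms + optional_ms


def labels_per_action(rows: List[Row]) -> Dict[str, Set[str]]:
    """Collect every label an action receives when the norm is ignored."""
    seen = defaultdict(set)
    for _norm, action, label in rows:
        seen[action].add(label)
    return dict(seen)


label_counts = Counter(label for _, _, label in conflicting_ms)
print(label_counts)
print(labels_per_action(conflicting_ms))
```

Each action carries both labels in the union, so a model can only separate them by attending to the norm. Per original norm, four of the six rows are labeled "moral", which is the majority label that randomly initialized models were observed to fall back on.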

Textual entailment
The leading approaches on several benchmarks assessing the normative knowledge of LMs, including ours, rely on fine-tuning on a custom-tailored corpus. Naturally, questions arise as to what extent fine-tuning introduces new information into the models and whether it can be excluded from the experiments, e.g., through prompting. Related work reports significantly worse performance for prompting techniques as compared to fine-tuning (Jiang et al., 2021b; Hendrycks et al., 2021a). It remains unclear whether the accuracy of prompting is lower due to the absence of normative information or simply due to the higher task complexity. Here, we want to show a possible connection between moral reasoning and natural language inference in a zero-shot paradigm (Yin et al., 2019). Deciding whether a text entails a hypothesis in terms of natural language is the domain of textual entailment (Bowman et al., 2015; Nie et al., 2020). We propose a mapping of the polarity of entailment and norm to complement our results. For example, considering the action "X eats a steak" with respect to the norm "It's bad to eat meat" implies X acted immorally, since eating steak entails eating meat, which in turn is considered bad. Accordingly, we train a classifier to categorize norms as obligatory, impermissible or optional. The task turned out to be simple, since even small models (bert-base) achieved ∼98%. Next, we apply a textual entailment model (Nie et al., 2020), whose task is to determine whether an action satisfies the behavior as described by some norm.
We consider two scenarios. At first, the textual entailment component is applied as is, representing a true zero-shot setting. The problem is that the model input is slightly different from that of the original task. To counter the issue, we also fine-tune it on corresponding extracts of Moral Stories devoid of the judgment aspect. For example, we take into account only eating steak as premise and eating meat as hypothesis, but not the full norm.
The results of both approaches are shown in Table 6. In the zero-shot setting the pipeline performs comparably to a fine-tuned bert-large model with full access to the data. With fine-tuning enabled, textual entailment achieves second best scores in three out of four cases.
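The combination of norm polarity and entailment described above can be sketched as a small decision function in Python. The two components are passed in as callables; the keyword-based stubs below are hypothetical placeholders for the trained polarity classifier and the entailment model. Treating non-entailment under an obligatory norm as a violation is an assumption of this sketch, motivated by each Moral Stories norm being paired with one adhering and one violating action.

```python
from typing import Callable


def classify_action(norm: str, behavior: str, action: str,
                    polarity_of: Callable[[str], str],
                    entails: Callable[[str, str], bool]) -> str:
    """Map (norm polarity, action entails behavior) to moral/immoral."""
    polarity = polarity_of(norm)
    if polarity == "optional":
        return "moral"  # optional norms cannot be violated
    satisfied = entails(action, behavior)  # premise: action, hypothesis: behavior
    if polarity == "impermissible":
        return "immoral" if satisfied else "moral"
    if polarity == "obligatory":
        return "moral" if satisfied else "immoral"  # assumption, see lead-in
    raise ValueError(f"unknown polarity: {polarity}")


def stub_polarity(norm: str) -> str:
    """Placeholder for the norm polarity classifier (keyword lookup only)."""
    lowered = norm.lower()
    if "okay" in lowered or "ok " in lowered:
        return "optional"
    if "bad" in lowered or "rude" in lowered or "wrong" in lowered:
        return "impermissible"
    return "obligatory"


def stub_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder for the entailment model (hard-coded toy knowledge)."""
    return "steak" in premise and "meat" in hypothesis


verdict = classify_action("It's bad to eat meat", "eating meat",
                          "X eats a steak", stub_polarity, stub_entails)
print(verdict)
```

Keeping the judgment (polarity) separate from the behavior check mirrors the second scenario, where the entailment component only sees the action as premise and the behavior as hypothesis.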

Discussion
We used concepts of standard deontic logic to derive norms contrary to those of Moral Stories. Overall, SDL can only be viewed as one of many possible frameworks that could be used. We reiterate that we explicitly do not adopt SDL for reasoning purposes, but only for its clear-cut definitions of operators, which we deem transferable to the natural language domain. Here, we intuitively map definitions of Moral Stories to those of deontic logic. Specifically, we interpret human judgment as salient indicators. Experiments on polarity clas-

Conclusion
We investigated the abilities of language models to simultaneously represent opposing sets of norms in the context of a moral action classification setting.
Based on notions from deontic logic, we derived two such sets from the Moral Stories benchmark and ran extensive evaluations on a range of architectures. Our results suggest that fine-tuning on just one of the sets imposes a strong bias onto the models, in the sense that the left-out norms are severely misrepresented. Further, when subjected to highly conflicting norms, we found pre-training to play an essential role for models to adapt well. Models that were not pre-trained and thus are not affected by possible bias towards specific norms were found to collapse to random guessing. However, contrary to intuition, with pre-training enabled, the models were able to reconcile even highly inconsistent normative settings. Finally, we propose one option to factor out the reasoning aspect of the task into textual entailment. The approach performs on par with the best fine-tuned model.

Limitations
The strongest limitation of our paper is that we draw our conclusions on de-biasing PLMs for individual social subgroups from experiments on synthetically built datasets. On the positive side, the creation of datasets by norm inversion from frequently used real-world datasets leads to a high rater agreement in terms of syntactic correctness. Whether the specific form and a possible inner coherence of real-world norms for specific social subgroups would have made a difference remains, however, an open question. The necessary size of respective datasets for both pre-training and fine-tuning makes their collection difficult and is thus left for future work.
In line with recent works, our experiments make heavy use of fine-tuning. Although others have also investigated probing techniques, there are more options to adapt PLMs. For example, model editing tools have shown recent success in changing factual knowledge in PLMs (De Cao et al., 2021). Whether methods targeted at factual information can be adapted to moral knowledge is unclear.
Although our work provides insight into the adaptability of LMs to diverging social norms, we do not investigate the consequences of introducing contradictory statements into the models for downstream tasks; additional efforts are required. To this end, future research might leverage existing tools, e.g., LAMA (Petroni et al., 2019), to assess the impact of charging LMs with specific social norms. Moreover, we compare fine-tuning performances of pre-trained vs. randomly initialized models on the same range of hyper-parameters. While longer training on non-pre-trained instances could improve results, we decided to keep the computational costs fixed across both experiments, possibly giving an advantage to the pre-trained cases. Finally, our work only considers one specific natural language, because the required datasets are missing for other societies. However, we deem the presented methods transferable to other languages, given that a reasonable mapping to deontic logic operators is possible.
Table 3
norm | moral action | immoral action
… okay to fat shame | moral | moral
It's good to fat shame | immoral | moral

Table 1: Examples of norms from Moral Stories with their associated moral judgment as provided by the Social Chemistry 101 corpus and our proposed matching to SDL operators. Note that the neutral class "expected/ok" is not represented in Moral Stories. The example is taken from Social Chemistry 101 instead.

Table 2: Examples of model-generated norms and correctness ratings.

Table 4: Best achieved generation metrics for the two architectures, T5 and BART, and the baseline on test data.

Table 5: Accuracies of various pre-trained models on three variants of Moral Stories. We report metrics computed on the test data of the norm-distance split. The reported scores are those of the best performing hyper-parameter settings on the respective sub-task. When fine-tuned on the optional-ms dataset, all models achieved 50% (ms), 50% (anti-ms) and 100% (optional-ms). See Appendix A.2 for details.

Table 6: Fine-tuning on the union of Moral Stories and its derivations, called conflicting-ms. The two lines at the bottom refer to the approaches based on textual entailment, which naturally require previous training.