Experiencer-Specific Emotion and Appraisal Prediction

Emotion classification in NLP assigns emotions to texts, such as sentences or paragraphs. With texts like “I felt guilty when he cried”, focusing on the sentence level disregards the standpoint of each participant in the situation: the writer (“I”) and the other entity (“he”) could in fact have different affective states. The emotions of different entities have been considered only partially in emotion semantic role labeling, a task that relates semantic roles to emotion cue words. Proposing a related task, we narrow the focus to the experiencers of events, and assign an emotion (if any holds) to each of them. To this end, we represent each emotion both categorically and with appraisal variables, which provide a psychological account of why a person develops a particular emotion. On an event description corpus, our experiencer-aware models of emotions and appraisals outperform the experiencer-agnostic baselines, showing that disregarding event participants is an oversimplification for the emotion detection task.


Introduction
Computational emotion analysis from text includes various subtasks, the most prominent being emotion classification or regression. Its goal is to assign an emotion representation to textual units, and the way this is done typically depends on the domain of the data, the practical application of the task, and the psychological theories of reference: emotions can be modelled as discrete labels, in line with theories of basic emotions (Ekman, 1992; Plutchik, 2001), as valence-arousal value pairs that define an affect vector space in which to situate emotion concepts (illustrated, e.g., by Posner et al., 2005), or as appraisal spaces that correspond to the cognitive evaluative dimensions underlying emotions (Scherer, 2005; Smith and Ellsworth, 1985).
Irrespective of the adopted representations, most work in the field detects emotions from a single perspective: either to recover the emotion that the writer of a text likely expressed (e.g., with respect to emotion categories and intensities (Mohammad et al., 2018) and cognitive categories (Hofmann et al., 2020)), or to predict the emotion that the text elicits in the readers (e.g., using news articles, Strapparava and Mihalcea, 2007; Bostan et al., 2020). Only a few approaches combine or compare the reader's with the writer's perspective (Buechel and Hahn, 2017, i.a.). However, none of them looks at the perspectives of the participants in events (whether mentioned or implicit) as described by a text.
Focusing on such perspectives separately is essential to develop an all-round account of the affective implications that events have. It would emphasize how the facts depicted in a text are amenable to different "emotion narratives", by pushing one or the other perspective into the foreground. For instance, a possible interpretation for the sentence "As the waiter yelled at her, the expression on my mother's face made all the staff look repulsed" could be: "my mother"→sadness, "the waiter"→anger, and "the staff"→disgust. There, one entity is responsible for an event (screaming), one is influenced by it, and the third is affected by the emotion emerging in the other (the facial expression, which can be seen as an event in itself).
Our goal is close to emotion role labeling, a special case of semantic role labeling (SRL) (Mohammad et al., 2018; Kim and Klinger, 2018). While SRL addresses the question "Who did What to Whom, Where, When, and How?" (Gildea and Jurafsky, 2000), emotion SRL asks "Who feels what, why, and towards whom?" (Kim and Klinger, 2018), mainly to detect causes of emotion-eliciting events (Ghazi et al., 2015) for certain entities. Here, we tackle a variation of this question, namely "Who feels what and under which circumstances?". The circumstances refer to the explanation provided by appraisal interpretations, another novelty that we contribute to the emotion SRL panorama. Appraisal-based emotion representations capture entity-specific aspects that lead to an emotion, as they describe the subjective qualities that an individual sees in events.
We propose a method for experiencer-specific emotion and appraisal analysis that bridges emotion classification and semantic role labeling. Given texts that describe events and that include annotations for all participants, we assign an emotion and an appraisal vector to each potential emoter. Our proposal is computationally simpler than creating a full graph of relations between causes and entities, as is normally done in (emotion) SRL. Yet, its fine-grained focus on event participants is beneficial over traditional classification- and regression-based approaches: by predicting an emotion and scoring multiple appraisals for each entity, our model strongly outperforms text-level baselines. Thus, the results demonstrate that assigning one emotion to the entire instance, or multiple emotions without considering for whom they hold, is a simplification of the emotional import of the text.

Related Work
In natural language processing, emotions are usually represented as discrete names following theories of basic emotions (Ekman, 1992; Plutchik, 2001), or as values of valence and arousal (Russell and Mehrabian, 1977). Computational models based on such representations have been applied to many text sources, including Reddit comments (Demszky et al., 2020) and tales (Alm et al., 2005), but also to resources created as part of psychological research. An example is the ISEAR corpus. It consists of short reports collected in the lab (Scherer and Wallbott, 1997) by instructing participants to describe events that caused a certain emotion in them. A similar collection practice was adopted by Troiano et al. (2019). In their corpus enISEAR, crowdworkers completed sentences like "I felt [EMOTION NAME] when . . ." for seven emotion names.
The gap between entity-specific emotion analysis and emotion SRL was partially filled by Troiano et al. (2022). They aimed at better understanding the readers' attempts to interpret the experience of the texts' authors. They post-annotated instances from enISEAR with emotions and 22 appraisal concepts, both for the writer and for all other event participants mentioned in the text. The appraisal variables include evaluations of events, as they were likely conducted by the event experiencers: whether authors felt responsible, whether they needed to pay attention to the environment, whether they found themselves in control of the situation, and how pleasant it was (see Table 1 in their paper for explanations of the variables). However, their work was limited to corpus creation and analysis, and did not provide any modeling of appraisals or emotions in an experiencer-specific manner. Therefore, it remains unclear whether a model built on the simplifying assumption that all entities experience the same emotion, or an actual entity-specific model, performs better in practice. We address this concern and show that experiencer-specific modeling is beneficial.
Finally, our work is related to structured sentiment analysis (Barnes et al., 2021), in which opinion targets, their polarity, but also an opinion-holding (or expressing) entity are to be detected. Most studies focused on sentiment targets and aspects (Brauwers and Frasincar, 2021), but there are also some that aim at detecting the opinion holder (Kim and Hovy, 2006; Wiegand and Klakow, 2011; Seki, 2007; Wiegand and Klakow, 2012, i.a.).

Methods

Experiencer-Specific Model. To predict the appraisal vector a_e and the emotion labels E_e for each experiencer e with the help of its text t_e, we use as input a positional indicator encoding of the experiencers in context (inspired by Zhou et al., 2016). The writer is encoded with an additional special token (WRITER). We refer to this experiencer-specific model as EXP.
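The positional indicator encoding can be sketched as follows. The paper shows that the writer is encoded with a prepended WRITER token (cf. Table 2); the exact marker tokens for non-writer experiencers are not given here, so the `<EXP>`/`</EXP>` markers below are illustrative assumptions.

```python
# Sketch of the positional-indicator input encoding for the EXP model.
# WRITER follows the paper; the <EXP> markers are hypothetical.

def encode_for_experiencer(text, span, is_writer=False):
    """Produce one input string per experiencer: prepend WRITER for the
    text's author, otherwise wrap the experiencer span (start, end)
    in indicator tokens so the model knows whose emotion to predict."""
    if is_writer:
        return "WRITER " + text
    start, end = span
    return (text[:start] + "<EXP> " + text[start:end]
            + " </EXP>" + text[end:])

text = "I felt bad for not being there for him"
print(encode_for_experiencer(text, None, is_writer=True))
# → WRITER I felt bad for not being there for him
print(encode_for_experiencer(text, (35, 38)))
# marks "him" as the target experiencer
```

Each experiencer in a text thus yields its own training instance, so the same sentence can receive different labels depending on the marked entity.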
Baseline. We compare this model to a baseline in which we simplify the experiencer-specific classification to a text-level classification. During training, we assign to the text t the union of all emotion labels of all contained experiencers, namely E_t = ⋃_{e: t_e=t} E_e. Analogously, the aggregated appraisal vector is the centroid over all experiencers in one text: a_t = 1/|{e | t_e=t}| · Σ_{e: t_e=t} a_e. We refer to this baseline model as TEXT(-based prediction). Table 2 exemplifies the input representations.
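The two aggregations of the TEXT baseline (label union and appraisal centroid) can be sketched as follows; the data layout is an assumption, with each experiencer carrying a label set E_e and an appraisal vector a_e.

```python
# Minimal sketch of the TEXT baseline's training-time aggregation:
# E_t is the union of all experiencers' labels, a_t the centroid of
# their appraisal vectors.

def aggregate_text_labels(experiencers):
    """experiencers: list of (emotion_label_set, appraisal_vector)
    for all experiencers annotated in one text t."""
    labels = set()
    for emotions, _ in experiencers:
        labels |= emotions  # union of the emotion labels
    dim = len(experiencers[0][1])
    centroid = [sum(vec[i] for _, vec in experiencers) / len(experiencers)
                for i in range(dim)]
    return labels, centroid

E_t, a_t = aggregate_text_labels([
    ({"guilt"}, [4.0, 1.0]),       # e.g., the writer
    ({"no emotion"}, [2.0, 3.0]),  # e.g., the other entity
])
# E_t == {"guilt", "no emotion"}, a_t == [3.0, 2.0]
```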
Data Preparation. We use the x-enVENT data set (Troiano et al., 2022) for our experiments. It consists of 720 event descriptions, mainly from the enISEAR corpus (Troiano et al., 2019), which we split into 612 instances for training and 108 instances for testing (stratified). Each text has been annotated by four annotators and adjudicated into span-based experiencer annotations, each with a multi-label emotion classification and an appraisal vector. We merge infrequent emotion classes from the original corpus. Table 1 shows the label distribution.

Implementation. We fine-tune Distil-RoBERTa (Liu et al., 2019) based on the Hugging Face implementation (Wolf et al., 2020). For both the emotion classification and the appraisal regression tasks, we follow a multi-task learning scheme. All emotion categories are predicted jointly by one model with a multi-output classification head, analogously with a regression head for the appraisal vector. The appendix contains implementation details.

Evaluation. We evaluate performance by calculating experiencer-specific F1 scores for emotion classification and Spearman's ρ for appraisal regression. In the TEXT baseline, we project the decision for the text to each experiencer that it contains.
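The experiencer-specific scoring can be sketched as below: per-label F1 is computed over per-experiencer label sets, and in the TEXT setting the single text-level prediction is projected to every experiencer before scoring. The toy data and label names are illustrative, not from the corpus.

```python
# Sketch of experiencer-specific per-label F1 for multi-label
# emotion predictions (toy example; not the authors' exact code).

def f1_per_label(gold, pred, label):
    """gold/pred: parallel lists of per-experiencer label sets."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Two experiencers in one text; TEXT projects the union
# {"anger", "sadness"} to both, while the gold separates them:
gold = [{"anger"}, {"sadness"}]
text_pred = [{"anger", "sadness"}] * 2
print(f1_per_label(gold, text_pred, "anger"))
# precision 0.5, recall 1.0 → F1 = 2/3
```

This mirrors the precision/recall trade-off reported in the results: projecting the union inflates recall at the cost of precision.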

Results
Quantitative Evaluation. Tables 3 and 4 show the results. For emotion classification, we report precision, recall, and F1 measures for the baseline TEXT and the experiencer-specific predictions by EXP in Table 3. EXP substantially outperforms TEXT in terms of F1 score. This trend holds across all emotion categories, as a result of an increased precision, which is intuitively reasonable: the EXP model learns to distribute the emotions that are contained in a text to individual experiencers, while the TEXT baseline distributes all emotions to all experiencers equally, leading to an increased recall. The most substantial improvements are observed for anger (+14), sadness (+12), and shame (+12), as well as for no emotion (+20). These results are in line with the corpus analysis by Troiano et al. (2022, Figure 4). The performance increase for joy, fear, and disgust is less distinct: these emotions are likely shared by all event experiencers. For the appraisal predictions, we report Spearman's ρ in Table 4. We observe improved prediction performance across nearly all dimensions. Appraisals that distinguish between who caused the event and who had the power to influence it (self vs. other) show the most substantial improvement, namely self responsibility (+0.37) and self control (+0.28), as well as other responsibility (+0.35) and other control (+0.28). This is reasonable: the self and other are often mutually exclusive, and this interaction of appraisals cannot be exploited by purely text-level prediction models. The improvements are smaller for appraisals concerning external factors, namely situational responsibility (+0.09) and situational control (+0.04), because all experiencers are equally affected by events caused externally. The decrease in performance for external check (−0.12) might be explained by the fact that this dimension is often shared between experiencers, rendering the TEXT model sufficient.
Analysis. We show some examples in Table 5 that highlight the usefulness of EXP over TEXT. Next to the emotion classification annotations and predictions from both models, we show the appraisals self responsibility/other responsibility and self control/other control. In each example, the writer is one emotion experiencer. All other experiencers are underlined.
We observe that the TEXT model has a tendency to predict the union of the emotions for all experiencers, but sometimes predicts additional categories. This is a consequence of this model's tendency towards high-recall predictions. In Example 1, both EXP and TEXT correctly assign the emotions anger, disgust, and no emotion, but only EXP distributes them correctly between "Writer" and "The owners" (sadness is wrongly detected by both models). In Example 2, joy is not predicted by TEXT, but correctly assigned to "a group of children" by EXP. EXP further distributes shame and sadness to the correct entities (with a mistake of assigning anger and no emotion to "a group of children" as well as anger and fear to "another child"). In Example 3, EXP correctly assigns sadness and shame to "Writer" and sadness and no emotion to "my sister", while TEXT fails to detect no emotion. In Example 4, EXP's prediction of anger and fear (for "our children") could be accepted as correct despite not being in line with the gold annotation. EXP further predicts the correct emotions for "Writer" (but makes a mistake in assigning joy to "my ex husband"). In Example 5, the emotions of "Writer" are correctly assigned; "my son" is wrongly assigned joy in addition to no emotion (TEXT mistakenly predicts other as well). However, the correctness of this annotation is debatable.
Maximal gold appraisal values for self/other control and self/other responsibility are, in nearly all cases, mutually exclusive across experiencers. The TEXT model is not informed about that and distributes the values across all entities. The EXP model does indeed recover the individual values for the appraisals, but to varying degrees. In Examples 2, 3, and 4, nearly all experiencers receive appraisal values close to the gold annotations. Example 2 appears to be challenging: the writer has a high gold annotation value for self responsibility which is not automatically detected. Further, "a group of children" receives the same values for the four appraisals. Examples 1 and 5 are cases in which the appraisal prediction does not work as expected.

Discussion and Conclusion
We presented the first approach to experiencer-specific emotion classification and appraisal regression. Our evaluation on event descriptions shows the need for such methods, and that a text-instance-level annotation is a simplification. This work provides the foundation for future research focused on texts in which multiple emotion labels co-occur, including reader/writer combinations or turn-taking dialogues. We propose to integrate experiencer-specific emotion modeling within such settings, for instance in novels or news articles. It can also enrich the work on emotion recognition in dialogues (Poria et al., 2019): chains of emotions have been modeled, but without considering mentioned entities.
Our work focused on a corpus that has been annotated specifically for writers' and entities' emotions. There exist, however, also other corpora with experiencer-specific emotion annotations, namely emotion role labeling resources (Kim and Klinger, 2018; Bostan et al., 2020; Campagnano et al., 2022; Mohammad et al., 2014). In addition to other information, they also provide experiencer-specific emotion labels, though not in such an event-focused context. Still, modeling them with our method remains to be compared to more traditional approaches that aim at recovering the full role labeling graph.
Our approach to encoding the experiencer position in the classifier has been a straightforward choice. Other model architectures (including positional embeddings, Wang and Chen, 2020) might perform better. Another interesting methodological avenue is to model the predictions of multiple experiencers jointly to exploit their relations.
Finally, an open question is how to incorporate information from existing resources that are not labeled with experiencer-specific information.For instance, Troiano et al. (2023) provide appraisal and emotion annotations for many more instances that might be beneficial in a transfer-learning setup.
A Implementation Details.
We fine-tune Distil-RoBERTa (Liu et al., 2019) as implemented in the Hugging Face library (Wolf et al., 2020) and leave default parameters unchanged. For both the emotion classification and the appraisal regression tasks, we follow a multi-task learning scheme. All emotion categories are predicted jointly by one model with a multi-output classification head, analogously with a regression head for the appraisal vector prediction. The classification head consists of a linear layer with dropout (0.5) and a ReLU activation function, followed by a final linear layer with sigmoid activation. For the appraisal regression, the sigmoid activation function in the final layer is replaced by a linear activation. We use a binary cross-entropy loss in the emotion classifier and a mean squared error loss in the appraisal regressor. Both models are trained for 10 epochs without early stopping. We use the Adam optimizer (Kingma and Ba, 2015) with weight decay (0.001) and a learning rate of 2·10^-5. The weights of each layer are initialized using the Xavier uniform initialization (Glorot and Bengio, 2010). The hyperparameters and architecture have been decided on via 10-fold cross-validation on the training data.
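The two task heads described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' exact implementation: the hidden size of 768 (Distil-RoBERTa's default), the 8 emotion labels (matching the abbreviations in Table 5), and the 22 appraisal dimensions are the assumed output sizes.

```python
# Sketch of the multi-output classification and regression heads:
# dropout (0.5) + linear + ReLU, then a final linear layer with
# sigmoid activation (emotions) or linear activation (appraisals).
import torch
import torch.nn as nn

class TaskHead(nn.Module):
    def __init__(self, hidden=768, n_outputs=8, classification=True):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_outputs),
        )
        # sigmoid for multi-label emotions (trained with BCE loss),
        # identity for appraisal scores (trained with MSE loss)
        self.activation = nn.Sigmoid() if classification else nn.Identity()

    def forward(self, pooled):
        return self.activation(self.net(pooled))

emotion_head = TaskHead(n_outputs=8, classification=True)
appraisal_head = TaskHead(n_outputs=22, classification=False)
pooled = torch.randn(4, 768)  # a batch of 4 pooled encoder outputs
print(emotion_head(pooled).shape, appraisal_head(pooled).shape)
```

Both heads would sit on top of the shared, fine-tuned encoder, realizing the multi-task scheme with one forward pass per input.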

Table 1 :
Number of instances and experiencer spans annotated for each emotion. Non-bold emotion names are concepts in the x-enVENT data that we merge with bold emotion names in our experiments.

Table 2 :
Example representation at training time for the EXP model and the TEXT baseline for the instance "WRITER I felt bad for not being there for him".

Table 3 :
Emotion classification results of the TEXT-based baseline, which is not informed about experiencer-specific emotions, compared with our experiencer-specific model EXP.

Table 4 :
Appraisal regression results of the TEXT-based baseline and the experiencer-specific model EXP. The average has been calculated via the Fisher z-transformation.
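The Fisher z-based averaging of correlation coefficients mentioned in the caption can be sketched as follows: each ρ is mapped to z = atanh(ρ), the z values are averaged, and the mean is mapped back with tanh.

```python
# Averaging Spearman correlations via the Fisher z-transformation,
# which is preferable to a plain arithmetic mean of the ρ values.
import math

def fisher_z_average(correlations):
    zs = [math.atanh(r) for r in correlations]  # z = atanh(ρ)
    return math.tanh(sum(zs) / len(zs))         # back-transform the mean

print(round(fisher_z_average([0.3, 0.5, 0.7]), 3))
# → 0.519 (slightly above the plain mean of 0.5)
```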
Troiano et al. (2022) found that some emotions are often shared between different experiencers within one text, but others occur in common pairs, namely guilt-anger, no emotion-sadness, guilt-sadness, and shame-anger. Noteworthy is the category no emotion, which commonly occurs with all other emotions.

Table 5 :
Examples of EXP and TEXT predictions. a: anger, d: disgust, no: no emotion, o: other, sa: sadness, sh: shame, f: fear, j: joy. The boxes show the appraisals self responsibility, other responsibility, self control, and other control, with values on a graded scale from minimum to maximum.