Automatically Select Emotion for Response via Personality-affected Emotion Transition

To provide consistent emotional interaction with users, dialog systems should be capable of automatically selecting appropriate emotions for responses, as humans do. However, most existing works focus on rendering specified emotions in responses or responding empathetically to the emotions of users, while individual differences in emotion expression are overlooked. This may lead to inconsistent emotional expressions and cause users to lose interest. To tackle this issue, we propose to equip the dialog system with personality and enable it to automatically select emotions for responses by simulating the emotion transition of humans in conversation. In detail, the emotion of the dialog system transitions from its preceding emotion in context. The transition is triggered by the preceding dialog context and affected by the specified personality trait. To achieve this, we first model the emotion transition in the dialog system as the variation between the preceding emotion and the response emotion in the Valence-Arousal-Dominance (VAD) emotion space. Then, we design neural networks to encode the preceding dialog context and the specified personality traits to compose the variation. Finally, the emotion for response is selected from the sum of the preceding emotion and the variation. We construct a dialog dataset with emotion and personality labels and conduct emotion prediction tasks for evaluation. Experimental results validate the effectiveness of the personality-affected emotion transition.


Introduction
Emotional intelligence can be considered a mental ability to reason validly with emotional information and to use emotions to enhance thought. Hence, to create dialog systems with emotional intelligence in communication, it is necessary to enable the machine to understand the emotions of users, select appropriate response emotions, and express them in conversation. Our dataset is released at: github.com/preke/PELD
Existing works either focus on rendering specified emotions in responses (Zhou et al., 2018; Colombo et al., 2019) or on understanding the emotions of users and responding empathetically (Zandie and Mahoor, 2020; Zhong et al., 2020; Lin et al., 2019); how to automatically select the emotion for a response is seldom discussed. Wei et al. (2019) propose to learn appropriate emotional responses from massive anonymous online dialogues. However, trained on conversations from different speakers, the dialog system ignores individual differences in expressing emotions. This may lead to inconsistent emotional interactions and cause users to lose interest, as they may feel they are still talking to rigid machines. In a dialog system, automatically selecting the emotion for a response means deciding on an emotion to be expressed, which facilitates emotional response generation. Emotion selection can be modeled as the emotion transition (Thornton and Tamir, 2017) of the dialog system reacting to the dialog context, which refers to how the preceding emotion changes to the next. Achieving this like humans requires the long-term patterns of thought and behavior associated with an individual (Ball, 2000). Mehrabian (1996a) shows that personality, e.g., the big-five personality model (Costa and McCrae, 1992), can also be represented as temperament in the Valence-Arousal-Dominance (VAD) space for emotions (Mehrabian, 1996b). This finding suggests that different personalities have different impacts on emotional expressions. Inspired by these works, we propose a personality-affected emotion transition model to endow the dialog system with personality, enabling it to select emotions reacting to the dialog context as affected by its given personality.
In our method, we model the emotion transition of the dialog system as the variation in the VAD space from its preceding emotion to the next emotion in the response to users. We first obtain the preceding emotion of the dialog system from the dialog context and project it into the VAD space as an emotion vector. Simultaneously, we endow a personality trait, a 5-dimension vector representing the strength of each dimension in the big-five personality traits, to the dialog system. Then, we design neural networks to encode the dialog context and the personality traits into the VAD space to compose the variation of emotion. Finally, the emotion for response is selected based on the sum of the preceding emotion and the variation.
To facilitate related research, we construct the Personality EmotionLines Dataset (PELD), which includes 6,510 dialogue triples of daily conversations with emotion labels and annotated personality traits. The emotion labels and personality annotations are adopted from previous research (Poria et al., 2018; Zahiri and Choi, 2017; Jiang et al., 2019) analyzing the script of the famous TV series Friends. We conduct emotion prediction tasks on the PELD dataset to evaluate the effectiveness of our method. The results suggest that the personality-affected emotion transition does contribute to better accuracy in emotion selection. To summarize, our contributions are as follows: • We raise the problem of automatically selecting the emotion for a response in conversation and propose a new perspective to solve it through personality-affected emotion transition.
• We construct a dialog script dataset with emotion and personality labels and analyze the patterns of emotion transitions in our dataset to facilitate related research.
• We evaluate the effectiveness of our proposed method on emotion prediction tasks and analyze the effects of personality and emotion transition respectively.

Related Works
Our research is related to emotional dialog systems, and to the personality influence on emotion expression in psychology and Human-Computer Interaction (HCI). We review existing works in these two aspects as follows.

Emotional Dialog Systems
The concept of the emotional dialog system first appeared in (Colby, 1975), where a rule-based emotion simulation chatbot was proposed. Microsoft introduced Xiaoice (Zhou et al., 2020), an empathetic social chatbot able to recognize users' emotional needs, in 2014. Related research has become popular recently since Zhou et al. (2018) proposed the Emotional Chatting Machine, which exploits deep learning approaches to build a large-scale emotionally aware conversational bot. Most existing works focus on incorporating specified emotion factors into neural response generation. Shantala et al. (2018) train emotional embeddings based on context and then integrate them into response generation. Colombo et al. (2019) control emotional response generation with both categorical emotion representations and continuous word representations in the VAD space (Mohammad, 2018). Moreover, Asghar et al. (2018) propose an affectively diverse beam search for decoding. Besides, reinforcement learning has also been adopted to encourage response generation models to render specified emotions. Li et al. (2019) combine reinforcement learning with emotional editing constraints to generate meaningful and customizable emotional replies. Sun et al. (2018) also use an emotion tag to partially reward the model for expressing a specified emotion. However, it is impractical to always specify response emotions for dialog systems in real application scenarios. To simulate emotional interaction among humans, Wei et al. (2019) design an emotion selector to learn proper emotions for responses from massive dialogue pairs. But emotional expression is subjective: for the same post, different users may have different emotions in their responses. So, a pattern learned only from online dialogues ignores user information and proves impractical.

Personality Effects on Emotions
Emotion is a complex psychological experience of an individual's state of mind when interacting with people or environmental influences (Han et al., 2012). The Pleasure-Arousal-Dominance (PAD) (Mehrabian, 1996b), or Valence-Arousal-Dominance (VAD), emotion temperament model provides three nearly orthogonal dimensions for a comprehensive description of emotional states. Based on this, several psychologists have studied the relationship between human emotional factors and personality factors, though most of the resulting models are rule-based (Johns and Silverman, 2001) or probabilistic (André et al., 1999). Mehrabian (1996a) utilized the five factors of personality (Costa and McCrae, 1992) to represent the VAD temperament model through linear regression analysis. This finding is widely used to design robots with non-verbal emotional interaction with users (Han et al., 2012; Masuyama et al., 2018), where the pre-defined personalities of the robots affect the propensity of their simulated emotion transitions.
To integrate the above analyses into Artificial Intelligence, some researchers in HCI borrow the idea and design facial emotional expressions for humanoid robots. Ball (2000) utilizes models of emotions and personality encoded as Bayesian networks to generate empathetic behaviors or speech responses to users in conversation. Han et al. (2012) employ the five factors of personality in a 2D (pleasure-arousal) scaling model to represent a robotic emotional model. Masuyama et al. (2018) introduce an emotion-affected associative memory model for robots expressing emotions. In NLP, though the VAD space has been adopted to model emotions in some research (Mohammad, 2018; Colombo et al., 2019; Asghar et al., 2018), the personality influence on emotion in dialogues is still an open problem.

Problem Definition
We research enabling the dialog system to automatically select emotions for responses through a personality-affected emotion transition.
Formally, a dyadic emotional conversation between the user and the dialog system contains: the dialog context C = {U_1, U_2, ..., U_{n−1}}, including all the preceding n−1 utterances from both the user and the dialog system; the preceding emotion E_i expressed in U_i ∈ C, the last utterance from the dialog system; and the response emotion E_r for the dialog system, which facilitates generating the next emotional response U_n to the user. We specify a personality trait P_n for the dialog system and enable it to select the response emotion E_r through the personality-affected emotion transition model, where E_r is transitioned from E_i. The transition is triggered by the preceding dialog context C and affected by the specified personality trait P_n. In the following, we introduce how we model this process in detail.
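For concreteness, one sample in this setting can be sketched as a small data structure (an illustrative sketch only; the field names and the example values are ours, not from the released PELD code):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DialogSample:
    """One training sample: context, preceding/response emotions, personality."""
    context: List[str]              # U_1 ... U_{n-1}, alternating speakers
    preceding_emotion: str          # E_i, expressed in the system's last utterance
    response_emotion: str           # E_r, the label to select/predict
    personality: Tuple[float, ...]  # 5-dim OCEAN trait vector P_n

sample = DialogSample(
    context=["I got the job!", "That's wonderful!", "Let's celebrate tonight."],
    preceding_emotion="Joy",
    response_emotion="Joy",
    personality=(0.6, 0.4, 0.7, 0.8, 0.3),
)
```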

Emotions in the VAD space
Assuming in the problem above that emotions in all emotional utterances can be categorized into the six basic emotions, Anger, Disgust, Fear, Joy, Sadness, and Surprise (Ekman and Davidson, 1994), we project the basic emotions into the Valence-Arousal-Dominance (VAD) space as in Table 1, referring to the analysis results in (Russell and Mehrabian, 1977). The VAD space indicates emotion intensity in three different dimensions: valence measures positivity/negativity, arousal excitement/calmness, and dominance powerfulness/weakness. For utterances with no explicit emotion, we use Neutral with (0.00, 0.00, 0.00) as the VAD vector.
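The projection can be captured in a small lookup table. The six VAD coordinates below are the values commonly quoted from Russell and Mehrabian (1977) and should be treated as approximate placeholders for Table 1; only the Neutral origin is fixed by the text above:

```python
# Approximate VAD coordinates for the six basic emotions, as commonly quoted
# from Russell and Mehrabian (1977). Treat the six non-Neutral rows as
# illustrative stand-ins for Table 1; Neutral is the origin by definition.
EMOTION_VAD = {
    "Anger":    (-0.51,  0.59,  0.25),
    "Disgust":  (-0.60,  0.35,  0.11),
    "Fear":     (-0.64,  0.60, -0.43),
    "Joy":      ( 0.76,  0.48,  0.35),
    "Sadness":  (-0.63, -0.27, -0.33),
    "Surprise": ( 0.40,  0.67, -0.13),
    "Neutral":  ( 0.00,  0.00,  0.00),
}

def valence(emotion: str) -> float:
    """First VAD coordinate: positivity/negativity of the emotion."""
    return EMOTION_VAD[emotion][0]
```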

Personalities in the VAD space
Meanwhile, the big-five personality traits (OCEAN, shown in Table 2) are widely used for psychological analysis. Mehrabian (1996a) represents the temperament of a personality in the VAD space through linear regression on the five factors (Costa and McCrae, 1992):

P_V = 0.21E + 0.59A + 0.19N
P_A = 0.15O + 0.30A − 0.57N    (2)
P_D = 0.25O + 0.17C + 0.60E − 0.32A
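Mehrabian's regression from an OCEAN vector to a VAD temperament can be written as a small helper (the coefficients follow the Mehrabian (1996a) analysis as commonly cited; verify against the original before reuse):

```python
def temperament(ocean):
    """Map a big-five vector [O, C, E, A, N] to a VAD temperament.

    Coefficients follow Mehrabian's (1996a) regression analysis as
    commonly cited; treat them as prior knowledge, not exact constants.
    """
    O, C, E, A, N = ocean
    p_v = 0.21 * E + 0.59 * A + 0.19 * N
    p_a = 0.15 * O + 0.30 * A - 0.57 * N
    p_d = 0.25 * O + 0.17 * C + 0.60 * E - 0.32 * A
    return p_v, p_a, p_d

# Temperament of a hypothetical mid-strength trait vector.
p_v, p_a, p_d = temperament([0.5, 0.5, 0.5, 0.5, 0.5])
```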

Personality-affected Emotion Transition
Based on the problem definition and the preliminaries above, we design the Personality-affected Emotion Transition model illustrated in Figure 1. Our model mainly includes three modules: the personality effect on emotions (lower left of Figure 1), the context encoding (lower right), and the emotion transition (top half). We introduce these three modules in detail as follows.

Personality Effect on Emotions
In our model, the personality of the dialog system is specified as a 5-dimensional vector P_n = [O, C, E, A, N], representing the strength in Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism, respectively (described in Table 2).

Factor             Description
Openness           Open-minded, imaginative, and sensitive.
Conscientiousness  Scrupulous and well-organized.
Extraversion       The tendency to experience positive emotions.
Agreeableness      Trusting, sympathetic, and cooperative.
Neuroticism        The tendency to experience psychological distress.

The temperament of personality in the VAD space (shown in Equation 2) is widely used as weighting parameters for the emotion transition of robots in HCI works (Han et al., 2012; Masuyama et al., 2018). However, the numeric coefficients in Equation 2 are summarized from an analysis of questionnaire results from 72 participants (Mehrabian, 1996a) and are not suitable to be directly adopted as hyper-parameters in the model design. Hence, we adopt the analysis results in Equation 2 as prior knowledge and learn suitable coefficients for personality with neural networks. First, we still calculate P_V, P_A, P_D from the personality P_n by Equation 2; then we use P_V, P_A, P_D as the initialized input of an adaptation layer A_p to learn weighting parameters suitable for the training data.
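A minimal PyTorch sketch of the adaptation layer A_p (the hidden size, the residual form, and the layer structure are our assumptions for illustration, not the authors' released implementation):

```python
import torch
import torch.nn as nn

class PersonalityAdapter(nn.Module):
    """Refine the prior VAD temperament (Equation 2) into learned weights."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Tanh(), nn.Linear(hidden, 3),
        )

    def forward(self, prior_vad: torch.Tensor) -> torch.Tensor:
        # Residual form: start from the Equation 2 prior, learn a correction.
        return prior_vad + self.net(prior_vad)

# Prior temperament of a hypothetical trait vector, refined by the adapter.
prior = torch.tensor([0.495, -0.06, 0.35])
weights = PersonalityAdapter()(prior)
```

Initializing from the Equation 2 prior and learning only a correction keeps the psychological analysis as a starting point while letting the training data reshape the weights.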

Context Encoding
The dialog context acts as a set of parameters that may influence a person to speak an utterance while expressing a certain emotion (Poria et al., 2018).
In the VAD space, the emotion transition is regarded as the variation from one point (the preceding emotion) to another point (the next emotion). Thus, we generate the emotion transition variations ∆V, ∆A, ∆D from the semantic representations of the preceding dialog context C.
We fine-tune RoBERTa, a pre-trained language model whose performance is widely validated on many natural language understanding tasks, to first extract the semantic representations E_n(U_1), ..., E_n(U_{n−1}) of all n−1 utterances in C. Then, we concatenate the semantic representations of the utterances to obtain the overall context semantics R_c. Finally, ΔV, ΔA, ΔD are calculated by feeding R_c into an affective encoder E_a, which extracts the affective information from R_c in the aspects of V, A, and D, respectively.

Figure 2: A triple example in PELD: a dyadic conversation between Ross and Monica (two main roles in Friends), where P_n is the personality of Ross. In this example, the dialog system is set as Ross and talks with the user set as Monica.
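The context-encoding pipeline can be sketched as follows; random vectors stand in for RoBERTa's pooled utterance representations, and the affective encoder's hidden size is an assumption of ours:

```python
import torch
import torch.nn as nn

EMB = 768  # hidden size of RoBERTa-base

class AffectiveEncoder(nn.Module):
    """Map concatenated context semantics R_c to the variation (dV, dA, dD)."""

    def __init__(self, n_utts: int = 3, hidden: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(n_utts * EMB, hidden), nn.ReLU(), nn.Linear(hidden, 3),
        )

    def forward(self, utt_reprs):
        # Concatenate per-utterance representations into R_c, then project.
        r_c = torch.cat(utt_reprs, dim=-1)
        return self.proj(r_c)

# Random vectors stand in for fine-tuned RoBERTa utterance representations.
utts = [torch.randn(EMB) for _ in range(3)]
delta_vad = AffectiveEncoder()(utts)
```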

Emotion Transition
After we obtain the weighting parameters P_V, P_A, P_D and the emotion transition variation ΔV, ΔA, ΔD, the emotion for response is generated from the sum of the VAD vector of the preceding emotion and the weighted variation, as shown in Equation 4:

V_r = V_i + P_V · ΔV
A_r = A_i + P_A · ΔA    (4)
D_r = D_i + P_D · ΔD

where V_i, A_i, D_i constitute the VAD vector of E_i, and V_r, A_r, D_r are the emotion transition results in the VAD space. To alleviate the errors of using the numeric values of the calculated VAD vectors, we add a linear layer F_c to transform V_r, A_r, D_r into a probability distribution over the discrete emotion categories. The output E_r is the emotion with the largest probability.
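The transition step itself is simple arithmetic; a sketch with made-up (not learned) weighting parameters and variation:

```python
def transition(preceding_vad, weights, delta):
    """Response VAD = preceding VAD + personality-weighted variation."""
    return tuple(p + w * d for p, w, d in zip(preceding_vad, weights, delta))

# Joy (approximate VAD) nudged toward lower valence by a context-driven
# variation; the weights and delta here are illustrative values only.
joy = (0.76, 0.48, 0.35)
v_r = transition(joy, weights=(0.5, 0.4, 0.6), delta=(-0.2, 0.1, 0.0))
```

In the full model, the resulting vector is then passed through the linear layer F_c to obtain a probability distribution over the discrete emotion categories.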

Dataset Construction & Statistics
To facilitate related research, we construct the Personality EmotionLines Dataset (PELD), an emotional dialog dataset with personality traits for speakers. As labeling online conversations on social media with speakers' personalities is time-consuming and may cause privacy issues, we instead study the dialogue script of the famous TV series Friends. This classic script is widely analyzed in many dialog studies (Li et al., 2016; Li and Choi, 2020; Jiang et al., 2019).
In PELD, each sample is represented as a dialog triple (C = {U_1, U_2, U_3}, {E_i, E_r}, P_n), as shown in Figure 2, where E_i and E_r are the emotions expressed in U_1 and U_3, respectively. The utterances and their emotion labels are mainly adopted from the dialogues in MELD (Poria et al., 2018) and the EmoryNLP dataset (Zahiri and Choi, 2017), two famous datasets analyzing emotional expressions in Friends. To keep consistency, each dialog triple in PELD is constructed within the same dialogue in the original datasets. The personality traits in our dataset are adopted from the personality annotations of 711 different dialogues (Jiang et al., 2019). According to the annotations, a role may exhibit different aspects of its personality in different dialogues. We only keep the personality traits of the six main roles in Friends, as these annotations are the most frequent and thus more reliable. For each main role, we average the annotated personality traits over all dialogues for simplification: P_n = (1/K) Σ_{i=1}^{K} P_i, where K is the number of annotations. The averaged results are shown in Table 3.
We split PELD into Train, Valid, and Test sets with a ratio of around 8:1:1. The total number of utterances in PELD (10,648) is less than the sum of the original MELD (13,708) and EmoryNLP (9,489), as not all dialogues are suitable for constructing triples including main roles. The overall statistics of the dataset are shown in Table 4. Similar to existing emotional conversation datasets (Li et al., 2017; Busso et al., 2008), PELD also suffers from the emotion imbalance issue. Utterances labeled as Neutral are the majority, while Fear and Disgust only take a small portion. Though this reflects the real emotion distribution in daily conversation, it also brings challenges for machine learning models to identify and generate emotions. We tried several automatic methods for data augmentation, such as synonym substitution, back-translation, and the EDA proposed in (Wei and Zou, 2019). But most of the synthetic samples are either odd or identical to the original samples. The reason might be that short sentences, as utterances in conversation, offer limited options for replacing synonyms or adding and deleting words. Another way to alleviate the imbalance issue is to coarsen the granularity from emotion to sentiment. As mentioned in 3.2, the Valence dimension in the VAD space measures positivity and negativity, so we can categorize the emotions into sentiments according to the values of Valence; i.e., positive emotions: Joy and Surprise; negative emotions: Anger, Disgust, Fear, and Sadness. The distribution of sentiments in PELD is thus also shown in Table 4. Besides, the dialog triples of the six main roles (each triple corresponds to a main role with its personality trait) are evenly distributed across the train, valid, and test sets of PELD.
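The emotion-to-sentiment coarsening described above is a direct mapping by the sign of the Valence dimension:

```python
# Coarsen the seven emotion labels to three sentiments by the sign of
# their Valence dimension, as described in Section 3.2.
POSITIVE = {"Joy", "Surprise"}
NEGATIVE = {"Anger", "Disgust", "Fear", "Sadness"}

def to_sentiment(emotion: str) -> str:
    if emotion in POSITIVE:
        return "Positive"
    if emotion in NEGATIVE:
        return "Negative"
    return "Neutral"
```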

Emotion Transitions in PELD
After constructing PELD, we further explore the dataset from the perspective of emotion transitions, as the triples in PELD are constructed for analyzing the emotion transitions between E_i in U_1 and E_r in U_3. Table 5 shows the emotion and sentiment distributions in U_1 and U_3, respectively. Besides, we also count the sentiments of the emotions in U_1 and U_3, denoted as S_1 and S_3. We can see that for both emotions and sentiments, the distributions in U_1 and U_3 are similar, which means the transitions of emotions and sentiments are balanced in the PELD triples. Besides, the proportions of all emotions and sentiments are also similar to the overall statistics of PELD, which suggests that the emotions and sentiments are evenly distributed across the triples.
Since emotion transitions are affected by personality traits, as discussed above, we exhibit the emotion transition patterns of different roles with different personality traits in Figure 3. Although the emotion transitions are also correlated with the dialog context, we can still find patterns in these transition matrices.
In general, among the six transition matrices, all the first columns are in deeper colors, which indicates that most transitions occur from other emotions to Neutral, as it is the majority emotion in PELD. Besides, blocks with deeper colors are also more likely to occur on or around the diagonals of the transition matrices, suggesting that preceding emotions tend to transition to the same or similar emotions. As for individual roles, 59% of the Anger from Rachel remains the same in dialog triples, while for the other roles, most Anger emotions transfer to Neutral and Anger. Besides, most Surprise from Ross transfers to Neutral, Joy, and Surprise, but most Surprise from the other five roles tends to transfer only to Surprise and Neutral.
Moreover, to highlight the individual differences in emotion transitions among the six main roles in detail, we also show the standard deviations (Std) of each row in the emotion transition matrices of the six main roles in Figure 4. The red bar chart shows the Std of the infinity norms of the rows in the emotion transition matrices, which indicates the diversity of the most probable next emotion from the same emotion across different roles. The blue bar chart shows the Std of the L2-norms, which generally describes the differences in how different roles transfer from one emotion to other emotions.
Both charts show similar patterns of emotion transitions. Anger, Surprise, and Disgust vary the most across roles, while different roles handle the Neutral and Joy emotions more similarly in conversation. Besides, the Std values of the negative emotions (Anger, Sadness, Fear, and Disgust) are relatively higher than those of the positive emotions and Neutral on average. We can thus infer that personality traits exert more influence on transitions from negative emotions.
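The per-row statistics behind Figure 4 can be reproduced per role from its transition counts; the Std across the six roles' per-row norms then gives the bars. A numpy sketch with a made-up 3-emotion count matrix:

```python
import numpy as np

def row_stats(counts: np.ndarray):
    """Normalize transition counts row-wise, then summarize each row.

    Returns the transition probabilities, the infinity norm of each row
    (probability of the most likely next emotion), and the L2-norm of
    each row (how concentrated the row's distribution is).
    """
    probs = counts / counts.sum(axis=1, keepdims=True)
    inf_norms = np.abs(probs).max(axis=1)
    l2_norms = np.linalg.norm(probs, axis=1)
    return probs, inf_norms, l2_norms

# Toy 3-emotion transition counts (rows: from-emotion, cols: to-emotion);
# real matrices would be 7x7 per role, built from the PELD triples.
counts = np.array([[8.0, 1.0, 1.0],
                   [2.0, 6.0, 2.0],
                   [3.0, 3.0, 4.0]])
probs, inf_norms, l2_norms = row_stats(counts)
```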

Evaluation Tasks
To validate the effectiveness of our proposed emotion generation model, we set up two evaluation tasks on PELD: Emotion Prediction and Sentiment Prediction. Emotion Prediction requires the model to predict the emotion of the upcoming utterance based on the preceding dialog context in a dyadic conversation scenario; Sentiment Prediction has the same setting except that it predicts the sentiment of the upcoming utterance.
For both tasks, we evaluate the prediction performance by the F-score of each single emotion or sentiment. Besides, the overall performance is measured from two aspects: the macro-averaged (m-avg) and the weighted-averaged (w-avg) F-scores. A higher m-avg indicates the model performs relatively better at predicting all categories, while a higher w-avg indicates the model better predicts the emotions or sentiments with larger proportions in the dataset.
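The two averages can differ noticeably on imbalanced data; a self-contained sketch of per-class, macro-averaged, and weighted-averaged F-scores:

```python
from collections import Counter

def f1_per_class(y_true, y_pred, labels):
    """F-score for each class from true/predicted label sequences."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

def macro_and_weighted_f1(y_true, y_pred, labels):
    """m-avg: unweighted mean; w-avg: mean weighted by class support."""
    per_class = f1_per_class(y_true, y_pred, labels)
    support = Counter(y_true)
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted

# Toy labels: the majority class (Joy) dominates the weighted average.
macro, weighted = macro_and_weighted_f1(
    ["Joy", "Joy", "Joy", "Anger"], ["Joy", "Joy", "Anger", "Anger"],
    ["Joy", "Anger"])
```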

Ablation Study Setting
Although plenty of methods (Ghosal et al., 2019, 2020) have been proposed to analyze emotions in the dialogues of Friends, most of them target recognizing the emotions of utterances in conversation. Compared with emotion recognition, the problem setting of selecting an emotion is different, and it is more difficult to select the appropriate emotion for a response without knowing the response content. So, instead of comparing with other emotion recognition models, we conduct ablation studies to evaluate the effectiveness of the different parts of our model design.
The ablation study compares the performances of the following models:

RoBERTa: The pre-trained RoBERTa encoder is widely validated on many downstream tasks. We use pre-trained RoBERTa, corresponding to E_n in our model, to encode the preceding dialog context into a semantic representation, and then directly predict the emotion for the response through a classification head.

RoBERTa-P:
We concatenate the personality vector of the speaker with the dialog context representation from RoBERTa as the feature, then predict the response emotion. This method evaluates whether personality influences the expression of emotions.
PET-VAD: As emotions can be represented either by discrete category labels or by vectors in the VAD space, PET-VAD is set up to compare different usages of the emotion VAD vectors in our model. During training, PET-VAD regresses the VAD vectors of the target emotions by minimizing the Mean Squared Error (MSE) between the generated vectors and the VAD vectors of the ground-truth emotions. The prediction output of PET-VAD is the nearest-neighbor emotion of the generated VAD vector, measured by MSE.
PET-CLS: This is our method, Personality-affected Emotion Transition, with a classifier applied after obtaining the VAD vector of the generated emotion. PET-CLS predicts emotions in the upcoming utterances as described in Section 3.
For RoBERTa, RoBERTa-P, and PET-CLS, which directly output discrete emotions, we adopt the Focal loss (Lin et al., 2017) to alleviate the imbalance in emotion prediction.
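A common PyTorch formulation of the Focal loss (Lin et al., 2017) down-weights well-classified examples by the factor (1 − p_t)^γ; with γ = 0 it reduces to standard cross-entropy (a sketch; the γ value and any class weighting used in our experiments are not specified here):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    """Focal loss for imbalanced classification (Lin et al., 2017)."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, reduction="none")
    p_t = torch.exp(-ce)  # probability assigned to the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

# Toy batch of 2 samples over 3 emotion classes.
logits = torch.tensor([[2.0, 0.1, -1.0], [0.0, 0.0, 3.0]])
targets = torch.tensor([0, 2])
loss = focal_loss(logits, targets)
```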

Results and Analysis
In this section, we report and analyze the experimental results on the Test set of PELD in our ablation study. All results are chosen by the best performance on the Valid set within 50 training epochs.

Results for Emotion Prediction
The results of the Emotion Prediction task are reported in Table 6. First of all, as a seven-class prediction task that also suffers from the imbalance issue, the overall performance is moderately low, which indicates the difficulty of the task. As for the averaged F-scores, PET-CLS improves both the w-avg and m-avg by a large margin over all other methods, which verifies our personality-affected emotion transition method.
In detail, all models perform better on the emotions with larger portions (Neutral and Joy), as they are more likely to occur as the response emotion. Moreover, PET-VAD and PET-CLS achieve moderately higher F-scores on the minority emotions (Anger, Sadness, Disgust, Fear, and Surprise), which shows that the emotion transition process is more important in generating these minority emotions. It also verifies the finding in Section 4.2. On the other hand, although PET-VAD is based on the designed personality-affected emotion transition, most single-emotion F-scores of PET-VAD are lower than those of RoBERTa or RoBERTa-P. We discuss the possible reasons as follows. One reason might be that the emotion imbalance issue cannot be alleviated by directly regressing the emotion VAD vectors. Another reason might be that the values of the emotion VAD vectors in Table 1 are estimated rather than precisely calculated, and the distances among different emotions in the theoretical VAD space do not match those in the emotion distribution of daily conversation.

Results for Sentiment Prediction
As predicting the emotions of upcoming responses is difficult due to the multiple imbalanced categories, we also report the results of the Sentiment Prediction task in Table 7. Different from the analysis above, which categorizes emotions by their portions in PELD, sentiment is another aspect of emotion analysis. As sentiments are not directly described in the VAD space, we only report the results for RoBERTa, RoBERTa-P, and PET-CLS. Besides, we only change the output size of PET-CLS from 7 (for emotions) to 3 (for sentiments) and preserve the emotion transition process in this task.
In general, we can see that the prediction F-scores of sentiments are higher than those of emotion prediction. Besides, predicting the negative sentiment is much easier than predicting the positive one for all three methods. This may be because, although the numbers of positive and negative samples are similar, there are more categories of negative emotions (Anger, Sadness, Fear, and Disgust) than positive emotions (Joy and Surprise). Equipped with our model design, PET-CLS outperforms both RoBERTa and RoBERTa-P except for the neutral sentiment, which suggests that the personality-affected emotion transition also facilitates sentiment prediction. However, by only concatenating the personality vectors with the context representation, RoBERTa-P improves the F-score of Neutral but decreases those of Positive and Negative. Hence, direct concatenation limits the effect of personality information in sentiment prediction.

Conclusion and Future Work
In this work, we raise the problem of automatically selecting the emotion for a response considering individual differences in conversation, and propose a new perspective to solve it through personality-affected emotion transition. Besides, we construct a dialog script dataset, PELD, with emotion and personality labels to facilitate related research. We also validate our personality-affected emotion transition model in emotion prediction experiments.
Facial expressions, voices, gestures, and environment information are also vital in emotional interaction, but they are not captured in the purely text-based dialog systems. Besides, as seen from statistics in PELD, the most common emotion in the dialog scripts is still Neutral. One possible reason is that other subtle affective information is not captured in the text. Therefore, our future works will continue to investigate the personality effects on emotions in the multi-modality scenario.

Acknowledgement
This work is supported by the Hong Kong RGC Collaborative Research Fund with project code C6030-18G and Hong Kong Red Swastika Society Tai Po Secondary School with project code P20-0021.