Topic-Driven and Knowledge-Aware Transformer for Dialogue Emotion Detection

Emotion detection in dialogues is challenging as it often requires the identification of thematic topics underlying a conversation, the relevant commonsense knowledge, and the intricate transition patterns between the affective states. In this paper, we propose a Topic-Driven Knowledge-Aware Transformer to handle the challenges above. We first design a topic-augmented language model (LM) with an additional layer specialized for topic detection. The topic-augmented LM is then combined with commonsense statements derived from a knowledge base based on the dialogue contextual information. Finally, a transformer-based encoder-decoder architecture fuses the topical and commonsense information, and performs the emotion label sequence prediction. The model has been evaluated on four dialogue emotion detection datasets, demonstrating its empirical superiority over the existing state-of-the-art approaches. Quantitative and qualitative results show that the model can discover topics which help in distinguishing emotion categories.


Introduction
The abundance of dialogues extracted from online conversations and TV series provides an unprecedented opportunity to train models for automatic emotion detection, which is important for the development of empathetic conversational agents or chatbots for psychotherapy (Hsu and Ku, 2018; Jiao et al., 2019; Zhang et al., 2019; Cao et al., 2019). However, it is challenging to capture the contextual semantics of the personal experience described in an utterance. For example, the emotion of the sentence "I just passed the exam" can be either happy or sad depending on the expectation of the subject. There are strands of work utilizing the dialogue context to enhance the utterance representation (Jiao et al., 2019; Zhang et al., 2019), where influences from historical utterances were handled by recurrent units, and attention signals were further introduced to emphasize the positional order of the utterances.
Figure 1: Utterances around particular topics carry specific emotions. Utterances carrying positive (smiling face) or negative (crying face) emotions are highlighted in colour. Other utterances are labeled as 'Neutral'. In (a), utterances discussing food and restaurants are more likely to carry positive sentiment. In (b), the similar utterance 'He was doing so well' expresses different emotions depending on its associated topic.
Despite the progress made by the aforementioned methods, detecting emotions in dialogues is still challenging due to the way emotions are expressed, the way the meanings of utterances vary with the particular topic discussed, and the implicit knowledge shared between participants. Figure 1 gives an example of how topics and background knowledge can affect the mood of interlocutors. Normally, dialogues around specific topics carry certain language patterns (Serban et al., 2017), affecting not only the utterance's meaning, but also the particular emotions conveyed by specific expressions. Existing dialogue emotion detection methods do not put emphasis on modelling these holistic properties of dialogues (i.e., conversational topics and tones). Consequently, they are fundamentally limited in capturing the affective states of interlocutors related to the particular themes discussed. Besides, emotion and topic detection rely heavily on the underlying commonsense knowledge shared between interlocutors. Although there have been attempts to incorporate it, such as COSMIC (Ghosal et al., 2020), existing approaches do not perform fine-grained extraction of relevant information based on both the topics and the emotions involved.
Recently, the Transformer architecture (Vaswani et al., 2017) has enabled language models to transfer knowledge learned from large quantities of data to low-resource domains, making it viable to discover topics in conversational texts. In this paper, we propose to add an extra layer to the pre-trained language model to model the latent topics; this layer is learned by fine-tuning on dialogue datasets to alleviate the data sparsity problem. Inspired by the success of Transformers, we use the Transformer Encoder-Decoder structure to perform Seq2Seq prediction, in which an emotion label sequence is predicted given an utterance sequence (i.e., each utterance is assigned an emotion label). We posit that the emotion of the current utterance depends on the historical dialogue context and the predicted emotion label sequence for the past utterances. We leverage the attention mechanism and the gating mechanism to incorporate commonsense knowledge retrieved by different approaches. Code and trained models are released to facilitate further research. To sum up, our contributions are:
• We are the first to propose a topic-driven approach for dialogue emotion detection. We propose to alleviate the low-resource setting by topic-driven fine-tuning using pre-trained language models.
• We utilize a pointer network and an additive attention to integrate commonsense knowledge from multiple sources and dimensions.
• We develop a Transformer Encoder-Decoder structure as a replacement for the commonly used recurrent attention neural networks for dialogue emotion detection.
… by taking into account the intra-speaker dependency and relative position of the target and context within dialogues. Memory networks have been explored by Jiao et al. (2020) to allow bidirectional influence between utterances. A similar idea has been explored by Li et al. (2020b). While the majority of works have focused on textual conversations, Zhong et al. (2019) enriched utterances with concept representations extracted from ConceptNet (Speer et al., 2017). Ghosal et al. (2020) developed COSMIC, which exploits ATOMIC for the acquisition of commonsense knowledge. Different from existing approaches, we propose a topic-driven and knowledge-aware model built on a Transformer Encoder-Decoder structure for dialogue emotion detection.

Latent Variable Models for Dialogue Context Modelling
Latent variable models, normally described in their neural variational inference form, the Variational Autoencoder (VAE) (Kingma and Welling, 2014), have been studied extensively to learn thematic representations of individual documents (Miao et al., 2016; Srivastava and Sutton, 2017; Rezaee and Ferraro, 2020). They have been successfully employed in dialogue generation to model thematic characteristics over dynamically evolving conversations. This line of work, which includes approaches based on hierarchical recurrent VAEs (Serban et al., 2017; Park et al., 2018; Zeng et al., 2019) and conditional VAEs (Sohn et al., 2015; Shen et al., 2018; Gao et al., 2019), encodes each utterance with historical latent codes and autoregressively reconstructs the input sequence.
On the other hand, pre-trained language models have been used as embedding inputs to VAE-based models (Peinelt et al., 2020; Asgari-Chenaghlu et al., 2020). Recent work by Li et al. (2020a) employs BERT and GPT-2 as the encoder and decoder of a VAE. However, these models have to be either trained from scratch or built upon pre-trained embeddings. They therefore cannot be directly applied to the low-resource setting of dialogue emotion detection.

Knowledge Base and Knowledge Retrieval
ConceptNet (Speer et al., 2017) captures commonsense concepts and relations as a semantic network, which encompasses the spatial, physical, social, temporal, and psychological aspects of everyday life. More recently, ATOMIC was built as a knowledge graph centered on events rather than entities. Owing to the expressiveness of events and its improved relation types, models built on ATOMIC achieved competitive results in human evaluation on the task of If-Then reasoning.
Alongside the development of knowledge bases, recent years have witnessed the rise of new methods for training language models on large-scale text corpora as implicit knowledge bases. As shown by Petroni et al. (2019), pre-trained language models perform well in recalling relational knowledge involving triplet relations about entities. Bosselut et al. (2019) proposed COMmonsEnse Transformers (COMET), which learn to generate commonsense descriptions in natural language by fine-tuning pre-trained language models on existing commonsense knowledge bases such as ATOMIC. Compared with extractive methods, language models fine-tuned on knowledge bases have the distinctive advantage of being able to generate knowledge for unseen events, which is of great importance for tasks that require the incorporation of commonsense knowledge, such as emotion detection in dialogues.

Problem Setup
A dialogue is defined as a sequence of utterances $\{x_1, x_2, \ldots, x_N\}$, which is annotated with a sequence of emotion labels $\{y_1, y_2, \ldots, y_N\}$. Our goal is to develop a model that can assign the correct label to each utterance. For each utterance, the raw input is a token sequence, i.e., $x_n = \{w_{n,1}, w_{n,2}, \ldots, w_{n,M_n}\}$, where $M_n$ denotes the length of the utterance. We address this problem using the Seq2Seq framework (Sutskever et al., 2014), in which the model consecutively consumes an utterance $x_n$ and predicts the emotion label $y_n$ based on the earlier utterances and their associated predicted emotion labels. The joint probability of emotion labels for a dialogue is:
$$p(y_1, \ldots, y_N \mid x_1, \ldots, x_N) = \prod_{n=1}^{N} p(y_n \mid x_{\le n}, y_{<n}) \tag{1}$$
It is worth mentioning that the subsequent utterances are unseen to the model at each predictive step. Learning is performed by maximizing the log-likelihood of the predicted emotion labels. The overall architecture of our proposed TOpic-Driven and Knowledge-Aware Transformer (TODKAT) is shown in Figure 2. It consists of two main components: the topic-driven language model fine-tuned on dialogues, and the knowledge-aware transformer for emotion label sequence prediction for a given dialogue. In what follows, we describe each of the components in turn.
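To make the factorization of the joint label probability described above concrete, the sketch below scores a labelled dialogue as a sum of per-step conditional log-probabilities, where step n only sees utterances 1..n and labels 1..n-1. `toy_model` is a hypothetical stand-in for the classifier, and feeding the gold labels as the conditioning history (teacher forcing) is an assumption made for illustration; at inference the history would be the model's own earlier predictions.

```python
import math
from typing import Callable, Dict, Sequence

# A conditional model maps (utterances seen so far, labels emitted so far)
# to a probability distribution over emotion labels for the current utterance.
CondModel = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def sequence_log_prob(utterances: Sequence[str], labels: Sequence[str], model: CondModel) -> float:
    """log p(y_1..y_N | x_1..x_N) = sum_n log p(y_n | x_{<=n}, y_{<n})."""
    total = 0.0
    for n in range(len(utterances)):
        # At step n the model sees x_1..x_n and y_1..y_{n-1}; later utterances stay hidden.
        dist = model(utterances[: n + 1], labels[:n])
        total += math.log(dist[labels[n]])
    return total


def toy_model(seen_utts: Sequence[str], seen_labels: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the classifier: prefers to repeat the previous label."""
    assert len(seen_utts) == len(seen_labels) + 1  # no peeking at future utterances
    if not seen_labels:
        return {"neutral": 0.5, "happy": 0.5}
    prev = seen_labels[-1]
    other = "happy" if prev == "neutral" else "neutral"
    return {prev: 0.8, other: 0.2}


dialogue = ["Hi.", "I just passed the exam!", "Congratulations!"]
gold = ["neutral", "neutral", "happy"]
print(sequence_log_prob(dialogue, gold, toy_model))
```

Summing log-probabilities rather than multiplying probabilities keeps the score numerically stable for long dialogues.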

Topic Representation Learning
We propose to insert a topic layer into an existing language model and fine-tune the pre-trained language model on conversational text for topic representation learning. Topic models, often formulated as latent variable models, play a vital role in dialogue modeling (Serban et al., 2017) due to the explicit modeling of 'high-level syntactic features such as style and topic' (Bowman et al., 2016). Despite the tremendous success of applying topic modeling in dialogue generation (Sohn et al., 2015; Shen et al., 2018; Gao et al., 2019), there is little work exploiting latent variable models for dialogue emotion detection. To this end, we borrow the architecture of VHRED (Serban et al., 2017) for topic discovery, with the key modification that both the encoder RNN and the decoder RNN are replaced by layers of a pre-trained language model. Furthermore, we use transformer multi-head attention instead of the LSTM to model the dependence between the latent topic vectors. Unlike VHRED, we are interested in the encoder part, which extracts the posterior of the latent topic z, rather than the recurrent prior of z in the decoder part, since the latter is intended for dialogue generation. We assume that each utterance is mapped to a latent variable encoding its internal topic, and impose a sequential dependence on the topic transitions. Figure 2a gives an overview of the VAE-based model, which aims at learning the latent topic vector during the fine-tuning of the language model. Specifically, the pre-trained language model is decomposed into two parts, the encoder and the decoder. By retaining the pre-trained weights, we transfer representations from high-resource tasks to the low-resource setting, which is the case for dialogue emotion datasets.
Encoder The training of the topic discovery part of TODKAT comprises a VAE at each time step, with its latent variable dependent on the previous latent code. Each utterance is input to the VAE encoder together with a recurrent hidden state, and the output is a latent vector ideally encoding the topic discussed in the utterance. The latent vectors are tied through a recurrent hidden state to enforce a coherent topic over a single dialogue. We use $\mathrm{LM}_\phi$ to denote the network of the lower layers of the language model (before the topic layer) and $x^L_n$ to denote the output of $\mathrm{LM}_\phi$ given the input $x_n$. The variational distribution for the approximation of the posterior is:
$$q_\phi(z_n \mid x_{\le n}, z_{<n}) = \mathcal{N}\big(z_n \mid f^{\mu}_{\phi}(x^L_n, h_{n-1}),\, f^{\sigma}_{\phi}(x^L_n, h_{n-1})\big), \tag{2}$$
where $h_{n-1} = f_\tau(z_{n-1}, x^L_{n-1})$ for $n > 1$.
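Here, $f^{\mu}_{\phi}(\cdot)$ and $f^{\sigma}_{\phi}(\cdot)$ are multi-layer perceptrons (MLPs), and $f_\tau$ can be any transition function (e.g., a recurrent unit). We employ the transformer multi-head attention with its query being the previous latent variable $z_{n-1}$ and its keys and values being $x^L_{n-1}$, that is,
$$h_{n-1} = f_\tau(z_{n-1}, x^L_{n-1}) = \mathrm{MultiHead}(z_{n-1}, x^L_{n-1}, x^L_{n-1}).$$
We initialize $h_0 = 0$ and model the transition between $h_{n-1}$ and $h_n$ by first generating $z_n$ from $h_{n-1}$ using Eq.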
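One encoder step can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the released implementation: single-head scaled dot-product attention stands in for the multi-head attention $f_\tau$, random linear maps stand in for the MLPs $f^{\mu}_{\phi}$ and $f^{\sigma}_{\phi}$, the token states $x^L_n$ are mean-pooled before entering them, and the previous utterance's token states serve as keys and values. All names (`transition`, `posterior_sample`, `encode_dialogue`, the `W_*` matrices) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, DZ = 16, 8  # size of the LM_phi token states and of the latent topic vector (assumed)

# Hypothetical parameters; in a real model these are learned.
W_q = rng.normal(scale=0.1, size=(DZ, D))        # projects z_{n-1} into a query
W_mu = rng.normal(scale=0.1, size=(2 * D, DZ))   # linear stand-in for the MLP f^mu_phi
W_sig = rng.normal(scale=0.1, size=(2 * D, DZ))  # linear stand-in for the MLP f^sigma_phi


def transition(z_prev, x_prev_tokens):
    """h_{n-1} = f_tau(z_{n-1}, x^L_{n-1}): z_{n-1} attends over the previous utterance's tokens."""
    q = z_prev @ W_q                               # (D,)
    scores = x_prev_tokens @ q / np.sqrt(D)        # one score per token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over tokens
    return weights @ x_prev_tokens, weights        # (D,), (M_{n-1},)


def posterior_sample(x_tokens, h_prev):
    """Draw z_n ~ q_phi(z_n | x_{<=n}, z_{<n}) with the reparameterisation trick."""
    feats = np.concatenate([x_tokens.mean(axis=0), h_prev])  # mean-pooled x^L_n (assumption)
    mu = feats @ W_mu
    sigma = np.exp(feats @ W_sig)                  # exp keeps the standard deviation positive
    return mu + sigma * rng.standard_normal(DZ), mu, sigma


def encode_dialogue(token_states):
    """token_states: one (M_n, D) array of LM_phi outputs per utterance."""
    h = np.zeros(D)                                # h_0 = 0
    zs = []
    for n, x_tokens in enumerate(token_states):
        if n > 0:
            h, _ = transition(zs[-1], token_states[n - 1])
        z, _, _ = posterior_sample(x_tokens, h)
        zs.append(z)
    return np.stack(zs)                            # (N, DZ) topic vectors
```

The per-token `weights` returned by `transition` are the kind of attention scores one could inspect to find the most-attended words of an utterance.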
Decoder The decoder network reconstructs $x_n$ from $z_n$ at each time step. We use Gaussian distributions for both the generative prior and the variational distribution. Since we want $z_n$ to depend on $z_{n-1}$, the prior for $z_n$ is $p(z_n \mid h_{n-1}) = \mathcal{N}\big(z_n \mid f^{\mu}_{\gamma}(h_{n-1}), f^{\sigma}_{\gamma}(h_{n-1})\big)$, where $f^{\mu}_{\gamma}(\cdot)$ and $f^{\sigma}_{\gamma}(\cdot)$ are MLPs. The posterior for $z_n$ is $p_\theta(z_n \mid x_{\le n}, z_{<n})$, which is intractable and is approximated by $q_\phi(z_n \mid x_{\le n}, z_{<n})$ of Eq. 2. We denote the higher layers of the language model as $\mathrm{LM}_\theta$. Then the reconstruction $\hat{x}_n$ given $z_n$ and $x^L_n$ can be expressed as:
$$\hat{x}_n = \mathrm{LM}_\theta(z_n, x^L_n). \tag{5}$$
Note that this is different from dialogue generation, in which an utterance is generated from the latent topic vector. Here, we aim to extract the latent topic from the current utterance and therefore train the model to reconstruct the input utterance as specified in Eq. (5). To make the combination of $z_n$ and $x^L_n$ compatible with $\mathrm{LM}_\theta$, we need to perform latent vector injection. As in Li et al. (2020a), we employ the "Memory" scheme, in which $z_n$ becomes an additional input for $\mathrm{LM}_\theta$; that is, the input to the higher layers becomes $[z_n, x^L_n]$.
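The split of the pre-trained LM into lower layers $\mathrm{LM}_\phi$ and higher layers $\mathrm{LM}_\theta$, the learned prior, and the "Memory" injection can be sketched as follows. This is one simple reading of the description above, assuming toy residual blocks in place of real Transformer layers, a linear projection of $z_n$ into the hidden space, and $z_n$ prepended as one extra input position; the split point, sizes and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
D, DZ, N_LAYERS, SPLIT = 16, 8, 4, 2  # hidden size, latent size, depth, split point (assumed)


def make_layer():
    """Toy residual block standing in for one pre-trained Transformer layer."""
    W = rng.normal(scale=0.1, size=(D, D))
    return lambda h: h + np.tanh(h @ W)


layers = [make_layer() for _ in range(N_LAYERS)]
lm_phi, lm_theta = layers[:SPLIT], layers[SPLIT:]  # lower layers / higher layers

W_mem = rng.normal(scale=0.1, size=(DZ, D))   # projects z_n into the LM hidden space
W_pmu = rng.normal(scale=0.1, size=(D, DZ))   # linear stand-in for f^mu_gamma
W_psig = rng.normal(scale=0.1, size=(D, DZ))  # linear stand-in for f^sigma_gamma


def run(stack, h):
    """Apply a stack of layers in order."""
    for layer in stack:
        h = layer(h)
    return h


def prior(h_prev):
    """p(z_n | h_{n-1}) = N(f^mu_gamma(h_{n-1}), f^sigma_gamma(h_{n-1}))."""
    return h_prev @ W_pmu, np.exp(h_prev @ W_psig)


def decode_with_memory(z_n, x_lower):
    """'Memory' injection: the input to LM_theta becomes [z_n, x^L_n]."""
    mem = (z_n @ W_mem)[None, :]  # one extra position carrying the topic vector
    return run(lm_theta, np.concatenate([mem, x_lower], axis=0))


tokens = rng.normal(size=(5, D))  # embedded utterance x_n with M_n = 5 tokens
x_lower = run(lm_phi, tokens)     # x^L_n = LM_phi(x_n)
out = decode_with_memory(rng.normal(size=DZ), x_lower)
# out[1:] would feed a vocabulary projection to reconstruct the tokens of x_n.
```

Keeping `lm_phi` and `lm_theta` as slices of one pre-trained stack is what lets the pre-trained weights be reused on both sides of the topic layer.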
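Training The training objective is the Evidence Lower Bound (ELBO):
$$\mathcal{L} = \mathbb{E}_{q_\phi(z_{\le N} \mid x_{\le N})}\big[\log p_\theta(x_{\le N} \mid z_{\le N})\big] - \mathrm{KL}\big[q_\phi(z_{\le N} \mid x_{\le N}) \,\|\, p(z_{\le N} \mid x_{<N})\big]. \tag{6}$$
Eq. 6 factorizes, and the expectation term becomes
$$\sum_{n=1}^{N} \mathbb{E}_{q_\phi(z_{\le n} \mid x_{\le n})}\big[\log p_\theta(x_n \mid z_n, x^L_n)\big], \tag{7}$$
and the KL term becomes
$$\sum_{n=1}^{N} \mathrm{KL}\big[q_\phi(z_n \mid x_{\le n}, z_{<n}) \,\|\, p(z_n \mid z_{<n}, x_{<n})\big], \tag{8}$$
where $p(z_n \mid z_{<n}, x_{<n})$ is the prior for $z_n$. After training, we are able to extract the topic representation from the encoder part of the model, which is denoted as $z_n = \mathrm{LM}^{enc}_{\phi}(x_n)$. Meanwhile, the entire language model has been fine-tuned, which is denoted as $u_n = \mathrm{LM}_{CLS}(x_n)$.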
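Because both the prior and the variational distribution are Gaussian, each KL term in the factorized ELBO has a closed form. The sketch below assumes diagonal covariances and a single-sample Monte Carlo estimate of each reconstruction term; the function names are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL[N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2))], summed over dimensions."""
    return float(np.sum(np.log(sigma_p / sigma_q)
                        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
                        - 0.5))


def negative_elbo(recon_log_liks, posteriors, priors):
    """Loss to minimise: -(sum_n E_q[log p(x_n | z_n, x^L_n)] - sum_n KL[q_n || p_n]).

    recon_log_liks: one Monte Carlo estimate of the expectation term per utterance.
    posteriors, priors: per-utterance (mu, sigma) pairs of q_phi and of the prior.
    """
    reconstruction = float(sum(recon_log_liks))
    kl = sum(gaussian_kl(mq, sq, mp, sp) for (mq, sq), (mp, sp) in zip(posteriors, priors))
    return -(reconstruction - kl)


# Two utterances with 2-d latents: each posterior is N(1, 1), each prior N(0, 1).
q = [(np.ones(2), np.ones(2))] * 2
p = [(np.zeros(2), np.ones(2))] * 2
loss = negative_elbo([-2.0, -3.0], q, p)
```

Here each KL term equals 0.5 per dimension, so the example loss is the negated reconstruction total (5.0) plus a KL total of 2.0.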

Knowledge-Aware Transformer
The topic-driven LM fine-tuning stage makes it possible for the LM to discover a topic representation from a given utterance. After fine-tuning, we attach the fine-tuned components to a classifier and train the classifier to predict the emotion labels. We propose to use the Transformer Encoder-Decoder structure as the classifier, and consider the incorporation of commonsense knowledge retrieved from external knowledge sources. In what follows, we first describe how to retrieve the commonsense knowledge from a knowledge source, then we present the detailed structure of the classifier.
Commonsense Knowledge Retrieval We use ATOMIC as a source of external knowledge. In ATOMIC, each node is a phrase describing an event, and edges are relation types linking one event to another. ATOMIC thus encodes triples such as ⟨event, relation type, event⟩. There are nine relation types in total, of which we use three: xIntent, the intention of the subject (e.g., 'to get a raise'), xReact, the reaction of the subject (e.g., 'be tired'), and oReact, the reaction of the object (e.g., 'be worried'), since they are defined as the mental states of an event.
Given an utterance $x_n$, we can compare it with every node in the knowledge graph and retrieve the most similar one. We compute the similarity between an utterance and events using SBERT (Reimers and Gurevych, 2019). We extract the top-$K$ events and obtain their intentions and reactions, which are denoted as $\{e^{sI}_{n,k}, e^{sR}_{n,k}, e^{oR}_{n,k}\}$, $k = 1, \ldots, K$. Alternatively, there is a knowledge generation model, called COMET, which is trained on ATOMIC. It can take $x_n$ as input and generate knowledge with the desired event relation type specified (e.g., xIntent, xReact or oReact). The generated knowledge can be unseen in ATOMIC since COMET is essentially a fine-tuned language model. We use COMET to generate the $K$ most likely events for each of the three event relation types. The produced events are denoted as $\{g^{sI}_{n,k}, g^{sR}_{n,k}, g^{oR}_{n,k}\}$, $k = 1, \ldots, K$.
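The retrieval path reduces to a cosine-similarity top-$K$ search over event embeddings followed by a lookup of each event's intentions and reactions. The sketch below assumes the embeddings are already computed (e.g., by an SBERT encoder) and uses hand-made 3-d vectors and a hypothetical three-event graph (`TOY_ATOMIC`) in place of ATOMIC; the COMET generation path is not shown.

```python
import numpy as np

RELATIONS = ("xIntent", "xReact", "oReact")

# Hypothetical miniature knowledge graph: (event, {relation: tail phrase}).
TOY_ATOMIC = [
    ("PersonX passes the exam", {"xIntent": "to graduate", "xReact": "relieved", "oReact": "proud"}),
    ("PersonX loses a friend", {"xIntent": "none", "xReact": "sad", "oReact": "sorry"}),
    ("PersonX studies hard", {"xIntent": "to do well", "xReact": "tired", "oReact": "impressed"}),
]


def top_k_events(query_vec, event_vecs, k):
    """Indices (and scores) of the k events with the highest cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    e = event_vecs / np.linalg.norm(event_vecs, axis=1, keepdims=True)
    sims = e @ q
    order = np.argsort(-sims)[:k]  # most similar first
    return order, sims[order]


def retrieve_knowledge(query_vec, event_vecs, k):
    """Collect the intentions/reactions of the top-k events, grouped by relation type."""
    idx, _ = top_k_events(query_vec, event_vecs, k)
    return {rel: [TOY_ATOMIC[i][1][rel] for i in idx] for rel in RELATIONS}


# Stand-in embeddings; in practice they would come from a sentence encoder such as SBERT.
event_vecs = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.9, 0.0, 0.5]])
utterance_vec = np.array([1.0, 0.0, 0.0])
print(retrieve_knowledge(utterance_vec, event_vecs, k=2))
```

For a graph of ATOMIC's size, the event embeddings would typically be precomputed once and searched with an approximate nearest-neighbour index rather than a dense matrix product.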
Knowledge Selection With the knowledge retrieved from ATOMIC, we build a pointer network (Vinyals et al., 2015) to choose the commonsense knowledge exclusively from either SBERT or COMET. The pointer network computes the probability of choosing a candidate knowledge source with the sigmoid function $\sigma(x) = 1/(1 + \exp(-x))$, and the choice is represented by an indicator function $I(x_n, e_n, g_n)$ with value 1 or 0. We wrap $\sigma$ with Gumbel-Softmax (Jang et al., 2017) to generate the one-hot distribution. The integrated commonsense knowledge is expressed as $c_n = I(x_n, e_n, g_n)\, e_n + \big(1 - I(x_n, e_n, g_n)\big)\, g_n$, where $c_n = \{c^{sI}_{n,k}, c^{sR}_{n,k}, c^{oR}_{n,k}\}_{k=1}^{K}$. With the knowledge source selected, we proceed to select the most informative knowledge. We design an attention mechanism (Bahdanau et al., 2015) to integrate the candidate knowledge. Recall that we have a fine-tuned language model which can calculate both the [CLS] and topic representations. Here we apply the language model to the retrieved or generated knowledge to obtain the [CLS] and topic representations, denoted as $[c_{n,k}, z_{n,k}]$. The attention mechanism is performed by calculating the dot product between the utterance and each normalized knowledge tuple:
$$\alpha_{n,k} = \frac{\exp\left([u_n, z_n]^\top \frac{[c_{n,k}, z_{n,k}]}{\lVert [c_{n,k}, z_{n,k}] \rVert}\right)}{\sum_{k'=1}^{K} \exp\left([u_n, z_n]^\top \frac{[c_{n,k'}, z_{n,k'}]}{\lVert [c_{n,k'}, z_{n,k'}] \rVert}\right)}, \qquad c_n = \sum_{k=1}^{K} \alpha_{n,k}\, c_{n,k}. \tag{9}$$
Here, we abuse $c_n$ to represent the aggregated knowledge phrases. We further aggregate $c_n$ by event relation types using self-attention, and the final event representation is denoted as $c_n$.
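The selection-then-attention pipeline can be sketched as below. The gate's parameterisation (a linear score `w` over the utterance and the mean candidate vectors) is an assumption made for illustration; the binary Gumbel-Softmax shown is the standard two-class case, and in a trainable model a straight-through estimator would pass gradients through the hard 0/1 choice. All names are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(0)


def stable_sigmoid(t):
    """Numerically stable logistic function for a scalar."""
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))


def gumbel_sigmoid_hard(logit, tau=1.0):
    """Binary Gumbel-Softmax: add the difference of two Gumbel(0, 1) samples to the logit,
    apply a temperature-scaled sigmoid, then threshold at 0.5 for a hard 0/1 choice."""
    g1, g2 = rng.gumbel(size=2)
    soft = stable_sigmoid((logit + g1 - g2) / tau)
    return (1.0 if soft > 0.5 else 0.0), soft


def select_source(x, e, g, w, tau=1.0):
    """Pointer-style choice between retrieved (e) and generated (g) candidates.
    The gate logit w . [x; mean(e); mean(g)] is a hypothetical parameterisation."""
    logit = float(w @ np.concatenate([x, e.mean(axis=0), g.mean(axis=0)]))
    indicator, _ = gumbel_sigmoid_hard(logit, tau)
    return indicator * e + (1.0 - indicator) * g   # c_n = I e_n + (1 - I) g_n


def attend(u, z, c, z_c):
    """Dot product of the utterance [u; z] with L2-normalised knowledge tuples [c_k; z_k]."""
    query = np.concatenate([u, z])
    keys = np.concatenate([c, z_c], axis=1)
    keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over the K candidates
    return weights @ c, weights                    # aggregated knowledge, per-phrase attention


# Toy example with K = 3 candidate phrases of one relation type.
u, z = np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.5, 0.5])
e = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
g = -e
c = select_source(np.zeros(4), e, g, w=np.full(12, 0.1))
c_agg, alpha = attend(u, z, c, z_c=np.tile(z, (3, 1)))
```

With a zero logit, as in the toy call, the source is chosen at random with probability 0.5 each; a large positive or negative logit makes the choice effectively deterministic.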
Transformer Encoder-Decoder We use a Transformer encoder-decoder to map an utterance sequence to an emotion label sequence, thus allowing for modeling the transitional patterns between emotions while taking into account the historical utterances. Each utterance is converted to its [CLS] representation $u_n$ concatenated with the topic representation $z_n$ and the knowledge representation $c_n$. We enforce a masking scheme in the self-attention layer of the encoder to make the classifier predict emotions in an auto-regressive way, so that only the past utterances are visible to the encoder. This masking strategy, which prevents a query from attending to future keys, better suits a real-world scenario in which the subsequent utterances are unseen when predicting the emotion of the current utterance. As for the decoder, the output of the previous decoder block is input as a query to the self-attention layer. The training loss for the classifier is the negative log-likelihood:
$$\mathcal{L} = -\sum_{n=1}^{N} \log p_\theta(y_n \mid u_{\le n}, y_{<n}),$$
where $\theta$ denotes the trainable parameters.
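The encoder masking and the classifier loss can be illustrated with a small NumPy sketch. It assumes a single attention head with no learned projections and random features in place of $u_n$, $z_n$ and $c_n$; its only purpose is to show that, under a lower-triangular mask, the representation of utterance n is unaffected by later utterances. Feature sizes and function names are hypothetical.

```python
import numpy as np


def causal_mask(n):
    """mask[i, j] is True when utterance i may attend to utterance j, i.e. j <= i."""
    return np.tril(np.ones((n, n), dtype=bool))


def masked_self_attention(x, mask):
    """Single-head self-attention over utterance features, ignoring masked (future) keys."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # diagonal is always visible, so max is finite
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)
    return w @ x, w


def label_nll(log_probs, labels):
    """-sum_n log p(y_n | u_{<=n}, y_{<n}) given per-step log-probabilities of shape (N, C)."""
    return -float(np.sum(log_probs[np.arange(len(labels)), labels]))


rng = np.random.default_rng(0)
N = 4
u, z, c = rng.normal(size=(N, 6)), rng.normal(size=(N, 2)), rng.normal(size=(N, 3))
x = np.concatenate([u, z, c], axis=1)  # per-utterance input [u_n; z_n; c_n]
out, w = masked_self_attention(x, causal_mask(N))
```

Masking with `-inf` before the softmax gives future utterances exactly zero weight, rather than merely a small one.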

Experimental Setup
In this section, we present the details of the datasets used, the methods for comparison, and the implementation details of our models.
Datasets We use the following datasets for experimental evaluation: DailyDialog is collected from daily communications. It adopts Ekman's six basic emotion types (Ekman, 1993) as the annotation protocol, that is, it annotates an utterance with one of six basic emotions: anger, disgust, fear, happiness, sadness, or surprise. Utterances showing ambiguous emotions are annotated as neutral. MELD is constructed from the scripts of 'Friends', a TV series on urban life. As in DailyDialog, each emotion label falls into one of Ekman's six emotion types, or neutral. IEMOCAP is built with subtitles from improvised videos. Its emotion labels are happy, sad, neutral, angry, excited and frustrated. EmoryNLP (Zahiri and Choi, 2018) is also built with conversations from the 'Friends' TV series, but with a slightly different annotation scheme in which disgust, anger and surprise become peaceful, mad and powerful, respectively.
Following Zhong et al. (2019) and Ghosal et al. (2020), the 'neutral' label of DailyDialog is not counted in the evaluation to avoid highly imbalanced classes. For MELD and EmoryNLP, we consider a dialogue as a sequence of utterances from the same scene ID.
Baselines We compare the performance of TODKAT with the following methods: HiGRU (Jiao et al., 2019) adopts the recurrent attention framework, in which an attention layer is placed between two GRUs to aggregate the signals from the encoder GRU and pass them to the decoder GRU. DialogueGCN (Ghosal et al., 2019) creates a graph from the interactions of speakers to take into account the dialogue structure. A Graph Convolutional Network (GCN) is employed to encode the speakers, and emotion labels are predicted from the combination of the global context and the speakers' states. KET (Zhong et al., 2019) is the first model to integrate commonsense knowledge extracted from ConceptNet and emotion information from an emotion lexicon into conversational text. A Transformer encoder is employed to handle the influence of past utterances. COSMIC (Ghosal et al., 2020) is the state-of-the-art approach that leverages ATOMIC for improved emotion detection. COMET is employed in their model to retrieve event-centric commonsense knowledge from ATOMIC.
We modified the language model fine-tuning script in the Hugging Face library (Wolf et al., 2020) to implement topic-driven fine-tuning. We use one transformer encoder layer. As for the decoder, there are N layers, where N is the number of utterances in a dialogue. We refer readers to the Appendix for the detailed settings of the proposed models.
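Excluding DailyDialog's 'neutral' label from evaluation is commonly implemented as micro-averaged F1 restricted to the remaining labels. The function below is a minimal sketch of that computation under this assumption; it is not taken from the evaluation scripts of the cited works, and its name is hypothetical.

```python
from typing import Sequence


def micro_f1_excluding(gold: Sequence[str], pred: Sequence[str], excluded: str = "neutral") -> float:
    """Micro-averaged F1 computed only over the labels other than `excluded`.

    A true positive is a correct prediction of a non-excluded label. Errors involving the
    excluded label still count as false positives or false negatives for the other labels.
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != excluded)
    predicted_pos = sum(1 for p in pred if p != excluded)
    gold_pos = sum(1 for g in gold if g != excluded)
    if tp == 0:
        return 0.0
    precision, recall = tp / predicted_pos, tp / gold_pos
    return 2 * precision * recall / (precision + recall)


gold = ["happy", "neutral", "sad", "happy"]
pred = ["happy", "happy", "neutral", "sad"]
print(micro_f1_excluding(gold, pred))  # tp=1, precision=1/3, recall=1/3
```

Restricting the label set this way keeps the dominant 'neutral' class from inflating the score while still penalizing non-neutral utterances predicted as neutral.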

Results and Analysis
Comparison with Baselines Experimental results of TODKAT and its ablations are reported in Table 2. HiGRU and DialogueGCN results were produced by running the code published by the authors on the four datasets. Among the baselines, COSMIC gives the best results. Our proposed TODKAT outperforms COSMIC on both MELD and EmoryNLP in weighted Avg-F1, with improvements ranging between 3% and 5%. TODKAT also achieves a better result than COSMIC on DailyDialog in Macro-F1 and gives nearly the same result in Micro-F1. TODKAT is inferior to COSMIC on IEMOCAP. It is, however, worth mentioning that COSMIC was trained with 132 instances on this dataset, while for all the other models the training-and-validation split is 100 and 20. As such, the IEMOCAP results reported for COSMIC (Ghosal et al., 2020) are not directly comparable here. COSMIC also incorporates commonsense knowledge from ATOMIC, but with modified GRUs. Our proposed TODKAT, built upon the topic-driven Transformer, appears to be a more effective architecture for dialogue emotion detection. Compared with KET, the improvements are much more significant, with over 10% increase on MELD and close to 5% gain on DailyDialog. KET is also built on the Transformer, but it considers each utterance in isolation and applies commonsense knowledge from ConceptNet. TODKAT, in contrast, takes into account the dependency on previous utterances and their associated emotion labels when predicting the emotion label of the current utterance. DialogueGCN models interactions of speakers and performs slightly better than KET, but it is significantly worse than TODKAT. This suggests that topics might be more useful in capturing the dialogue context.

Ablation Study
The lower half of Table 2 presents the F1 scores with various components removed from TODKAT. With the topic component removed, the performance of TODKAT drops consistently across all datasets except IEMOCAP, on which we observe a slight increase in both weighted average F1 and Micro-F1. This might be attributed to the size of the data: IEMOCAP is the smallest dataset evaluated here, and small datasets hinder the model's capability to discover topics. Without the commonsense knowledge ('−KB'), we observe a more drastic performance drop than when removing any other component, with nearly 10% drop in F1 on EmoryNLP, showing the importance of employing commonsense knowledge for dialogue emotion detection. Comparing the two ways of extracting knowledge from ATOMIC, direct retrieval using SBERT and generation using COMET, we observe mixed results. Overall, the Transformer Encoder-Decoder with a pointer network strikes a compromise between the two methods, yielding balanced performance across the datasets.

Relationships between Topics and Emotions
To investigate the effectiveness of the learned topic vectors, we apply t-SNE (Van der Maaten and Hinton, 2008) to the test set to study the relationship between the learned topic vectors and the ground-truth emotion labels. The results on DailyDialog and MELD are illustrated in Figure 3(a) and (b). Each data point is the latent topic vector of an utterance, and its color indicates the utterance's ground-truth emotion label. We can see that the majority of the topic vectors cluster into polarized groups. A few clusters bear a mixture of polarities, possibly due to background topics such as greetings in the datasets.
Topics can be interpreted using the attention scores of Eq. 4. The top-10 most-attended words are selected as the representative words for each utterance. As in Dathathri et al. (2020), we construct bags of words that represent 141 distinct topics (the word lists and their corresponding theme names are crawled from https://www.enchantedlearning.com/wordlist/). Given the attended words of an utterance cluster grouped based on their latent topic representations, we label the word collection with the dominant theme name. We refer to the theme names as topics in Figure 3c. It can be observed that utterances associated with Office tend to carry 'disgust' emotions, while those related to Family are prone to be 'happy'.
We further compute Spearman's rank-order correlation coefficient to quantitatively verify the relationship between the topic and emotion vectors. For an utterance pair, a similarity score is obtained separately for their topic vectors and for their emotion vectors. We then sort the emotion vector pairs by their similarity scores and check, using Spearman's rank-order correlation coefficient, to what extent their ranking matches that of the topic vector pairs. The results are 0.60, 0.58, 0.42 and 0.54, with p-values of 0.01, for DailyDialog, MELD, IEMOCAP and EmoryNLP, respectively, showing a strong correlation between the clustering of topics and that of emotion labels. IEMOCAP has the lowest correlation score, which is in line with the results in Table 2, where the discovered latent topics did not improve the emotion classification results.
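The rank-correlation check described above can be sketched in pure Python. Cosine similarity between vectors and average ranks for ties are assumptions, since the text does not specify the similarity measure or how the emotion vectors are built; in practice `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value. Function names are hypothetical.

```python
import itertools
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def topic_emotion_correlation(topic_vecs, emotion_vecs):
    """Rank-correlate pairwise topic similarities with pairwise emotion similarities."""
    pairs = list(itertools.combinations(range(len(topic_vecs)), 2))
    topic_sims = [cosine(topic_vecs[i], topic_vecs[j]) for i, j in pairs]
    emotion_sims = [cosine(emotion_vecs[i], emotion_vecs[j]) for i, j in pairs]
    return spearman(topic_sims, emotion_sims)
```

The number of utterance pairs grows quadratically with the test-set size, so a large evaluation may need to subsample pairs.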

Impact of Relation Type
We investigate the impact of commonsense relation types on the performance of TODKAT by expanding the relation set to five relation types and to all nine relation types, respectively. According to Sap et al. (2019), the other relation types include {sNeed, sWant, oWant, sEffect, oEffect}, which identify the prerequisites and post-conditions of the given event, and {sAttr}, the "If-Event-Then-Persona" category of relation type that describes how the subject is perceived by others. We calculate the Micro-F1 scores of TODKAT with these two categories of relation types added step by step. From Table 3 we can conclude that the inclusion of two extra relation types or of all relation types degrades the F1 scores on almost all datasets. An exception occurs on IEMOCAP, where the F1 score rises by 0.5% when adding the "sE" and "oE" relations, possibly because the dataset is abundant in events, so the extra event descriptions offer complementary knowledge to some extent. On the other datasets, neither the "If-Event-Then-Event" nor the "If-Event-Then-Persona" relation types bring any benefit.
Table 3: Micro-F1 scores of TODKAT with more commonsense relation types retrieved from ATOMIC included for training. Here, "sE" and "oE" represent effect of subject and effect of object, respectively. "All" denotes the incorporation of all nine commonsense relation types from ATOMIC.

Impact of Attention Mechanism
With the knowledge retrieved from ATOMIC or generated by COMET, we are able to infer the possible intentions and reactions of the interlocutors. However, not all knowledge phrases contribute equally to the emotion of the focused utterance. We study the attention mechanism in terms of selecting the relevant knowledge. Table 4 shows a heat map of the attention scores in Eq. 9 to illustrate how the topic-driven attention can identify the most salient phrase. The utterance 'Oh my God, you're a freak.' is erroneously categorized as 'mad' without the topic-driven attention (shown in the last row of Table 4). In contrast, the attention mechanism guides the model to attend to the more relevant events and thus predict the correct emotion label.
Table 4: Illustration of the attention mechanism in Eq. 9 that helps distinguish the retrieved knowledge.
Utterance: A: Oh my God, you're a freak.
Retrieved knowledge: A wants to be liked; A wants to be accepted; A wants to be a freak; A will feel satisfied; A will feel ashamed; A will feel happy; B will feel impressed; B will feel disgusted; B will feel surprised
Prediction with topic-driven attention: Joyful
Prediction without topic-driven attention: Mad

Conclusion
We have proposed a Topic-Driven and Knowledge-Aware Transformer model that incorporates topic representation and the commonsense knowledge from ATOMIC for emotion detection in dialogues.
A topic-augmented language model based on fine-tuning has been developed for topic extraction. A pointer network and additive attention have been explored for knowledge selection. All the novel components have been integrated into the Transformer Encoder-Decoder structure, which enables Seq2Seq prediction. Empirical results demonstrate the effectiveness of the model in topic representation learning and knowledge integration, both of which boost the performance of emotion detection.