Jointly Identifying Rhetoric and Implicit Emotions via Multi-Task Learning
* Corresponding author: Suge Wang.

Rhetorical implicit emotion identification is an important and challenging task in natural language processing. We observe that each rhetoric exhibits characteristic semantic and syntactic patterns. We therefore design a gate-mechanism-based classification module to capture the respective rhetorical representation and identify each rhetoric. Moreover, sentences carved with rhetoric tend to express emotions in subtle ways. We thus propose a new multi-task learning framework that encodes the categorical correlation between the two tasks to improve performance on both the rhetoric and the emotion identification problems. Experimental results validate the benefit of the proposed model over state-of-the-art baselines on the rhetoric and emotion identification tasks. In addition, a new Chinese rhetorical implicit emotion dataset was constructed and will be released with this work.


Introduction
Rhetoric is formed by decorating the semantics and adjusting the structure of sentences in text (Kennedy, 2009; Wang, 2013). Sentences that express implicit emotions via rhetoric are commonly used in literary works and expressive user reviews. They tend to express semantics at a high cognitive level and convey emotions in an obscure way. The example sentence below actually expresses the emotion of "love" and meanwhile contains two rhetorical forms, parallelism and simile: "Tolerance is a ribbon, which can tie the friendship between people; tolerance is a flame, which can melt the barriers between people." Clearly, one can understand and predict the emotion ("love") and the rhetoric ("simile" and "parallelism") of the sentence by capturing the semantic meanings of content words ("tolerance", "ribbon", "flame", "friendship", "barriers") as well as the syntactic structure and pattern of the parallel clauses in the sentence ("Tolerance is ..., which can ..."). Semantics and syntax are pivotal cornerstones of rhetoric, and different rhetorical categories attach to them differently. Moreover, positive emotions are more likely to be expressed by rhetorical categories such as simile and parallelism, while negative emotions are more commonly conveyed by rhetorical categories such as rhetorical questions and irony.
Sentences carved with rhetoric can convey emotions in an implicit way, without explicitly using emotional words, which poses new challenges for textual understanding and rhetorical emotion analysis in natural language processing.
Various efforts have been made to cope with emotion identification (Delbrouck et al., 2020; Chen et al., 2018; Klinger et al., 2018) and rhetoric detection (Liu et al., 2018; Wen et al., 2019; Mao et al., 2019). However, constructing heuristic rules and syntactic patterns for rhetoric identification is time-consuming and labor-intensive, and developing a specific detection model for each rhetoric category may result in poor generalization (Liu et al., 2018; Mao et al., 2019). To our knowledge, the correlation between the rhetoric and emotion identification tasks has not been exploited in previous work.
In this work, we aim to cope with rhetoric and emotion analysis. Following previous studies, we formulate both rhetoric identification and emotion identification as multi-label classification problems.
Then, we propose a new multi-task learning model called REI-MUL, which can jointly leverage the task correlation to address the rhetoric and emotion identification in a unified framework.
In particular, a pre-trained language model (Devlin et al., 2019) and tree-structured long short-term memory networks (Tree-LSTMs) (Tai et al., 2015) are used to encode the semantic and syntactic representations of sentences, respectively. Taking these representations as input, two correlation modules are introduced to learn the categorical correlations between the rhetoric and emotion tasks. One key benefit of REI-MUL is that it exploits the correlation between the two tasks, which may allow better learning of model parameters from real-life natural language data.

Overview
Formally, let C be a labeled set of sentences. For each sentence S_l ∈ C, let Y^r_l = {y^r_{l1}, y^r_{l2}, ..., y^r_{lR}} and Y^e_l = {y^e_{l1}, y^e_{l2}, ..., y^e_{lE}} denote the rhetoric label set (with R categories) and the emotion label set (with E categories) of the sentence S_l, respectively. Note that y^r_{li}, y^e_{lj} ∈ {0, 1}, where y^r_{li} = 1 and y^e_{lj} = 1 mean that S_l is labeled with the i-th rhetoric and the j-th emotion category, respectively.
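As a concrete illustration of this multi-label setup, each sentence's label sets can be encoded as binary indicator vectors. The category names below are purely hypothetical stand-ins, not the dataset's actual inventory:

```python
# Hypothetical toy label inventories (illustrative only).
RHETORIC = ["simile", "parallelism", "personification", "rhetorical_question", "irony"]
EMOTION = ["love", "joy", "disgust", "anger"]

def encode_labels(active, inventory):
    """Binary indicator vector: 1 iff the category is present in the sentence."""
    return [1 if name in active else 0 for name in inventory]

# A sentence labeled with two rhetoric categories and one emotion category.
y_r = encode_labels({"simile", "parallelism"}, RHETORIC)
y_e = encode_labels({"love"}, EMOTION)
```

Since a sentence may carry several rhetoric labels and several emotion labels at once, both tasks are multi-label rather than multi-class.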
Based on the labeled set C, we develop a new multi-task learning model (REI-MUL), as shown in Figure 1, which can exploit task correlation to deal with rhetoric identification and emotion recognition in a unified framework.

Semantic Representation
As one of the state-of-the-art language representation models, BERT (Devlin et al., 2019) is employed to learn the semantic representations of sentences. Specifically, the contextualized representation of the [CLS] token is used as the encoding of the whole sequence, which is denoted by sr_sem ∈ R^{d_1}. For simplicity, we omit the subscript "l" of the input sentence in subsequent sections.

Syntactic Representation
The well-known Tree-LSTM model (Tai et al., 2015) is exploited to encode the syntactic representations of sentences. Word sense is a critical factor for structural representation. We thus represent the meaning of a word by concatenating its comprehensive embedding and its sememe knowledge embedding from the SE-WRL model (Niu et al., 2017).
Taking the word representations as input, the hidden state of the root node of the Tree-LSTM model is taken as the syntactic representation of each input sentence, as shown in Eqn (1):

sr_syn = Tree-LSTM(X, X_syn)    (1)

where X = {x_1, x_2, ..., x_N}, x_i denotes the input embedding of each word w_i, X_syn is the dependency parse of S obtained with the Stanford Parser (version 4.0.0, Chinese; https://nlp.stanford.edu/software/lex-parser.html), and sr_syn ∈ R^{d_2}.
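For intuition, a single Child-Sum Tree-LSTM node update (Tai et al., 2015) can be sketched in NumPy as below. The toy dimensions, random parameters and three-word dependency tree are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d2 = 4, 3  # toy sizes; the paper uses 200-dim inputs and a 64-dim hidden state

# Randomly initialised toy parameters, one pair of matrices per gate.
W = {g: rng.normal(scale=0.1, size=(d_in, d2)) for g in "iofu"}
U = {g: rng.normal(scale=0.1, size=(d2, d2)) for g in "iofu"}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def child_sum_node(x, children):
    """One Child-Sum Tree-LSTM update for a node with input embedding x
    and a list of (h, c) states from its dependency children."""
    h_sum = sum((h for h, _ in children), np.zeros(d2))
    i = sigmoid(x @ W["i"] + h_sum @ U["i"])      # input gate
    o = sigmoid(x @ W["o"] + h_sum @ U["o"])      # output gate
    u = np.tanh(x @ W["u"] + h_sum @ U["u"])      # candidate cell state
    c = i * u
    for h_k, c_k in children:                     # one forget gate per child
        f_k = sigmoid(x @ W["f"] + h_k @ U["f"])
        c = c + f_k * c_k
    h = o * np.tanh(c)
    return h, c

# Toy dependency tree: words 2 and 3 are children of the root word 1.
x1, x2, x3 = (rng.normal(size=d_in) for _ in range(3))
state2 = child_sum_node(x2, [])
state3 = child_sum_node(x3, [])
sr_syn, _ = child_sum_node(x1, [state2, state3])  # root hidden state, as in Eqn (1)
```

Composing this update bottom-up over the dependency parse yields the root hidden state used as the syntactic representation.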

Correlation Layer
Both the rhetorical and the emotional module of the correlation layer are designed to obtain the label distributions of sentences, which helps to explicitly encode the semantic and syntactic correlations between the rhetoric and emotion identification tasks.

Rhetoric Distribution
A classification model based on a gate mechanism (Cho et al., 2014) is proposed for each type of rhetoric, which assigns different weights to the semantic and syntactic representations. Taking the i-th rhetoric identification task as an example, the classifier derives three types of representations, i.e., the sentence representation, the feature representation, and the rhetoric distribution, via Eqn (2)-(7).
r_i = σ(sr_sem W_{ri} + sr_syn U_{ri})    (2)
z_i = σ(sr_sem W_{zi} + sr_syn U_{zi})    (3)
s̃r_i = tanh(sr_sem W_{hi} + (r_i ⊙ sr_syn) U_{hi})    (4)
sr^r_i = (1 − z_i) ⊙ sr_syn + z_i ⊙ s̃r_i    (5)
f_i = tanh(sr^r_i W_{fi})    (6)
pp^r_i = σ(f_i W_{pi})    (7)

where r_i, z_i and sr^r_i ∈ R^{d_2} denote the reset gate, the update gate and the sentence representation, respectively; σ and tanh denote the sigmoid and hyperbolic tangent activation functions, and ⊙ is element-wise multiplication; f_i denotes the feature representation; and pp^r_i ∈ R^1 refers to the rhetoric distribution for the i-th rhetoric identification task. Next, the R rhetoric distributions are concatenated into one comprehensive rhetoric distribution via Eqn (8), which is then used to compute the emotion distribution:

pp^r = [pp^r_1; pp^r_2; ...; pp^r_R]    (8)

where pp^r ∈ R^R.
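A minimal NumPy sketch of one such gate head is given below. The GRU-style parameterisation, toy dimensions and random weights are all assumptions for illustration, not the paper's exact trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, R = 6, 4, 5  # toy sizes: semantic dim, syntactic dim, number of rhetoric classes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_params():
    """Fresh random parameters for one rhetoric head (illustrative shapes)."""
    return {
        "W_r": rng.normal(scale=0.1, size=(d1, d2)), "U_r": rng.normal(scale=0.1, size=(d2, d2)),
        "W_z": rng.normal(scale=0.1, size=(d1, d2)), "U_z": rng.normal(scale=0.1, size=(d2, d2)),
        "W_h": rng.normal(scale=0.1, size=(d1, d2)), "U_h": rng.normal(scale=0.1, size=(d2, d2)),
        "W_f": rng.normal(scale=0.1, size=(d2, d2)), "W_p": rng.normal(scale=0.1, size=(d2, 1)),
    }

def rhetoric_head(sr_sem, sr_syn, p):
    """GRU-style gated fusion of semantic and syntactic representations for one rhetoric."""
    r = sigmoid(sr_sem @ p["W_r"] + sr_syn @ p["U_r"])           # reset gate
    z = sigmoid(sr_sem @ p["W_z"] + sr_syn @ p["U_z"])           # update gate
    cand = np.tanh(sr_sem @ p["W_h"] + (r * sr_syn) @ p["U_h"])  # candidate representation
    sr_r = (1 - z) * sr_syn + z * cand                           # fused sentence representation
    f = np.tanh(sr_r @ p["W_f"])                                 # feature representation
    return sigmoid(f @ p["W_p"])[0]                              # scalar rhetoric score

sr_sem, sr_syn = rng.normal(size=d1), rng.normal(size=d2)
# One independent head per rhetoric category; concatenation mirrors Eqn (8).
pp_r = np.array([rhetoric_head(sr_sem, sr_syn, make_params()) for _ in range(R)])
```

Each head learns its own weighting of semantics versus syntax, which is why the model keeps R independent classifiers rather than one shared softmax.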

Emotion Distribution
The emotional features are constructed by concatenating the semantic representation and the rhetoric distribution of S, as shown in Eqn (9). Taking these features as input, a classification model is adopted to compute the emotion distribution via Eqn (10), which is then used to improve the rhetoric prediction task:

sr^e = [sr_sem; pp^r]    (9)
pp^e = sr^e W_{pe}    (10)

where pp^e denotes the emotion distribution of the input sentence S, and pp^e ∈ R^E.
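Concretely, this concatenate-then-project step can be sketched as follows; the toy dimensions and random weights are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, R, E = 6, 5, 7  # toy sizes: semantic dim, rhetoric classes, emotion classes

sr_sem = rng.normal(size=d1)           # semantic representation of S
pp_r = rng.uniform(size=R)             # comprehensive rhetoric distribution
sr_e = np.concatenate([sr_sem, pp_r])  # emotional features: concatenation
W_pe = rng.normal(scale=0.1, size=(d1 + R, E))
pp_e = sr_e @ W_pe                     # emotion distribution
```

Because the rhetoric distribution enters the emotion features directly, the emotion module can pick up correlations such as "simile tends to co-occur with love".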

Rhetoric Prediction Layer
The emotion distribution is incorporated into the original rhetorical features, which are extracted from the sentence representations. Then a sigmoid classifier takes the correlated representation of a sentence as input and predicts the rhetorical probability scores. Formally, Eqn (11)-(13) calculate the transformed features based on the emotion distribution (f^{e→r}), the new correlated features (f^r_i) and the predicted probability (p^r_i), respectively, for the i-th rhetoric identification task:

f^{e→r} = pp^e W_{e→r}    (11)
f^r_i = [f_i; f^{e→r}]    (12)
p^r_i = σ(f^r_i W_{pri})    (13)

where W_{e→r} represents the feature transformation matrix of the emotion distribution. Next, the R predicted scores are concatenated into one rhetorical predicted probability distribution p^r ∈ R^R.

Emotion Prediction Layer
Similarly, a new correlated representation of the input sentence is computed based on the predicted rhetorical distribution and the semantic representation of the sentence via Eqn (14). We then predict the probability scores of emotions with a sigmoid classifier via Eqn (15):

sr^e = [sr_sem; p^r]    (14)
p^e = σ(sr^e W_{pe})    (15)

where p^e ∈ R^E denotes the emotional prediction probabilities of the input sentence.
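The data flow through the two prediction layers can be sketched end-to-end as below; all weights and upstream quantities are random stand-ins with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, R, E = 6, 4, 5, 7  # toy dims: semantic, feature, rhetoric classes, emotion classes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Random stand-ins for quantities produced by earlier layers.
sr_sem = rng.normal(size=d1)   # semantic representation of S
pp_e = rng.normal(size=E)      # emotion distribution from the correlation layer
f_feat = rng.normal(size=d2)   # rhetorical feature of the i-th gate classifier

# Rhetoric prediction: inject the emotion distribution.
W_e2r = rng.normal(scale=0.1, size=(E, d2))
f_e2r = pp_e @ W_e2r                     # transformed emotional features
f_r_i = np.concatenate([f_feat, f_e2r])  # correlated rhetorical features
W_pr = rng.normal(scale=0.1, size=(2 * d2, 1))
p_r_i = sigmoid(f_r_i @ W_pr)[0]         # predicted score for the i-th rhetoric

# Emotion prediction: inject the full vector of R rhetoric scores.
p_r = rng.uniform(size=R)                # stand-in for all R predicted rhetoric scores
sr_e = np.concatenate([sr_sem, p_r])
W_pe = rng.normal(scale=0.1, size=(d1 + R, E))
p_e = sigmoid(sr_e @ W_pe)               # emotional prediction probabilities
```

The two layers are symmetric: each task's predicted distribution feeds the other task's classifier, which is how the framework couples the tasks.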

Data
To our knowledge, there is a labeled dataset for emotion analysis of metaphorical expressions, which was constructed from a cognitive process. However, no dataset is available for the joint rhetoric and emotion identification problem from the perspective of linguistics. We constructed the first such dataset in Chinese, where each sentence may contain multiple rhetoric/emotion labels. The sentences were collected from various sources such as literary works, textbooks, microblogs, and websites. Three graduate students were hired to annotate the rhetorical and emotional labels of each sentence; the inter-rater kappa coefficients are 0.848, 0.692 and 0.757 for the rhetoric annotation task, and 0.458, 0.512 and 0.556 for the emotion task, respectively. We randomly selected 80% of the annotated dataset as training data, while the remaining 20% was equally divided into testing (test) and development (dev) data. All models were evaluated on the same data split. Table 2 shows the correlation statistics between rhetorical and emotional categories on the dataset. For example, the emotion "love" of a sentence is more likely to be expressed by rhetorical categories such as simile, parallelism and personification, while the emotion "disgust" is more commonly conveyed by rhetorical categories such as rhetorical questions and irony.
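The 80/10/10 split described above can be reproduced with a few lines of standard-library Python; the seed is an arbitrary choice:

```python
import random

def split_dataset(examples, seed=42):
    """80% train; the remaining 20% divided equally into test and dev."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))
    rest = items[n_train:]
    half = len(rest) // 2
    return items[:n_train], rest[:half], rest[half:]

train, test, dev = split_dataset(range(1000))
```

Fixing the seed keeps the split identical across all compared models, which the evaluation protocol requires.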

Comparison Systems
We compared the proposed models with the following well-established baselines and ablation models.
• CNN-Adversarial-MUL: an adversarial multi-task learning framework for text classification, which prevents the shared and private latent feature spaces from interfering with each other.
• BERT-MUL: Liu et al. (2019, 2020) presented a multi-task deep neural network for learning representations across multiple natural language understanding tasks.
• RI-Single, EI-Single: the proposed model simplified to a single-task model for rhetoric or emotion identification, respectively.
• w/o RheFusing, w/o EmoFusing: the proposed model without the predicted rhetorical or emotional distribution, respectively.
• w/o Gate: the proposed model without a gate mechanism.
• w/o Tree: the proposed model without syntactic representation.
• w/o ComprehensiveEmb, w/o KnowledgeEmb: the proposed model without the comprehensive embedding or the sememe knowledge embedding in the Tree-LSTM, respectively.

Experimental Setting
The version "bert-base-chinese" was employed for the BERT module in the framework. The dimensionality of the word embeddings in the Tree-LSTM was set to 200. We carefully tuned the dimensionalities of the Tree-LSTM hidden state and of the feature representation for rhetoric identification, setting both to 64. We employed a heuristic threshold method to predict the rhetorical and emotional labels of each sentence, where the threshold scores were tuned by grid search on the validation sets and selected as 0.88 and 0.73, respectively. The joint one-versus-all cross-entropy loss was used to train the proposed multi-task learning model. We applied the Adamax optimizer (Kingma and Ba, 2014) with a learning rate of 0.00005; the batch size was set to 6 and the number of training epochs to 15. In addition, we applied dropout (Srivastava et al., 2014) with a rate of 0.1 to each layer of the BERT model and to the prediction layers.
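The threshold decoding and grid search can be sketched as follows. Micro-averaged F1 as the selection criterion and the toy dev matrix are illustrative assumptions, since the exact search criterion is not spelled out above:

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over an (n_examples, n_classes) binary label matrix."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

def grid_search_threshold(y_true, scores, grid=None):
    """Pick the decision threshold that maximises micro-F1 on the dev set."""
    if grid is None:
        grid = np.arange(0.05, 1.00, 0.01)
    best_t, best_f1 = None, -1.0
    for t in grid:
        f1 = micro_f1(y_true, (scores >= t).astype(int))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy dev set: two sentences, three classes.
y_dev = np.array([[1, 0, 0], [0, 1, 1]])
s_dev = np.array([[0.9, 0.2, 0.1], [0.3, 0.8, 0.7]])
t, f1 = grid_search_threshold(y_dev, s_dev)
```

Separate searches on the rhetoric and emotion dev labels would yield the two task-specific thresholds (0.88 and 0.73 in the reported setting).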

Experimental Results
We compare the proposed method with state-of-the-art multi-task learning methods. Table 3 shows the results of rhetoric and emotion identification.
In particular, the proposed REI-MUL model achieves the best performance in terms of F1 on both tasks. The disadvantage of CNN-Adversarial-MUL may lie in the fact that the sentence representation produced by its CNN module is not as strong as the semantic representation of the BERT module in the proposed model. Compared to BERT-MUL, the proposed REI-MUL attains better results. This suggests that learning the associations between rhetoric and emotion categories is beneficial for both tasks. In addition, the improved results on rhetoric detection show that the independent rhetorical classifiers for each rhetoric category are effective.

Ablation Study
We designed an ablation study to evaluate the impact of the different parts of the proposed REI-MUL model on rhetoric and emotion identification. Specifically, we removed each of the following modules from the proposed model: the correlation between the two tasks, the independent gate-mechanism-based rhetorical classifiers, and the word embeddings of the Tree-LSTM. Table 4 shows the comparison between REI-MUL and the reduced models on the annotated dataset. These results demonstrate the contribution of each part of the proposed model. Table 5 shows the detailed identification results for each rhetorical category. The experimental results show that the proposed method performs well even for minority categories, such as parallelism and rhetorical questions. In addition, the results for personification and irony are not as good as the rest, which may be due to their higher dependence on human cognition, posing a challenge for identification.

Conclusion
In this paper, we have proposed a new multi-task and multi-label learning model, which exploits the task correlation to jointly address rhetoric and emotion identification in a unified framework. The experimental results show the benefit of the proposed model over state-of-the-art baselines.