Enhancing Metaphor Detection by Gloss-based Interpretations

This paper focuses on utilizing metaphor interpretation to enhance metaphor detection. Considering that existing approaches to metaphor interpretation are limited by the ambiguous meanings of metaphorical substitute words, this paper proposes a novel interpretation mechanism that utilizes glosses to interpret metaphorical words. Since there is no dataset annotated for both metaphor detection and metaphor interpretation, we enhance three datasets from the field of metaphor detection, TroFi, VUA, and PSUCMC, with gloss annotations. Accordingly, we develop a model for jointly conducting metaphor detection and gloss-based interpretation (named MDGI-Joint for short). Experimental results demonstrate that MDGI-Joint outperforms state-of-the-art models on all three enhanced datasets and that gloss-based metaphor interpretation benefits metaphor detection.


Introduction
Metaphor has been defined as the use of words or other linguistic expressions to represent one concept in terms of language from another, more concrete conceptual domain (Kövecses and Zoltán, 2002; Lagerwerf and Meijers, 2008). According to existing studies on metaphor (Kövecses and Zoltán, 2002; Steen, 2010), metaphor is used so frequently in daily language that it occurs in roughly one out of three natural language sentences.
Metaphor detection aims to identify all metaphorical words in given texts and has been demonstrated to be of great value in many Natural Language Processing (NLP) tasks, such as machine translation (Mao et al., 2018) and machine reading comprehension (Shutova et al., 2013). Therefore, metaphor detection has drawn increasing interest in recent years. A number of methods for metaphor detection have emerged, including SEQ (Gao et al., 2018), RNN HG (Mao et al., 2019), RNN MHCA (Mao et al., 2019), MUL GCN (Le et al., 2020), and DeepMet (Su et al., 2020).

Figure 1: An example for metaphor detection, where the word clouded is a metaphorical word (highlighted in bold and underline). Interpretation by a substitute word (highlighted in bold) is computed by Mao et al. (2018). Gloss as interpretation (highlighted in blue bold italics) is picked from the Merriam-Webster dictionary.
To capture the meanings of metaphorical words, metaphor interpretation has also been studied, which paraphrases metaphorical expressions into literal expressions that maintain the intended meanings of given texts (Mao et al., 2018). Existing approaches have treated metaphor interpretation as the extraction of transferred properties, the identification of the underlying conceptual mapping, or the generation of a literal substitute paraphrase (Rai and Chakraverty, 2020). Yet all of them are limited by the ambiguous meanings of metaphorical substitute words. Consider the example shown in Figure 1. The metaphorical word clouded is interpreted as change in (Mao et al., 2018), where change has 10 diverse meanings according to WordNet (Miller, 1995). Due to the multiple meanings of change, it is difficult to precisely capture the intended meaning of the metaphorical word clouded.
It has been pointed out by the Pragglejaz Group (2007) that metaphors exhibit a clear distinction between their basic meanings and their contextual meanings. Consider the example shown in Figure 1 again. The gloss-based interpretation for clouded gives the meaning "to make unclear or confused", which is evidently different from the basic meaning "to grow cloudy" of clouded. In other words, different from a substitute word, a gloss extracted from dictionaries can provide an unambiguous interpretation of clouded that exhibits a clear distinction from its basic meaning. Therefore, we can utilize gloss-based metaphor interpretation to enhance the metaphor detection task.
Based on the above observations, we propose to use glosses to interpret metaphorical words. Specifically, we formulate metaphor interpretation as a task of predicting the best gloss among a set of candidate glosses. Considering that Word Sense Disambiguation (WSD) (Kilgarriff, 2004) is a popular technique for identifying the correct meaning of a target word, modern WSD methods can be adapted to metaphor interpretation.
To date, there is no dataset annotated for both metaphor detection and metaphor interpretation. In order to study whether gloss-based interpretation benefits metaphor detection, we enhance three benchmark datasets in the field of metaphor detection, including two English datasets (TroFi and VUA) and one Chinese dataset (PSUCMC). For each dataset, we construct a set of candidate words and annotate these words with glosses extracted from a dictionary. Accordingly, we develop a joint model for Metaphor Detection and Gloss-based Interpretation (named MDGI-Joint for short). To be specific, our joint model encodes contexts and glosses independently for every given word. Based on both the contextual word embedding and the gloss embeddings, a probability distribution over all glosses of the given word is computed and then used to predict the best gloss, in a similar way as the state-of-the-art WSD method (Blevins and Zettlemoyer, 2020). Afterwards, an attention mechanism is employed to compute an integrated representation of all glosses. This integrated representation is then concatenated with the contextual word embedding to determine whether the given word is metaphorical, through a classical prediction layer. The joint model is trained by minimizing a combined loss from both the metaphor detection task and the metaphor interpretation task.
We conduct experiments on the aforementioned three enhanced datasets. Experimental results demonstrate that gloss-based metaphor interpretation does benefit metaphor detection. On one hand, the proposed model achieves state-of-the-art performance on all three enhanced datasets in the metaphor detection task. On the other hand, it also achieves performance comparable with an outstanding WSD method in the metaphor interpretation task.
The main contributions of this paper can be summarized as follows.
• We provide a novel interpretation mechanism that utilizes glosses to interpret metaphorical words.
• We enhance three metaphor detection datasets (TroFi, VUA, and PSUCMC) with annotations of glosses for metaphorical words.
• We develop a joint model for metaphor detection and gloss-based interpretation and empirically show that metaphor interpretation with glosses benefits metaphor detection.
Related Work

Metaphor Detection
Early studies on metaphor detection (Shutova et al., 2010; Rei et al., 2017) usually follow the linguistics theory (Lakoff and Johnson, 1980) and construct mappings from the source domain to the target domain. Subsequent studies focus on metaphor detection over subject-verb-object tuples, adjective-noun tuples, or metaphorical phrases. Among these studies, many semantic features, including the degree of abstractness, the degree of concreteness, the degree of imageability, semantic super-senses (namely coarse semantic categories originating in WordNet), lemma unigrams, and grammatical dependencies, are added to improve the performance of metaphor detection (Turney et al., 2011; Tsvetkov et al., 2014; Klebanov et al., 2016; Özbal et al., 2016; Jang et al., 2015). Jang et al. (2015) took topic distribution into consideration. Jang et al. (2016) explored topic transition between a metaphor and its context. For handling multi-modal information, some studies considered both word embeddings and visual embeddings, whereas others introduced a cross-modal method to integrate linguistic representations and property-based representations. A broader context of discourse was considered by Jang et al. (2015) and Mu et al. (2019).
Regarding word-level metaphor detection, metaphor detection can be treated as a sequence tagging task. Wu et al. (2018) proposed a neural model, which uses Word2Vec (Mikolov et al., 2013) as the text representation and encodes part-of-speech (POS) tags and word clusters with a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. Gao et al. (2018) and Mao et al. (2019) respectively utilized GloVe (Pennington et al., 2014) and ELMo (Peters et al., 2018) embeddings. Le et al. (2020) proposed to construct a graph CNN guided by dependency trees of sentences for metaphor detection and to construct a multi-task learning framework for the WSD task, utilizing the knowledge from WSD to improve metaphor detection. Instead of treating metaphor detection as a sequence tagging task, Su et al. (2020) proposed a novel reading comprehension paradigm based on a pre-trained language model, using features from POS tags and local texts.

Metaphor Interpretation
Metaphor interpretation is an intricate task, posing the challenge of deciphering the meaning conveyed by a metaphorical expression (Rai and Chakraverty, 2020). Existing approaches to metaphor interpretation can be grouped into three categories. In the first category, the problem of metaphor interpretation is treated as a problem of extracting transferred properties. It is often assumed that a metaphor is essentially a projection of a specific set of salient concept properties from the source domain, known as property matching (Su et al., 2016; Ortony, 1980). Su et al. (2016) extracted perceptual properties from the source domain and the target domain, and then searched for metaphor interpretations by expanding the extracted properties with synonymy relationships from WordNet. In contrast to property matching, the second category defines the problem of metaphor interpretation as a problem of identifying the underlying conceptual mapping, usually focusing on re-conceptualization of the target domain (Marmolejo-Ramos et al., 2013; Semino, 2010). Martin (2006) empirically found that metaphor interpretation has certain contextual clues (e.g., the appearance of a concept from the target domain) and related metaphorical expressions. In the last category, the problem of metaphor interpretation is treated as the generation of a literal substitute paraphrase (Mao et al., 2018; Shutova et al., 2012). Mao et al. (2018) used hypernyms and synonyms as candidate substitutes and computed the best substitute word by the cosine similarity between the embedding of the given word and the embedding of a candidate substitute.
All the above methods for metaphor interpretation fail to capture the contextual meaning of a metaphorical word due to ambiguous interpretation of the metaphor. To tackle this issue, in this work we provide a novel interpretation mechanism that utilizes glosses to interpret metaphorical words.

Word Sense Disambiguation
Word sense disambiguation (WSD) aims to predict a specific meaning of a word that occurs in a particular context (Navigli, 2009). Understanding the meaning of a word in context is critical to many NLP tasks, such as machine translation (Vickrey et al., 2005; Neale et al., 2016; Gonzales et al., 2017) and information extraction (Ciaramita and Altun, 2006; Bovi et al., 2015). One category of WSD is class-based, which provides coarse-grained labels that are shared among different words. The other category of WSD is word-based, aiming to disambiguate every word in texts (Palmer et al., 2001; Moro and Navigli, 2015; Blevins and Zettlemoyer, 2020). Our proposed metaphor interpretation scheme belongs to this category. Some neural models for word-based WSD exploit encoders for better feature extraction. Based on an encoder, they either train classifiers on top of extracted features (Kågebäck and Salomonsson, 2016) or introduce a shared output space to label words (Raganato et al., 2017). Other neural models augment word representations with additional data by semi-supervised learning (Melamud et al., 2016; Yuan et al., 2016). BEM (Blevins and Zettlemoyer, 2020), which inspires our model, is a state-of-the-art method for word-based WSD. It introduces a bi-encoder model to embed the target word with its surrounding context and its glosses.

The Proposed MDGI-Joint Model
Given a sentence s consisting of n words {w_0, w_1, ..., w_i, ..., w_{n-1}} and a list of all m_i candidate glosses G_i = {g_i^0, g_i^1, ..., g_i^j, ..., g_i^{m_i-1}} for the target word w_i, the task of metaphor interpretation aims to select a gloss from G_i to interpret the intended meaning of w_i, whereas the task of metaphor detection focuses on predicting whether the word w_i is metaphorical or literal. Since gloss-based metaphor interpretation provides a rich representation of the contextual meaning of the target word, it is appropriate to incorporate the two tasks in one joint model. Therefore, we propose MDGI-Joint, a joint model for metaphor detection and gloss-based metaphor interpretation.
The architecture of MDGI-Joint is shown in Figure 2. MDGI-Joint employs two encoders to generate, respectively, the contextual representation and the gloss representations for a target word. The probability distribution over all candidate glosses is computed by an attention mechanism. The gloss with the highest probability is predicted as the interpretation of the target word. Afterwards, an integrated representation of all candidate glosses is computed and then concatenated with the contextual representation. The concatenated representation is finally used to determine whether the target word is metaphorical, through a fully connected layer followed by the softmax classifier.
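This data flow can be sketched in a few lines of numpy. The toy dimensions, the randomly initialized vectors standing in for BERT outputs, and the function name `mdgi_forward` are our own illustrative assumptions, not the authors' code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mdgi_forward(h_i, gloss_embs, W1, b1):
    """Sketch of the MDGI-Joint prediction path for one target word.

    h_i:        (d,) contextual representation of the target word
    gloss_embs: (m, d) matrix with one row per candidate gloss
    Returns (alpha, best_gloss_idx, label_probs).
    """
    # Attention: score each gloss against the context, normalize to probabilities.
    alpha = softmax(gloss_embs @ h_i)            # (m,)
    best_gloss = int(alpha.argmax())             # predicted interpretation
    # Integrated gloss representation: attention-weighted sum of gloss vectors.
    p_star = alpha @ gloss_embs                  # (d,)
    # Concatenate context and gloss summary, then classify.
    q_i = np.concatenate([h_i, p_star])          # (2d,)
    label_probs = softmax(W1 @ q_i + b1)         # [P(literal), P(metaphorical)]
    return alpha, best_gloss, label_probs

rng = np.random.default_rng(0)
d, m = 768, 5                                    # BERT hidden size, 5 candidate glosses
h = rng.normal(size=d)
glosses = rng.normal(size=(m, d))
W1 = rng.normal(size=(2, 2 * d)) * 0.01
b1 = np.zeros(2)
alpha, idx, probs = mdgi_forward(h, glosses, W1, b1)
```

Both `alpha` and `probs` are valid probability distributions, so the same forward pass serves gloss prediction and metaphoricity prediction at once.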

Encoding Module for Sentences
Given a sentence s consisting of n words {w_0, w_1, ..., w_i, ..., w_{n-1}} as well as a target word w_i, the contextual representation of w_i is encoded into a vector. Considering that the pre-trained language model BERT (Devlin et al., 2019) has proved effective in transfer learning, contributing to state-of-the-art performance in many NLP tasks, we fine-tune a BERT model as the context encoder. Initially, we construct a token sequence "[CLS], w_0, w_1, ..., w_{n-1}, [SEP]", where [CLS] and [SEP] are special tokens introduced in BERT. This token sequence is then fed into BERT, yielding a sequence of 768-dimensional vectors "h_[CLS], h_0, h_1, ..., h_{n-1}, h_[SEP]" as the output. The vector h_i corresponding to the target word w_i is treated as its contextual representation.
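One practical detail left implicit here is that BERT's WordPiece tokenizer may split a word into several subtokens, so word positions must be aligned with subtoken vectors. A common convention, which is an assumption of ours rather than something stated in the paper, is to mean-pool the subtoken vectors; `word_vector` and its arguments are hypothetical names:

```python
import numpy as np

def word_vector(hidden_states, word_to_subtokens, i):
    """Mean-pool the BERT output vectors belonging to word i.

    hidden_states:     (num_subtokens, hidden_size) encoder outputs,
                       including [CLS]/[SEP] positions
    word_to_subtokens: list mapping each word index to its subtoken positions
    """
    span = word_to_subtokens[i]
    return np.mean(hidden_states[span], axis=0)
```

With this helper, h_i in the notation above would be `word_vector(hidden_states, mapping, i)` regardless of how many pieces the tokenizer produced for w_i.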

Encoding Module for Glosses
For the target word w_i, we collect the set of candidate glosses G_i = {g_i^0, g_i^1, ..., g_i^{m_i-1}} from an existing dictionary. We fine-tune another BERT model as the gloss encoder. Similar to the encoding module for sentences, we construct a token sequence for each gloss and feed it into BERT. For each gloss g_i^j ∈ G_i, where 0 ≤ j ≤ m_i − 1, we use the output 768-dimensional vector corresponding to the first token [CLS] as the gloss representation, written as p_i^j.

Prediction for Metaphor Interpretation
Based on the contextual representation h_i of w_i and the gloss representation p_i^j of each gloss g_i^j ∈ G_i, where 0 ≤ j ≤ m_i − 1, the probability α_i^j that gloss g_i^j represents the intended meaning of the word w_i is computed by an attention mechanism, formally defined as follows:

α_i^j = exp(h_i · p_i^j) / Σ_{k=0}^{m_i−1} exp(h_i · p_i^k)

Prediction for Metaphor Detection
We define the joint representation q_i of the target word w_i as the concatenation of the contextual representation h_i and the weighted sum p*_i of the gloss representations p_i^0, ..., p_i^{m_i−1}, formally defined below:

p*_i = Σ_{j=0}^{m_i−1} α_i^j p_i^j,   q_i = [h_i ; p*_i]

where [;] denotes the concatenation of two vectors. By transforming the joint representation q_i through a fully connected layer followed by the softmax classifier, the probability distribution l_i that the target word w_i is metaphorical or literal is formally defined below:

l_i = softmax(W_1 q_i + b_1)

where W_1 ∈ R^{2×1536} and b_1 ∈ R^2 are learnable parameters. The probability distribution l_i is of the form [l_i^0, l_i^1], where l_i^0 is the predicted probability that w_i is literal, and l_i^1 is the predicted probability that w_i is metaphorical.

Training Objective Function
Based on the predicted probabilities α_i^0, ..., α_i^{m_i−1} over glosses, the loss value for metaphor interpretation for the target word w_i is defined as:

L_MI(w_i) = − Σ_{j=0}^{m_i−1} I(g_i^j = f_i) log α_i^j

where f_i is the correct gloss for w_i in sentence s, and I(X) = 1 if X is true and I(X) = 0 otherwise. As for the task of metaphor detection, we employ the binary cross-entropy loss for predicting the target word w_i to be literal or metaphorical, defined as follows:

L_MD(w_i) = −(y_i log l_i^1 + (1 − y_i) log l_i^0)

where y_i = 1 if w_i is labeled metaphorical and y_i = 0 otherwise. The proposed joint model is trained by minimizing the following combined loss for every training sentence:

L = Σ_{i=0}^{n−1} L_MD(w_i) + Σ_{w_i ∈ C} L_MI(w_i)
where n is the number of words in the given sentence, and C is a predefined set of candidate words required to be interpreted, which in our experiments is collected from all metaphorical words.

Datasets

• TroFi (Birke and Sarkar, 2006). This is a benchmark metaphor dataset with verb metaphors annotated. Following Mao et al. (2019), we treat unlabeled words as literal in the training phase.
• VUA (Steen, 2010). This is the VU Amsterdam Metaphor Corpus, in which words are annotated for metaphor following the MIPVU procedure. We evaluate on both the VUA ALL POS track and the VUA-Verb track.
• PSUCMC. This dataset consists of text samples from the Lancaster Corpus of Mandarin Chinese which are annotated for metaphor-related words following MIPVU (Steen, 2010). All words in PSUCMC are labeled. Following the experimental setting of VUA, we evaluate on both the PSUCMC ALL POS track and the PSUCMC-Verb track.
All these datasets are enhanced to support the gloss-based metaphor interpretation task. We refer to the set of words that need to be interpreted as the candidate set. For TroFi, the candidate set includes all labeled verbs. For VUA, the words in the candidate set are randomly selected from verbs. As for PSUCMC, we randomly choose a set of words and filter the meaningless words to construct a candidate set. Words in the candidate set are annotated by human annotators.
For the English datasets, given a word in the candidate set, the annotators are asked to look up its glosses in the Merriam-Webster dictionary (https://www.merriam-webster.com/dictionary/). For the Chinese dataset, word glosses are extracted from the Baidu Dictionary (https://dict.baidu.com).
For every dataset, four annotators are recruited to annotate the dataset independently. All annotators need to select the most appropriate gloss that expresses the contextual meaning of the target word, by comparing all glosses with the given context of the target word. They also need to discuss and determine the final labels after generating their own annotations.
To verify the reliability of the annotations, we use kappa-score to measure inter-annotator agreements for the annotations. Kappa-score (Siegel, 1956) has been widely used in computational linguistics to measure the reliability of an annotation scheme. Table 1 shows the kappa-score for annotations in every dataset, which demonstrates that the annotations have a high degree of reliability.
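As an illustration, agreement between any pair of annotators can be measured with Cohen's kappa; the sketch below is a two-annotator simplification of the multi-annotator setting described above, and the function name is ours:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[l] * freq_b.get(l, 0) for l in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for what would be expected by chance, which is why it is preferred over plain percentage agreement when candidate glosses vary in number per word.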
For PSUCMC and TroFi, we randomly split the samples into a training set, a validation set, and a test set according to the proportion of 8 : 1 : 1. For VUA, the data splits provided by Mao et al. (2019) are reused. Table 2 reports the statistics of all experimental datasets, whereas Table 3 gives more details about PSUCMC.

Experimental Setup
In our experiments, BERT is used as both the context encoder and the gloss encoder, where the uncased BERT base model is used for TroFi and VUA, and the Chinese BERT base model is used for PSUCMC.
There are two variants for our proposed model. The first variant, named MDGI-Joint-S, shares parameters between the context encoder and the gloss encoder. The second one, named MDGI-Joint, implements two independent encoders for context and gloss, respectively.
To train these two models, we set the learning rate to 2e-5 and the maximum number of epochs to 20. We set the dropout probability to 0.2 for the fully connected layer. The maximum sequence length is set to 128 for TroFi and VUA, and 256 for PSUCMC. The batch sizes for the two tasks are both set to 8 for VUA and TroFi, and to 16 for PSUCMC. We keep the model that maximizes the F1 score on the validation set for the metaphor detection task; this model is then used to evaluate on the test set.
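The hyperparameters above can be collected into a single configuration; the dict layout and key names are our own illustration, not a format used by the authors:

```python
# Training configuration as stated above; dataset-specific values keyed by corpus.
CONFIG = {
    "learning_rate": 2e-5,
    "max_epochs": 20,
    "dropout": 0.2,                     # applied to the fully connected layer
    "max_seq_length": {"TroFi": 128, "VUA": 128, "PSUCMC": 256},
    "batch_size": {"TroFi": 8, "VUA": 8, "PSUCMC": 16},
    "model_selection": "best validation F1 on metaphor detection",
}
```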

Compared Methods
We compare our method with the following methods.
• SEQ (Gao et al., 2018). SEQ is a neural model taking ELMo embeddings and GloVe embeddings as input. It uses a Bi-LSTM encoder to capture the contextual information of the target word, and then employs a fully connected layer followed by the softmax classifier to predict whether a word is metaphorical or not.
• RNN HG (Mao et al., 2019). RNN HG also takes ELMo embeddings and GloVe embeddings as input. Different from SEQ (Gao et al., 2018), RNN HG concatenates the encoded representation with the GloVe embedding to capture the contextual information of the target word.
• RNN MHCA (Mao et al., 2019). RNN MHCA has a similar architecture to RNN HG, but adopts a multi-head contextual attention mechanism to capture the contextual information of the target word.
• MUL GCN (Le et al., 2020). MUL GCN is a multi-task learning framework. It exploits the similarity between word sense disambiguation and metaphor detection, by employing a graph convolutional neural network (GCN) to connect the words of interest with context words for metaphor detection.
• DeepMet (Su et al., 2020). DeepMet is a RoBERTa (Ott et al., 2019) based model with an ensemble strategy. It also takes features including POS tags and local texts as input. For fairness, we only compare our models with the single model of DeepMet.
• BEM (Blevins and Zettlemoyer, 2020). BEM is a model for the WSD task. It consists of two independent encoders. One is the context encoder, which embeds the target word and its surrounding context. The other encoder is the gloss encoder, which embeds the gloss for each word sense. Both encoders are deep transformer networks initialized from BERT.
For the evaluation on PSUCMC and TroFi, we use the publicly released code for all the compared methods except for MUL GCN whose code is unavailable. Thus we cannot obtain results for MUL GCN on both PSUCMC and TroFi. For VUA, we present results that are reported in the published papers of the compared methods.
It should be noted that all compared methods target the English domain only in their published papers. Since PSUCMC is a Chinese dataset, to evaluate the compared methods on PSUCMC we use ELMo embeddings (https://github.com/HIT-SCIR/ELMoForManyLangs) trained on the Xinhua portion of Chinese Gigaword, a Chinese corpus released by Che et al. (2018) and Fares et al. (2017); moreover, we replace the GloVe embeddings with word embeddings trained on the zh-wiki corpus based on Word2Vec (Mikolov et al., 2013).

Experimental Results
We use P (precision), R (recall), F1 (F1 score) and Acc (accuracy) as the evaluation metrics for the metaphor detection task, and Acc (accuracy) for the metaphor interpretation task. All the reported numbers are in percent. For the metaphor detection task, all words in the test set are evaluated. For the metaphor interpretation task, although corresponding glosses can be computed for all words, only the words with annotations in the test set are evaluated.
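These detection metrics can be computed directly from binary gold and predicted labels; a minimal sketch with the metaphorical class as the positive class (the function name is ours):

```python
def detection_metrics(gold, pred):
    """Precision, recall, F1 and accuracy for binary labels (1 = metaphorical)."""
    tp = sum(g == p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return precision, recall, f1, acc
```

Interpretation accuracy is simply the fraction of annotated words whose highest-probability gloss matches the annotated gloss.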

Results for Metaphor Interpretation
It can be seen from Table 4 that, no matter whether the two encoders share parameters or not, the proposed model achieves results comparable with BEM; MDGI-Joint-S is even slightly superior to BEM. The implemented BEM model does not share parameters between its two encoders. Therefore, it can be confirmed that the improved performance of MDGI-Joint-S comes from capturing the interaction between the two tasks, rather than from inheriting parameters from BEM.

Results for Metaphor Detection
From Table 5, it is evident that MDGI-Joint outperforms the other methods on all three datasets except for the VUA-Verb track. Although DeepMet achieves the best performance on the VUA-Verb track, it exhibits lower performance than our proposed model on the other datasets and on the VUA ALL POS track. These results indicate that joint training for metaphor detection and metaphor interpretation does improve the performance of metaphor detection. It can also be seen that all the compared methods perform poorly on the Chinese dataset PSUCMC, probably because these methods are strongly language-sensitive. The reason why DeepMet outperforms our model on the VUA-Verb track is that extra information, such as subclasses of POS tags, is taken as input in DeepMet. It is interesting to ask whether our proposed model can be further improved by exploiting such extra information, but this question is out of the scope of this work and will be explored in our future work.

Case Study
As shown in Table 6, we use two cases to exemplify the superiority of our model in the metaphor detection task, demonstrating that gloss-based interpretation improves the performance of metaphor detection. The first example is picked from the TroFi dataset (see Row 2 in Table 6). In this example the detected word rained is a metaphorical word, but DeepMet predicts it as literal. The reason could be that the word rained commonly occurs in the given context. In contrast, MDGI-Joint correctly predicts it as metaphorical according to its gloss-based interpretation "to fall like rain", since a gloss of the form "do like some behavior" is commonly used as the gloss of a metaphorical word.
The second example is selected from the VUA dataset (see Row 3 in Table 6). In this example the detected word stand has a literal meaning in its context. DeepMet wrongly predicts it as metaphorical, while MDGI-Joint gives a correct prediction based on its predicted gloss-based interpretation which has an expression style as that of general glosses of literal words.

Ablation Study
To investigate the effect of joint training for the two tasks, we further conduct experiments on the following weakened models.
• MD-MGI. In this model we only use the context representation to predict whether a target word is metaphorical. In other words, the weighted sum of gloss representations is not considered in the metaphor detection task, although the two tasks are still jointly trained without sharing parameters between the context encoder and the gloss encoder.
• MD (only). This model addresses the metaphor detection task only and neglects the metaphor interpretation task. It detects metaphors based on the contextual representation of words only.
From Table 7 we observe that the performance of MD-MGI drops slightly on the metaphor detection task compared to the proposed model. The reason is probably that, in MD-MGI, the two tasks only interact through the context encoder, leading to limited benefits for metaphor detection. From Table 8 we can see that all three compared variants are comparable on the metaphor interpretation task, where MDGI-Joint-S is slightly better than the others. These results show that whether the metaphor detection task uses the weighted sum of gloss representations or not has little impact on the metaphor interpretation task.

Error Analysis on VUA
Taking the VUA dataset as an example, there are more false negatives than false positives generated by MDGI-Joint; i.e., the recall is lower than the precision, as shown in Table 9. In particular, adjectives, adverbs and nouns have a significantly lower recall than verbs. The reason for this phenomenon is two-fold. On one hand, only a partial set of verbs has annotations in the VUA dataset, so the detection of metaphorical adjectives, adverbs or nouns can only gain limited benefits from the metaphor interpretation task. On the other hand, in VUA a number of metaphors appear at the phrase level, but MDGI-Joint is only able to detect metaphors at the word level, so it is more likely to produce false negatives. Consider the following example picked from VUA id:a7w-fragment01#41: But only two million out of the 20 million journeys which ambulance crews carry out each year are emergency calls. MDGI-Joint only detects the word carry as metaphorical. It treats words separately and cannot predict the preposition out in the phrase carry out as metaphorical. It also wrongly predicts crews as literal, possibly because the word is a noun.

Discussion
Out-of-vocabulary words are treated differently in the training phase and in the test phase. In the training phase, the loss of metaphor interpretation over out-of-vocabulary words is not computed according to Equation (7). In the test phase, the evaluation on metaphor detection involves all words, but the evaluation on metaphor interpretation only targets in-vocabulary words. For the metaphor detection task, in-vocabulary words and out-of-vocabulary words receive no different treatment. We have conducted extra experiments for metaphor detection on the VUA ALL POS track. The results reported in Table 9 show that MDGI-Joint achieves significantly higher performance on in-vocabulary verbs (77.3%) than on out-of-vocabulary verbs (73.9%), but on all out-of-vocabulary words, MDGI-Joint achieves similar performance (77.2%). Due to the limited glosses extracted from existing dictionaries, there may be no correct gloss for novel metaphors that do not appear in the training set. It has been shown (Rai and Chakraverty, 2020) that interpreting novel metaphors in general situations is difficult. Hence our proposed gloss-based interpretations are more suitable for conventional or lexicalized metaphors. Nevertheless, considering that we can expand glosses with more external resources, the effectiveness of our proposed approach is not limited to a fixed set of metaphors.

Conclusion and Future Work
Metaphor detection is of great value in many natural language processing tasks. In this paper we have utilized gloss-based metaphor interpretation to enhance metaphor detection. The novelty mainly lies in the interpretation mechanism, i.e., utilizing glosses to interpret metaphorical words. Accordingly, we propose a joint model for metaphor detection and gloss-based interpretation. We enhance three datasets in the field of metaphor detection to evaluate the joint model. Experimental results confirm that gloss-based metaphor interpretation improves the performance of metaphor detection. Future work will extend our approach to other rhetoric identification tasks.