Quotation Recommendation and Interpretation Based on Transformation from Queries to Quotations

To help individuals express themselves better, quotation recommendation is receiving growing attention. Nevertheless, most prior efforts model quotations and queries separately and ignore the relationship between them. In this work, we introduce a transformation matrix that directly maps query representations to quotation representations. To better learn this mapping, we employ a mapping loss that minimizes the distance between the two semantic spaces (one for quotations and one for mapped queries). Furthermore, we explore using the words in history queries to interpret the figurative language of quotations, where quotation-aware attention is applied on top of the history queries to highlight indicator words. Experiments on two datasets, in English and Chinese, show that our model outperforms previous state-of-the-art models.


Introduction
Quotations are essential for successful persuasion and explanation in interpersonal communication. However, it is a daunting task for many individuals to write down a suitable quotation in a short time. This results in a pressing need to develop a quotation recommendation tool to meet such a demand.
To that end, extensive efforts have been made on quotation recommendation, which aims to complete an ongoing conversation with a quotation whose sense continues the existing context (Wang et al., 2020). As quotations are concise phrases or sentences that spread wisdom, often written in figurative language and difficult to understand, they are assumed to be written in a different pseudo-language (Liu et al., 2019a). Intuitively, we can infer the meanings of quotations from their neighboring contexts, especially from the query turn (the last turn of the conversation, which needs the recommendation). The code is available at https://github.com/Lingzhi-WANG/Quotation-Recommendation.

[Figure 1: A Reddit conversation (turns t1 to t5) with history queries h1 to h3 associated with the quotation Q, "A fool and his money are soon parted"; indicator words such as "spends", "much", and "money" are marked with wavy underlines.]
To illustrate our motivation, Figure 1 shows a Reddit conversation with some history queries associated with quotation Q, "A fool and his money are soon parted". From the queries (t5 and h1 to h3), we can infer from the contexts that the meaning of quotation Q is "A foolish person spends money carelessly and won't have a lot of money." From h3, we can also recover the implication behind the words, which is "Do market research before buying". Humans can establish such a relationship between quotations and queries and then decide what to quote in their writing, and so can machines (neural networks). Therefore, we introduce a transformation matrix with which the model learns a direct mapping from queries to quotations. The matrix operates on the outputs of two encoders, a conversation encoder and a quotation encoder, which encode the conversation context and the quotations respectively. Furthermore, we can use the words in the queries to interpret quotations. h1 to h3 in Figure 1 are denoted as history queries, and the wavy-underlined words are denoted as indicators of quotations. As illustrated, we can interpret quotations by highlighting the words in the queries. Therefore, we compute quotation-aware attention over all the history queries (after the same transformation mentioned above) and then display the learned indicators, which also reflects the effectiveness of the transformation.
In summary, we introduce a transformation between the query semantic space and the quotation semantic space. To minimize the distance between the two spaces after the transformation mapping, an auxiliary mapping loss is employed. In addition, we propose a way to interpret quotations with indicative words from the corresponding queries.
The remainder of this paper is organized as follows. Related work is surveyed in Section 2. Section 3 presents the proposed approach. Sections 4 and 5 present the experimental setup and results, respectively. Finally, conclusions are drawn in Section 6.

Related Work
Quotation Recommendation. Among previous works on quotation recommendation, some target online conversations (Wang et al., 2020; Lee et al., 2016) and others normal writing (Liu et al., 2019a; Tan et al., 2015, 2016). Our work focuses on the former. Methodologically, the applied methods can be divided into generation-based frameworks (Wang et al., 2020; Liu et al., 2019a) and ranking frameworks (Lee et al., 2016; Tan et al., 2015, 2016). Different from previous works, which mainly model quotations and queries separately and pay little attention to the relationship between them, our model directly learns the relationship between quotations and query turns through a mapping mechanism. This relationship mapping is jointly trained with the quotation recommendation task, which improves the performance of our model.

Our model
This section describes our quotation recommendation model, whose overall structure is shown in Figure 2. The input of the model mainly contains the observed conversation $c$ and the quotation list $q$. The conversation $c$ is formalized as a sequence of turns (e.g., posts or comments) $\{t_1, t_2, \ldots, t_{n_c}\}$, where $n_c$ is the length of the conversation (number of turns) and $t_{n_c}$ is the query turn. $t_i$ denotes the $i$-th turn of the conversation and contains words $w_i$. The quotation list $q$ is $\{q_1, q_2, \ldots, q_{n_q}\}$, where $n_q$ is the number of quotations and $q_k$ is the $k$-th quotation in list $q$, containing words $w_k$. Our model outputs a label $y \in \{1, 2, \ldots, n_q\}$ indicating which quotation to recommend.
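As a concrete picture of this formalization, here is a hypothetical toy instance; the conversation texts, quotation list, and label are invented for illustration only:

```python
# A toy instance of the model's input/output formalization (invented data).
conversation = [
    "Save your money.",            # t_1
    "Scuf is a ripoff.",           # intermediate turn
    "Why spend $100 on that?",     # t_{n_c}, the query turn
]
quotations = [
    "A fool and his money are soon parted",   # q_1
    "Actions speak louder than words",        # q_2
]
label = 0   # y: index of the quotation to recommend for this conversation

print(quotations[label])
```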

Conversation Modeling
Our model encodes the observed conversation $c$ with a hierarchical structure, which is divided into three parts. The first part is an embedding layer mapping the words $w_i$ in each turn $t_i$ into vectors. We then apply a transformer (Vaswani et al., 2017) to learn the representation of each turn. Similar to BERT (Devlin et al., 2018), we only use the encoder of the transformer, which stacks several self-attention and feed-forward layers. We add a [CLS] token at the beginning of each turn, and the hidden representation of [CLS] after the transformer encoder is defined as the turn representation $r_{t_i}$ of turn $t_i$. The procedures for the first two parts are summarized as follows:

$$h^T_i = \mathrm{Transformer}([w_0; w_{i,1}; w_{i,2}; \ldots]) \qquad (1)$$

where $w_0$ represents the [CLS] token and $[;]$ indicates concatenation. Therefore $r_{t_i} = h^T_{i,0}$. Next, we use a Bi-GRU (Cho et al., 2014) layer to model the whole conversation structure. With the turn representations $\{r_{t_1}, r_{t_2}, \ldots, r_{t_{n_c}}\}$ of conversation $c$ ($r_{t_{n_c}}$ is the representation of the query turn) derived from the previous procedure, the hidden states are updated as follows:

$$\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(r_{t_i}, \overrightarrow{h_{i-1}}), \qquad \overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(r_{t_i}, \overleftarrow{h_{i+1}}) \qquad (2)$$

Finally, we define the conversation representation as the concatenation of the final hidden states from the two directions:

$$r_c = [\overrightarrow{h_{n_c}}; \overleftarrow{h_1}] \qquad (3)$$
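The turn-level encoding can be sketched as follows in NumPy. This is a minimal single-head self-attention layer standing in for the stacked transformer encoder, with random weights and illustrative dimensions; the embedding layer and the Bi-GRU conversation layer are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over one turn."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return weights @ V

rng = np.random.default_rng(3)
d = 8                                              # illustrative hidden size
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

cls = rng.normal(size=(1, d))                      # [CLS] embedding (w_0)
turns = [rng.normal(size=(n, d)) for n in (4, 6)]  # word embeddings per turn

# Turn representation r_{t_i} = hidden state of [CLS] after self-attention.
turn_reps = [self_attention(np.vstack([cls, t]), Wq, Wk, Wv)[0]
             for t in turns]
```

A real implementation would stack several such layers with feed-forward sublayers and feed the resulting turn representations into a Bi-GRU.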

Quotation Modeling
For each quotation $q_k$ in list $q$, we extract the quotation representation $r_{q_k}$ with an operation similar to that for turn representations (see Eq. 1). As Liu et al. (2019b) point out, the language used in quotations usually differs from that of daily conversations, which results in two different semantic spaces. Therefore, we do not share the parameters of the embedding layer and transformer layers between quotations and conversation turns. We concatenate all the quotation representations into a combined quotation matrix $Q$, which has $n_q$ rows, each representing one quotation.

Recommendation Based on Transformation
To perform a reasonable recommendation, we consider the observed conversation $c$, the query turn $t_{n_c}$, and the quotation list $q$. Since they are in different semantic spaces (Section 3.2), we first map the query turn into the space of quotations with a transformation matrix $M$; we assume that such a transformation resolves the space gap, so that distances between queries and quotations can be computed. We use $z_c$ to represent the distances between $r_{t_{n_c}}$ and the quotations, defined as:

$$z_c = Q \cdot (M r_{t_{n_c}}) \qquad (4)$$

Finally, the output layer is defined as:

$$p(\cdot \mid c) = \mathrm{softmax}(W [r_c; z_c] + b) \qquad (5)$$

where $W$ and $b$ are learnable parameters. We recommend the quotations with the top $n$ highest probabilities derived from the softmax function.
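A minimal NumPy sketch of this transformation-based scoring, assuming dot products as the similarity measure; the dimensions and all parameter values below are random placeholders, not trained weights:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_q = 8, 5                        # illustrative hidden size / quotation count

Q = rng.normal(size=(n_q, d))        # quotation matrix, one row per quotation
r_query = rng.normal(size=d)         # query-turn representation
M = rng.normal(size=(d, d))          # transformation matrix (learnable)

mapped = M @ r_query                 # map the query into quotation space
z_c = Q @ mapped                     # score against each quotation

W = rng.normal(size=(n_q, n_q))      # placeholder output-layer parameters
b = np.zeros(n_q)
probs = softmax(W @ z_c + b)

top_n = np.argsort(-probs)[:3]       # recommend the top-n quotations
```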

Training Procedure
We define our training objective in two parts. The first part is the recommendation loss, the cross entropy over the whole training corpus $C$:

$$L_{rec} = -\sum_{c \in C} \log p(q_c \mid c) \qquad (6)$$

where $q_c$ is the ground-truth quotation for conversation $c$ in the training corpus. The second part helps the learning of the transformation matrix $M$ by minimizing the distance between the transformed query-turn representation and the corresponding ground-truth quotation:

$$L_{map} = \sum_{c \in C} \left\| M r_{t_{n_c}} - r_{q_c} \right\|_2^2 \qquad (7)$$

To train our model, the final objective is to minimize $L$, the combination of the two losses:

$$L = L_{rec} + \lambda L_{map} \qquad (8)$$

where $\lambda$ is a coefficient determining the contribution of the latter loss.
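The two-part objective can be sketched for a single training conversation as follows, assuming softmax cross entropy for the recommendation loss and a squared L2 distance for the mapping loss; all representations are random placeholders:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d, n_q = 8, 5
M = rng.normal(size=(d, d))          # transformation matrix
r_query = rng.normal(size=d)         # query-turn representation
r_quotes = rng.normal(size=(n_q, d)) # quotation representations
gold = 2                             # index of the ground-truth quotation q_c
lam = 1e-3                           # tradeoff coefficient lambda

logits = r_quotes @ (M @ r_query)
rec_loss = -np.log(softmax(logits)[gold])              # cross-entropy term
map_loss = np.sum((M @ r_query - r_quotes[gold])**2)   # squared-L2 mapping term
total = rec_loss + lam * map_loss
```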

Experimental Setup
Datasets. We conduct experiments based on datasets from two different platforms, Weibo and Reddit, released by Wang et al. (2020). To make our experimental results comparable to Wang et al. (2020), we utilize their preprocessed data directly.
Parameter Setting. We initialize the embedding layer with 200-dimensional GloVe embeddings (Pennington et al., 2014) for Reddit and Chinese word embeddings (Song et al., 2018) for Weibo. For the transformer layers, we set the numbers of layers and heads to (2, 3) for Reddit and (4, 4) for Weibo. The hidden dimension of the transformer layers and BiGRU layers (each direction) is set to 200. We employ the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 1e-4 and adopt early stopping (Caruana et al., 2001) in training. The batch size is set to 32. The dropout strategy (Srivastava et al., 2014) and L2 regularization are used to alleviate overfitting, and the tradeoff parameter $\lambda$ is chosen from {1e-4, 1e-3}. All the hyper-parameters above are tuned on the validation set by grid search.
Evaluation and Comparisons. Our model returns a quotation list for each conversation, ranked in descending order of recommendation likelihood. Therefore, we adopt MAP (Mean Average Precision), P@1 (Precision@1), P@3 (Precision@3), and nDCG@5 (normalized Discounted Cumulative Gain@5) for evaluation. For comparison, we consider previous works that focus on quotation recommendation. The details are as follows: 1) LTR (Learning to Rank). We first collect the features (e.g., frequency, Word2Vec, etc.) mentioned in Tan et al. (2015) and then use the learning-to-rank tool RankLib to perform the recommendation. 2) CNN-LSTM. We implement the model proposed in Lee et al. (2016), which adopts a CNN to learn the semantic representation of each turn and then uses an LSTM to encode the conversation. 3) NCIR. It formulates quotation recommendation as a context-to-quote machine translation problem, using an encoder-decoder framework with an attention mechanism (Liu et al., 2019b). 4) CTIQ. The SOTA model (Wang et al., 2020), which employs an encoder-decoder framework enhanced by a Neural Topic Model to continue the context with a quotation via language generation. 5) BERT. We encode the conversation with a BiLSTM over the BERT representations of the turns, followed by a prediction layer.

Table 1 displays the recommendation results comparing our model with the baselines on the Weibo and Reddit datasets. Our model achieves the best performance, exceeding the baselines by a large margin, especially on the Reddit dataset. The fact that BERT and our model obtain the better results indicates the importance of learning effective content representations. Our model further considers the mapping between different semantic spaces, resulting in the best performance.
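The evaluation metrics above can be sketched for this single-ground-truth setting, where average precision reduces to the reciprocal rank of the gold quotation; the ranks below are hypothetical:

```python
import numpy as np

def p_at_k(rank, k):
    """Precision@k with one relevant item: 1 if the gold rank is within k."""
    return 1.0 if rank <= k else 0.0

def avg_precision(rank):
    """Average precision with a single relevant item = 1 / rank."""
    return 1.0 / rank

def ndcg_at_5(rank):
    """nDCG@5 with one relevant item: ideal DCG is 1/log2(2) = 1."""
    return 1.0 / np.log2(rank + 1) if rank <= 5 else 0.0

# Hypothetical 1-based gold ranks for three test conversations.
ranks = [1, 3, 7]

MAP   = np.mean([avg_precision(r) for r in ranks])
P1    = np.mean([p_at_k(r, 1) for r in ranks])
P3    = np.mean([p_at_k(r, 3) for r in ranks])
nDCG5 = np.mean([ndcg_at_5(r) for r in ranks])
```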

Quotation Recommendation
Ablation Study. We conduct an ablation study to examine the contributions of different modules in our model. We replace the transformer layers with a Bi-GRU (W/O Transformer) to examine the effects of different turn encoders. We also compare models with the transformation matrix $M$ removed (W/O $M$) or the mapping loss $L_{map}$ removed (W/O $L_{map}$). The results are shown in Table 2. As can be seen, each module in our model plays a role in improving performance. The largest improvement comes from applying transformers as our encoders. The performance drop caused by removing the transformation and the mapping loss justifies our assumption of different semantic spaces between quotations and queries.

Quotation Interpretation
We also explore how to interpret the figurative language of quotations with our model. We first extract the queries related to a certain quotation as its history queries, then compute quotation-aware attention over all of them. Specifically, for quotation $q_k$ with its related history queries $\{h_1, h_2, \ldots, h_{m_k}\}$ from the corpus ($m_k$ is the number of history queries), we can compute their quotation-aware attention (query level) from the representations derived from our model:

$$\alpha_j = \frac{\exp(r_{q_k}^{\top} M r_{h_j})}{\sum_{j'=1}^{m_k} \exp(r_{q_k}^{\top} M r_{h_{j'}})} \qquad (9)$$

On the other hand, we can extract scores for the words in each history query from their self-attention weights (word level) in the transformer. Finally, the indicative words of a quotation are those with the highest scores after multiplying the query-level and word-level attention scores. Figure 3 shows an interpretation example. We display three example queries mentioned in Figure 1, with both their query-level attention (green) and word-level attention (red). We find that words like "spends", "money" and "dollars" are assigned higher scores since they are more related to the quotation topics. We also present the most indicative words derived from all history queries (the lower part of Figure 3). We can easily infer the meaning of the quotation with the help of indicative words like "idiots" and "buy".
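This two-level scoring can be sketched as follows in NumPy. The history texts are invented, the representations are random, and the word-level weights are random stand-ins for the transformer self-attention scores:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d = 8
r_q = rng.normal(size=d)                     # quotation representation
histories = [["spends", "much", "money"],    # invented history queries
             ["idiots", "buy", "anyway"]]
r_h = rng.normal(size=(len(histories), d))   # history-query representations
M = rng.normal(size=(d, d))                  # transformation matrix

# Query-level quotation-aware attention over the transformed history queries.
alpha = softmax((r_h @ M.T) @ r_q)

# Word-level scores: random stand-ins for transformer self-attention weights.
word_w = [softmax(rng.normal(size=len(h))) for h in histories]

# Final indicator score = query-level attention * word-level attention.
scored = [(w, alpha[i] * s)
          for i, h in enumerate(histories)
          for w, s in zip(h, word_w[i])]
top_words = [w for w, _ in sorted(scored, key=lambda t: -t[1])[:3]]
```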

Conclusion
In this paper, we propose a transformation from queries to quotations to enhance a quotation recommendation model for conversations. Experiments on the Weibo and Reddit datasets show the effectiveness of our model with the transformation. We further explore using indicative words in history queries to interpret quotations, which demonstrates the rationality of our method.