Stylized Dialogue Generation with Feature-Guided Knowledge Augmentation



Introduction
With the increasing demand for meaningful human-computer interaction, the development of advanced dialogue systems has become crucial across various domains. Stylized dialogue generation focuses on producing human-like conversations by generating stylized and coherent responses. To achieve this, significant efforts have been dedicated to building response-matching models that leverage both dialogue content and style sentences, resulting in more engaging and diverse responses (Gao et al., 2019b; Zheng et al., 2021; Li et al., 2021; Sun et al., 2022).

Figure 1: Example of stylized dialogue generation enhanced by knowledge at the semantic and stylistic levels.
Stylized Knowledge: S1: Friends with shared musical interests often form strong bonds. S2: Dear Watson, have you considered the power of melodic harmonies? S3: It would behoove you to venture into social circles to reveal those who possess music acumen.
Context: None of my friends discuss music with me.
Baseline: You could try looking for people who like discussing music too.
KASDG: Dear Watson, it behooves you to explore social circles for those with shared musical interests and musical acumen.
However, stylized dialogue generation encounters a challenge due to the absence of parallel training data between contexts and desired stylized responses, which renders supervised methods ineffective. One natural solution to this problem is to strengthen the connection between different hidden-space vectors. For instance, Gao et al. (2019b) propose a method that bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. Nevertheless, this approach is limited by the quality of the available data, and the weak supervision signal leads to unsatisfactory performance in practice (Zheng et al., 2021). Another direction is to establish pseudo pairs between contexts and style-specific responses via unsupervised or semi-supervised back translation, originally proposed for translating a sentence from the target language back to the source language so that the model can learn from the differences and improve translation quality (Sennrich et al., 2016). Su et al. (2020) propose a diversifying dialogue model based on iterative back translation. Zheng et al. (2021) employ a style routing approach with a joint training process, where a backward model with style embeddings is used to generate pseudo pairs and train the stylized model. Li et al. (2021) bridge the forward and backward models through a text transfer dataset to construct high-quality pseudo contexts. Despite the progress made by these approaches, the pseudo data constructed via back translation exhibit low diversity, and these methods suffer from noisy and context-agnostic style signals (Li et al., 2021). This is because the style knowledge is not explicitly provided for each response. Additionally, the lack of alignment between relevant stylistic words and responses significantly reduces the incorporation of style features in the generated responses.
Motivated by knowledge-grounded dialogue generation (Liu et al., 2021), we introduce a knowledge-augmented stylized dialogue generation (KASDG) model, as illustrated in Figure 1. Our model treats the style corpus as an external knowledge base and provides style information for response generation in an explicit way. Specifically, we employ a style sentence retrieval process, which retrieves the most relevant style sentences based on similarity metrics to guide the style of the response. In order to filter out noisy information and provide dialogue-related style signals, we refine the retrieved style feature with a feature-guided selection module, wherein the retrieved style feature is filtered under the guidance of the context. Moreover, to reduce the gap between context-related selection and response-related selection and obtain more useful filtered style features for response generation, we propose a response-related contrastive learning objective and a style responsiveness Kullback-Leibler loss. The context hidden and the filtered style features are integrated through content-aware attention and style-aware attention in the decoder, producing the final dialogue response.
We conduct extensive experiments on two benchmarks with four distinct styles to verify the effectiveness of our proposed model. The experimental results demonstrate that our approach generates more informative and accurate responses that align with the given stylized knowledge. We also discuss the advantages of our model compared with large language models in Appendix A.5. In summary, our contributions are as follows:
• We propose a knowledge-augmented stylized dialogue generation model with feature-guided selection in an unsupervised manner.
• We design a feature-guided selection module to filter style features and bridge the gap between prior and posterior in style sentence selection.
• Experiments conducted on two datasets with four styles show that our proposed method outperforms all baselines with limited training data.
2 Related Work

Stylized Dialogue Generation methods such as SRJT (Zheng et al., 2021) and MPDL (Li et al., 2021) use back translation techniques to obtain pseudo contexts. However, these methods suffer from noisy and context-agnostic style signals, as the style is not explicitly provided for each response. Our approach explicitly exploits style information from an external knowledge perspective and can provide clearer style signals for better generation via knowledge retrieval and filtering.
Knowledge Selection focuses on identifying relevant knowledge sources from a large corpus to improve downstream tasks, and can be divided into two categories: sequential selection and non-sequential selection. The former selects knowledge based on previously selected items, gradually building a coherent knowledge base (Kim et al., 2020; Zheng et al., 2020). The latter identifies knowledge based on its relevance to the task at hand (Qin et al., 2019; Ren et al., 2019; Gao et al., 2019a; Chen et al., 2022, 2023). This requires the model to have a strong ability to understand natural language (Cheng et al., 2023a,b). Liu et al. (2021) propose a three-stage learning framework that includes a controller to dynamically select knowledge during the decoding phase. Meanwhile, Chen et al. (2020) attempt to bridge the gap between prior and posterior knowledge selection. Inspired by these works, we leverage a retrieval and selection framework to dynamically select the needed style signals for guiding dialogue generation.
Text Style Transfer aims to generate sentences that are both stylistic and meaning-preserving; it focuses solely on style transfer without considering conversational coherence. Given the scarcity of parallel corpora, recent approaches have focused on unsupervised text style transfer, which can be categorized into three main approaches: style disentanglement, prototype editing, and pseudo-parallel corpus construction (Jin et al., 2020). Style disentanglement approaches (Hu et al., 2017; Yi et al., 2020; Zhao et al., 2017; Fu et al., 2017; Li et al., 2020) learn a latent representation of the input that separates content and style information. This representation can be manipulated to perform style transfer while preserving the content. For instance, Fu et al. (2017) use adversarial networks to separate content and style representations, and change the style embedding to perform style transfer. Prototype editing approaches (Li et al., 2018; Madaan et al., 2020; Sudhakar et al., 2019; Wu et al., 2019; Jin et al., 2019a) identify style markers in the sentence and edit them toward the target domain. Pseudo-parallel corpus construction approaches (Jin et al., 2019b; Nikolov and Hahnloser, 2018; Zhang et al., 2018) generate pseudo-stylized data using back translation for further training. Our approach mainly follows the style disentanglement framework and utilizes the back-translation method.

Model Overview
The knowledge-augmented stylized dialogue generation model consists of two main components, the encoder-decoder framework and the feature-guided selection module, as shown in Figure 2. We employ a separate style encoder and context encoder to encode style and content information, respectively. Inspired by Liu et al. (2021), we add style-aware attention and a gate network to control the knowledge and context contributions. The model is trained on a dialogue dataset D_dia = {(X_i, Y_i)}_{i=1}^M and a style corpus D_sty = {S_i}_{i=1}^N, where X_i is the dialogue context with a relevant response Y_i in style s_0, and S_i is the i-th sentence in the target style s_1. Given an input context X_i = {w_1, w_2, ..., w_m} consisting of m words, we aim to learn a model f_θ to generate a stylized response Ŷ_i. Here, we propose to first retrieve context-related style knowledge D_s(X_i) ⊂ D_sty, and then combine it with the context X_i to generate the stylized response Ŷ_i = f_θ(X_i, D_s(X_i)). The optimization objective of the forward model f_θ can be represented as follows:

L_{s_i} = -E_{(X, Y)} [log p_θ(Y | X, D_s(X))],   (1)

where i ∈ {0, 1}. Besides, we have a backward model f_ϕ which generates pseudo contexts from stylized sentences to maximize the supervision. The difference from previous work is that our backward model is optimized only on (Y_i, X_i). This makes convergence easier and reduces the training overhead. Similarly, the optimization objective of f_ϕ is as follows:

L_bt = -E_{(X, Y)} [log p_ϕ(X | Y)].   (2)

Style Sentences Retrieval
In order to retrieve context-related style knowledge for the input, we apply an information retrieval module based on sentence-BERT (Reimers and Gurevych, 2019), which uses siamese and triplet network structures to derive semantically meaningful sentence embeddings. Specifically, we use sentence-BERT to encode the context X and the style corpus D_sty = {S_i}_{i=1}^N, and calculate the relevance score Rel using cosine similarity:

Rel(X, S_i) = cos(e_X, e_{S_i}),   (3)

where e_X and e_{S_i} denote the sentence-BERT embeddings of the context and the i-th style sentence, respectively. Then the most relevant sentences from D_sty are retrieved and fed to the style encoder to obtain the retrieved style hidden, which serves as the style knowledge of the input. We also analyze the performance of dynamic retrieval and fixed retrieval in Appendix A.3.
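As a concrete illustration, the retrieval step can be sketched over pre-computed sentence-BERT embeddings. The function name and the NumPy formulation are ours, not part of the paper's released code:

```python
import numpy as np

def retrieve_style_sentences(context_emb, style_embs, k=20):
    """Return indices of the k style sentences most relevant to the context.

    context_emb: (d,) sentence embedding of the dialogue context.
    style_embs:  (N, d) embeddings of the style corpus D_sty.
    Relevance is cosine similarity, matching the Rel score above.
    """
    # Normalize so that dot products equal cosine similarities.
    c = context_emb / np.linalg.norm(context_emb)
    s = style_embs / np.linalg.norm(style_embs, axis=1, keepdims=True)
    rel = s @ c                      # (N,) cosine relevance scores
    topk = np.argsort(-rel)[:k]      # indices of the k most relevant sentences
    return topk, rel[topk]
```

In practice the embeddings would come from a sentence-BERT model; here they are plain arrays so the ranking logic stands alone.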

Feature-Guided Selection Module
Inevitably, there exists noisy and irrelevant information in the retrieved style hidden. To address these redundancies and extract dialogue-related style information, we introduce the feature-guided selection module. Specifically, we integrate context information as guidance for our style feature selection process. For clarity, let the context hidden be h_c ∈ R^{L_c×d} and the original style hidden of the retrieved style sentences be h_s ∈ R^{L_s×d}. The filtered style features obtained from our selection module are denoted as h_f, which is constructed utilizing the attention mechanism:

h_f = softmax(h_c h_s^T / √d) h_s,   (4)

where h_f ∈ R^{L_c×d}, and L_c and L_s are the lengths of the context and style sentences, respectively. Simply relying on contextual similarity to filter the retrieved style hidden can be ineffective, since the majority of the filtered style features tend to be context-related rather than response-related, reducing their effectiveness in response generation. To address this issue, we examine the relationship between the context hidden h_c, the response hidden h_r, the retrieved style hidden h_s, and the filtered style feature h_f. In Section 3.3.1, we propose response-related contrastive learning to make the filtered style feature more response-related by exploring the interaction between h_s, h_f, and h_r. In Section 3.3.2, we introduce the style responsiveness Kullback-Leibler loss to reduce the gap between context-guided selection and response-guided selection by exploring the interaction between h_c, h_r, and h_f.
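A minimal sketch of this context-guided attention filter follows; the scaled dot-product form is an assumption of this illustration, since the text does not fully specify the attention variant:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_guided_selection(h_c, h_s):
    """Filter retrieved style hidden under the guidance of the context.

    h_c: (L_c, d) context hidden, used as the attention query.
    h_s: (L_s, d) retrieved style hidden, used as keys and values.
    Returns h_f: (L_c, d) filtered style features.
    """
    d = h_c.shape[-1]
    attn = softmax(h_c @ h_s.T / np.sqrt(d), axis=-1)  # (L_c, L_s) weights
    return attn @ h_s                                   # (L_c, d)
```

Each row of h_f is a convex combination of the style positions, weighted by their similarity to the corresponding context position.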

Response-Related Contrastive Learning
We utilize contrastive learning (Gao et al., 2021) to make the filtered style features more response-related. Following the intuition that the filtered style features h_f should be closer to the response h_r than the retrieved style hidden h_s, we design our response-related contrastive learning to supervise this distance relation. For a mini-batch of N triples (h_s^i, h_r^i, h_f^i), i denotes the i-th sample in the batch. We treat responses as anchors, the corresponding filtered style features as positive examples, and the other filtered style features together with all retrieved style hidden as negative examples. The response-related contrastive learning loss is defined as:

L_cl = -(1/N) Σ_{i=1}^{N} log [ exp(sim(h_r^i, h_f^i)/τ) / ( Σ_{j=1}^{N} exp(sim(h_r^i, h_f^j)/τ) + Σ_{j=1}^{N} exp(sim(h_r^i, h_s^j)/τ) ) ],   (5)

where sim(·, ·) denotes cosine similarity and τ is the temperature of contrastive learning.
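The loss can be sketched on pooled sentence-level vectors; the pooling, and the exact handling of the positive inside the denominator, are assumptions of this illustration:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def response_contrastive_loss(h_r, h_f, h_s, tau=0.1):
    """Response-related contrastive loss on pooled representations.

    h_r, h_f, h_s: (N, d) pooled response / filtered-style / retrieved-style
    vectors for a mini-batch. For anchor i, the positive is h_f[i];
    negatives are the other filtered features and all retrieved style hidden.
    """
    N = h_r.shape[0]
    loss = 0.0
    for i in range(N):
        pos = np.exp(cosine(h_r[i], h_f[i]) / tau)
        neg = sum(np.exp(cosine(h_r[i], h_f[j]) / tau) for j in range(N) if j != i)
        neg += sum(np.exp(cosine(h_r[i], h_s[j]) / tau) for j in range(N))
        loss += -np.log(pos / (pos + neg))
    return loss / N
```

Minimizing this pulls each filtered feature toward its response and pushes it away from the raw retrieved style hidden.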

Style Responsiveness Kullback-Leibler Loss
As we rely solely on the context to filter the retrieved style hidden, a potential gap emerges between the context-guided and response-guided selection processes.
To overcome this disparity, we introduce the style responsiveness Kullback-Leibler (SRKL) loss, designed to optimize the filtered style features using both context and response. We formulate this problem as an effort to bridge the gap between the prior and posterior distributions of style selection, utilizing cosine similarity as the importance metric. Specifically, the context h_c and the response h_r can assign different importance to different positions of h_s = {h_s^1, h_s^2, ..., h_s^{L_s}}. The context-guided prior distribution p(s) and response-guided posterior distribution q(s) are calculated as:

p(s_i) = exp(sim(h_c, h_s^i)) / Σ_{j=1}^{L_s} exp(sim(h_c, h_s^j)),   q(s_i) = exp(sim(h_r, h_s^i)) / Σ_{j=1}^{L_s} exp(sim(h_r, h_s^j)),   (6)

where sim(·, ·) denotes cosine similarity. The KL divergence loss is defined to be:

L_KL = KL(p(s) ‖ q(s)) = Σ_{i=1}^{L_s} p(s_i) log( p(s_i) / q(s_i) ).   (7)

According to Equation 6, we can obtain:

L_KL = Σ_{i=1}^{L_s} p(s_i) [ sim(h_c, h_s^i) - sim(h_r, h_s^i) ] + log Σ_{i=1}^{L_s} exp(sim(h_r, h_s^i)) - log Σ_{i=1}^{L_s} exp(sim(h_c, h_s^i)),   (8)

where Σ_{i=1}^{L_s} p(s_i) = 1. Hence, we can decompose the KL loss into L_KL = L_dir + L_resp, where L_dir is the direction loss and L_resp is the responsiveness loss.
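The prior, posterior, and their KL divergence can be sketched as follows; mean-pooled context and response vectors are an assumption of this illustration:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def style_selection_distributions(h_c, h_r, h_s):
    """Context-guided prior p(s) and response-guided posterior q(s).

    h_c, h_r: (d,) pooled context / response vectors.
    h_s: (L_s, d) retrieved style hidden.
    Importance is cosine similarity, normalized with softmax.
    """
    def sim(v):  # cosine similarity of v against every style position
        return h_s @ v / (np.linalg.norm(h_s, axis=1) * np.linalg.norm(v))
    p = softmax(sim(h_c))
    q = softmax(sim(h_r))
    kl = float(np.sum(p * np.log(p / q)))  # L_KL = KL(p(s) || q(s))
    return p, q, kl
```

When context and response agree, p and q coincide and the KL term vanishes; the loss only fires when the two guidance signals diverge.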
Direction Loss. In L_dir, the term Σ_{i=1}^{L_s} p(s_i) h_s^i represents a content-guided reweighted style hidden, where the weight denotes importance. This concept aligns with the objective of our feature-guided selection module. To improve the guidance signal, we replace this term with the filtered style feature h_f. Intuitively, the difference between the response and the context, h_r - h_c, signifies the style information the generated response needs in order to become stylized. Thus, h_f^T (h_c - h_r) can be deemed a direction loss guiding the selected style features:

L_dir = h_f^T (h_c - h_r).   (9)

Responsiveness Loss. To evaluate the extent of information shared between the selected features and the generated response, we define the responsiveness score R(x, y) = max_i(x_i^T y), which measures the maximum information that hidden x can provide to a query y. L_resp can be approximated by:

L_resp ≈ R(h_s, h_r) - R(h_s, h_c).   (10)

From the perspective of the responsiveness score, L_resp supervises the retrieved style hidden to share more information with the context than with the response. However, this supervision offers limited assistance to the feature-guided selection module, since our ultimate goal is to enhance the usefulness of h_f for the response. Therefore, we change the supervised object from h_s to h_f. Moreover, following the intuition that the filtered style feature h_f should share more information with the response than with the context, we reformulate L_resp as:

L_resp = R(h_f, h_c) - R(h_f, h_r).   (11)

The SRKL loss can be formalized as a weighted sum of the direction and responsiveness losses:

L_SRKL = λ_dir L_dir + λ_resp L_resp,   (12)

where λ_dir and λ_resp are weight scalars.
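Putting the two terms together, a minimal sketch follows; treating h_c and h_r as pooled vectors, and summing the direction term over the positions of h_f, are assumptions of this illustration:

```python
import numpy as np

def responsiveness(x, y):
    """R(x, y) = max_i(x_i^T y): the most information hidden x offers query y."""
    return float((x @ y).max())

def srkl_loss(h_f, h_c, h_r, lambda_dir=1.0, lambda_resp=1.0):
    """SRKL loss sketch on pooled context/response vectors.

    h_f: (L, d) filtered style features; h_c, h_r: (d,) pooled vectors.
    L_dir sums h_f^T (h_c - h_r) over positions; L_resp rewards h_f for
    sharing more information with the response than with the context.
    """
    l_dir = float((h_f @ (h_c - h_r)).sum())
    l_resp = responsiveness(h_f, h_c) - responsiveness(h_f, h_r)
    return lambda_dir * l_dir + lambda_resp * l_resp
```

A filtered feature aligned with the response yields a lower loss than one aligned with the context, which is exactly the pressure the module needs.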

Training Objective
The training objective of the forward model can thus be defined as:

L_fwd^{s_i} = L_{s_i} + λ_cl L_cl + λ_SRKL L_SRKL,   (13)

where i ∈ {0, 1}. The overall training objective can be expressed as:

L = λ_{s_0} L_fwd^{s_0} + λ_{s_1} L_fwd^{s_1} + λ_bt L_bt,   (14)

where λ_cl, λ_SRKL, λ_{s_0}, λ_{s_1}, and λ_bt are weight scalars. In the experiments, we find that back translation can be unstable, so we propose a warm-up training strategy for the loss L_{s_1}:

λ_{s_1} = 0 if T_c < T_f;   (T_c - T_f) / T_w if T_f ≤ T_c < T_f + T_w;   1 otherwise,   (15)

where T_c, T_f, and T_w represent the current training step, freeze step, and warm-up step, respectively.
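The warm-up schedule can be sketched as a simple piecewise function; the linear ramp between the freeze step and the end of warm-up is our reading of the strategy:

```python
def style_loss_weight(t_c, t_f, t_w):
    """Warm-up weight for the back-translated style loss L_{s_1} (sketch).

    The back-translated pairs are ignored for the first t_f (freeze) steps,
    then the weight ramps linearly to 1.0 over the next t_w (warm-up) steps.
    t_c is the current training step.
    """
    if t_c < t_f:
        return 0.0                      # frozen: back translation ignored
    if t_c < t_f + t_w:
        return (t_c - t_f) / t_w        # linear ramp during warm-up
    return 1.0                          # full weight afterwards
```

With the freeze step of 5,000 and warm-up step of 10,000 reported later, the back-translated loss would reach full weight at step 15,000.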
4 Experimental Setup

Datasets
We conduct our experiments on two benchmarks with four distinct styles: generating formal-like and informal-like responses (TCFC (Wu et al., 2020)) and arXiv-like and Holmes-like responses (Gao et al., 2019b). TCFC focuses on formality, with an informal dialogue corpus and a formal sentence corpus. arXiv and Holmes are constructed from an academic website and the Sherlock Holmes novel series, respectively. Only the informal dialogues contain context-response pairs; the other styles are non-parallel corpora of sentences. Both tasks share the same informal Reddit conversation dataset. Detailed statistics are given in Appendix A.1.

Compared Baselines
To verify the effectiveness of the proposed method, we compare our model with the pre-trained dialogue generation model DialoGPT (Zhang et al., 2019), which has not been specifically trained for stylized generation and serves as a baseline for evaluating models with explicit style transfer capabilities. We also compare with the pipeline method S2S+BT, which first generates a non-stylized response using a sequence-to-sequence (S2S) approach, and then refines the response by applying a text style transfer model (He et al., 2020). Besides, we compare our proposed model with recent stylized dialogue generation models: MTask is a vanilla multi-task learning model (Luan et al., 2017) trained on both D_dia and D_sty. S2S+LM combines the output distribution of an S2S dialogue model with a stylized language model (LM) (Niu and Bansal, 2018). SFusion is also a multi-task learning model that bridges the gap between conversation modeling and non-parallel style transfer by sharing a structured latent space (Gao et al., 2019b). SRJT is based on back translation and incorporates a joint learning strategy and a style routing method to facilitate style transfer in dialogue generation (Zheng et al., 2021). StyleDGPT leverages a pre-trained dialogue model fine-tuned with a combination of a word-level KL loss and a sentence-level style classification loss (Yang et al., 2020). MPDL explores the interaction between synthetic and original data within a multi-pass dual learning framework (Li et al., 2021). We also discuss the advantages of our model compared with the large language model ChatGPT (OpenAI, 2023) in Appendix A.5.

Implementation Details
We implement our experiments using PyTorch with the transformers library. The main forward model and backward model are initialized with BART-base with a learning rate of 5e-05. We use grid search to tune the hyper-parameters according to validation performance. The temperature hyper-parameter τ is set to 0.1, and all coefficients λ except λ_{s_1} are set to 1.0.
During training, we leverage the AdamW optimizer with a batch size of 4. We select the K = 20 most related style sentences, ranked by sentence-BERT, as input to the style encoder. We warm up the backward model with a freeze step of 5,000 and a warm-up step of 10,000. Due to the instability of back translation, we use beam search for generation with a beam size of 50 and filter out pseudo contexts that share more than 30% of their tokens with the response.
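The 30% token-overlap filter can be sketched as follows; whitespace tokenization is an assumption of this illustration:

```python
def keep_pseudo_pair(pseudo_context, response, max_overlap=0.3):
    """Filter back-translated pairs whose pseudo context copies the response.

    A pair is discarded when more than `max_overlap` of the pseudo context's
    tokens also appear in the response (the 30% threshold described above).
    """
    ctx_tokens = pseudo_context.lower().split()
    resp_tokens = set(response.lower().split())
    if not ctx_tokens:
        return False  # an empty pseudo context is useless
    overlap = sum(tok in resp_tokens for tok in ctx_tokens) / len(ctx_tokens)
    return overlap <= max_overlap
```

This guards against the degenerate case where the backward model simply echoes the stylized sentence instead of producing a plausible context.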
5 Experimental Analysis

Main Results
Automatic Evaluation Following previous work, the ROUGE and BLEU (Papineni et al., 2002) metrics are employed to measure n-gram overlap between the generated responses and the reference responses. Distinct (Li et al., 2015) measures the proportion of unique n-grams in the generated responses. To evaluate style intensity, we follow Zheng et al. (2021) and leverage trained style classifiers, BERT and SVM, for the Informal and Formal styles. The accuracy of the BERT and SVM classifiers on the held-out TCFC test set is 93.98% and 89.57%, respectively. Furthermore, we use the pre-trained discriminative model p(S|X) (Yang et al., 2020) to evaluate the Style Intensity (StyleIn.) of responses in the Holmes and arXiv styles.
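For reference, the Distinct-n metric can be computed as the ratio of unique n-grams to total n-grams over the generated responses; whitespace tokenization is an assumption of this sketch:

```python
def distinct_n(responses, n=1):
    """Distinct-n: unique n-grams divided by total n-grams across responses."""
    ngrams = []
    for resp in responses:
        tokens = resp.lower().split()
        # collect every n-gram from this response
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

Higher values indicate more lexically diverse generations, which is why Distinct often trades off against overlap metrics such as BLEU.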
The automatic results are summarized in Table 1 and Table 2. Overall, our method achieves the highest BLEU-1 and BLEU-2 scores on all four styles, which shows its superiority. For the formal response generation task on the TCFC dataset, compared with the previous state-of-the-art algorithm MPDL, our proposed method improves the BLEU-1 and BLEU-2 scores by 1.9 and 0.78. The reasons are two-fold: (1) KASDG can be better aligned in both style and semantics via the style sentence retrieval process; (2) the feature-guided selection module effectively filters out noisy information. We do not achieve the highest score on Dist, mainly due to the negative correlation between BLEU and Dist. Our method exhibits superior performance in the arXiv style, while showing only comparable results to StyleDGPT in the Holmes style. This can be attributed to the larger corpus of 1,347k sentences in arXiv compared to 38k in Holmes: the increased availability of style sentences contributes to improved retrieval results and a clearer style signal for our model, resulting in better performance. Training bias leads to better performance of KASDG in the informal style, since supervised learning is applied to the informal style, while the formal style requires pseudo-context generation.

Human Evaluation Apart from automatic metrics, we conduct human evaluations to assess the quality of generated stylized responses in the arXiv style. Specifically, we randomly sample 100 examples and employ three experts to grade the quality of generated responses and reference responses by three criteria: (1) Fluency measures whether the generated responses are readable. (2) Relevance measures whether the response is coherent with the given context.
(3) Style Consistency measures whether the response exhibits the desired style. For each human indicator, we give the annotators three ranges: [0, 0.33] is very dissatisfied; (0.33, 0.67) is generally satisfied; and [0.67, 1] is very satisfied. All generated responses are re-capitalized and de-tokenized for fair comparison.
The results are shown in Table 3: the proposed KASDG outperforms the other baseline models on all human metrics. The kappa statistics are 0.58, 0.53, and 0.55 for Fluency, Relevance, and Style Consistency, respectively, indicating moderate agreement between annotators (Landis and Koch, 1977). Specifically, for response generation in the arXiv style, the style consistency and content relevance are significantly higher than those of the baseline models, indicating that our feature-guided selection module has gained better modeling ability through the explicit guidance.

Ablation Study
To better quantify the effectiveness of each module in KASDG, we conduct an ablation study on the Holmes dataset. From the ablations in Table 4, we observe that: (1) SRKL and CL both play a very important role in the model, and their removal results in varying degrees of performance degradation. (2) The modified SRKL outperforms the original KL on all metrics, a good indication that the SRKL constraint optimizes the model toward the target style and achieves better style features. (3) Contrastive learning enhances our model's dialogue capabilities at the expense of style proficiency, aligning with our objective: we incorporate contrastive learning to render the filtered style features more response-oriented. As the contrastive loss uses the stylized retrieved hidden as negative samples, it primarily targets content fidelity, ensuring that the filtered style features are closely related to the response. However, alterations in the hidden space unavoidably impact style proficiency, explaining the observed decline in style ability upon incorporating contrastive learning.

The Impact of Feature-Guided Selection Module
Further, we investigate the impact of the feature-guided selection module. We visualize the hidden states in the Holmes style, including the retrieved style hidden, the response hidden, and the filtered style features. We use the output of the hidden layer as the representation of the sequences and utilize the t-SNE algorithm (Maaten and Hinton, 2008) for visualization. As shown in Figure 3, the retrieved style hidden is relatively far from the target responses, while the boundary between the filtered style features and the stylized responses is much less distinct. These phenomena illustrate the effectiveness of our selection module: noise exists in the retrieved style hidden, but after the selection module, the irrelevant information is filtered out, which in turn better aids generation.

Table 5 presents sample responses generated by different models in the Holmes style. Upon comparing the models, we find that MTask produces coherent responses but struggles to capture the desired style and context relevance, likely due to the absence of pre-trained weights. Both SFusion and StyleDGPT generate partially stylized responses but suffer from poor coherence or contextual relevance, possibly due to the instability of the sampling and ranking processes. SRJT effectively generates sentences in the Holmes style (e.g., "The first difficulty which we had to contend with was the finding of this American's antecedents.") but fails to directly address the context, resulting in suboptimal responses. Retrieval represents the retrieved stylized sentence, which includes some stylistic words (e.g., "professor") but lacks a strong contextual connection. In contrast, our model demonstrates superior performance by generating responses that are coherent, contextually relevant, and consistent with the Holmes style. For instance, when discussing the trouble involving teachers, our response ("I would suggest that the professor's conduct should be reported to the police.") refers to "reported to the police", which is highly relevant. Furthermore, the use of "professor" instead of "teacher", along with "conduct" and "reported to the police", are all topical words found in the Sherlock Holmes novel series. Interestingly, "professor" also appears in the retrieved sentence, showcasing the effectiveness of our model.

Conclusion
In this paper, we propose a knowledge-augmented stylized dialogue generation model with a feature-guided selection module to provide better guidance signals from both semantic and stylistic perspectives. Specifically, we exploit the style knowledge from a style corpus to provide a guiding style signal, and design a response-related contrastive learning objective and a style responsiveness Kullback-Leibler loss to enhance the semantic and stylistic features. Experiments on two public benchmark datasets demonstrate that our proposed method achieves satisfactory results in generating responses with the desired style across four target styles. In the future, we plan to explore model performance in zero-shot scenarios and to combine the abilities of large language models for intelligent dialogue systems.

Limitations
A main limitation of this work is the availability of data. Although numerous styles exist, the actual usable data is scarce. As with other efforts to generate pseudo data, we cannot rule out the possibility that pseudo data generated through back translation may compromise the authenticity of the generated text. It is therefore important to encourage users to check the relevance of supplementary generated texts. Furthermore, we do not conduct experiments on multi-turn stylized dialogue, which is also a very important capability. Due to limited computing resources, we do not use a large batch size or explore more styles, which matters in the era of large language models. We plan to explore generating more and higher-quality stylized data with the assistance of a large language model, in order to enhance the performance of dialogue systems. We will examine these issues more comprehensively in future research.

Ethics Statement
This paper presents a knowledge-augmented stylized dialogue generation model with the intention to benefit various natural language applications and enhance creative language expression.The datasets used in our study are publicly available and respect privacy standards.In human evaluation, part-time research assistants were fairly compensated, and established evaluation rules were followed.Our approach does not introduce or exacerbate ethical or social biases in generated dialogues.We remain committed to addressing potential ethical concerns and refining our work as needed, adhering to established ethical guidelines and practices within the scientific community.

A.1 Datasets Statistics
Table 6 shows the statistics of the datasets, including the number of examples and the average sentence length. The average lengths of context and response are 12.9 and 11.3 words, respectively.

A.2 Case Study

The case study of the arXiv style is shown in Table 7; entries highlighted in red point to inconsistencies in style or content, while those in blue represent a harmonious alignment of both. The responses generated by StyleDGPT and SRJT attempt to incorporate the arXiv style, but they lack coherence with the given context and are semantically unclear. For example, StyleDGPT contains the phrase "time travel speed of the time," which is not meaningful in the given context, while SRJT states that "only a brief description is required", which does not address the topic at hand. MTask generates general and plain responses that add no relevant information to the conversation. SFusion is informal and does not adhere to the arXiv style. The Retrieval model produces a response that contains stylistic elements such as "exploration" and "direction", but they are not contextually relevant to the given conversation. In contrast, our model is both contextually relevant and consistent with the arXiv style: the words "conquer" and "universe" demonstrate adherence to the given context, while terms like "position" and "horizons" are consistent with the arXiv style.

A.3 The Impact of Back Translation and Retrieval
In this section, we study the impact of the back translation module and of different retrieval inputs, as shown in Table 8. The w/o BT variant does not use back translation; instead, it retrieves the most similar context in the dialogue corpus as the pseudo context for the input style sentence. FixBT_C and FixBT_R refer to using the pseudo context or the style sentence, respectively, to retrieve paired style sentences through a pre-trained back translation model. From the results, we find that the back translation model plays an important part in our model due to the lack of parallel data. Besides, comparing FixBT_C to FixBT_R, we find that it is better to use the response to retrieve style sentences from the style corpus, owing to the quality of the pseudo context and back translation.
In FixBT_R, the style score is very low, and the model tends to copy phrases from the context. This is because the fixed back translation model produces identical pseudo contexts. In contrast, we jointly train the back translation model with our main model; as the back translation model evolves, it provides dynamic and more diverse pseudo contexts, making our model more robust and relieving the copying problem. We also compare our proposed model to the retrieval model RAG (Lewis et al., 2020) in the Holmes style, which is an unsupervised retrieval-augmented method. As the results demonstrate, our approach substantially outperforms RAG on both style and content metrics because (1) our framework takes full advantage of the dialogue datasets via our joint loss, thus producing more coherent sentences, and (2) retrieved sentences are noisy, and our proposed feature-guided selection module tackles this noise and extracts dialogue-related style signals.

A.4 The Impact of Attention
To further analyze the selection module, we visualize its attention, as shown in Figure 4. For the same context, arXiv and Holmes focus on different words. arXiv focuses on academic words (stage, process, fulfilled, contract, and proposal), while Holmes focuses more on daily-life words (requested, business, purchase, living, contracted, and job). The former are commonly seen in formal papers, while the latter appear more realistic and conform to the features of the Holmes novels. At the same time, we observe that the words "sign" and "signature" are more activated in arXiv and less activated in Holmes, which is in line with expectations.

A.5 Analysis of Large Language Model
To further evaluate the performance of our model, we carry out experiments on ChatGPT, a state-of-the-art large language model. Specifically, we design Prompt-1 and Prompt-2 to assess the ability of ChatGPT for zero-shot stylized dialogue generation. The results obtained from ChatGPT are summarized in Table 9 (Line 1 - Line 6) and Table 10. We also take a step forward and evaluate ChatGPT in a setting where examples or explanations of styles are given. For the former, we adopt the classic in-context learning algorithm KATE (Liu et al., 2022) with 1, 5, 10, and 20 examples. For the latter, we provide two new prompts with style explanations in the Holmes style, namely Prompt-3 and Prompt-4. We randomly select 100 test cases in the Holmes style to carry out the experiments in this advanced setting, as shown in Table 9 (Line 7 - Line 13).
Prompt-1: """You are a stylized conversation agent. Your task is to generate a response in {style} style. The response should be 1-3 sentences. You only need to output the generated stylized response. Context: {context} Response: """

Prompt-2: """Generate a response in {style} style. The response should be 1-3 sentences. Context: {context} Response: """

Prompt-3: """The Holmes style refers to the deductive and analytical investigative approach pop-

Figure 3: Visualizing the retrieved style hidden, stylized responses, and filtered style features in the Holmes style.

Figure 4: The attention visualization of the selection module for the Holmes and arXiv styles.

Table 1: Automatic evaluation results in the arXiv and Holmes styles.

Table 2: Automatic evaluation results in the Formal and Informal styles.

Table 3: Human evaluation results in the arXiv style.

Table 4: Ablation study in the Holmes style. The accuracy of BERT on the holdout test set is 99.4%.

Table 5: Examples of the responses generated by KASDG and other models in the Holmes style.

Table 6: Dataset statistics. "Avg. words" means the average word count per sentence.

Table 7: Case study in the arXiv style.

Table 8: Analysis of the back-translation and retrieval methods.