Beyond Candidates: Adaptive Dialogue Agent Utilizing Persona and Knowledge



Introduction
In usual conversations, humans utilize the semantic concepts in their minds regarding the dialogue topic and the preferences of the interlocutor. With these semantic-level concepts, humans communicate with each other by aggregating the concepts to convey knowledgeable and empathetic responses (Collins and Quillian, 1969). This implies that people converse by adaptively reorganizing and retrieving additional information with their semantic concepts, encompassing knowledge and persona, not by relying on pre-defined sources (Young et al., 2018; Zou et al., 2021; Li et al., 2023).

[Figure 1 appears here. (a) Candidate-free conversational setting. Human: "Ah, it must look very similar to somewhere I have been before. What is this studio known for?" Machine's answer (Ours): "As you are interested in filmmaking, you might have heard of this studio before. This studio is known for being the first American studio of its kind." (b) Previous conversational setting. Topic: Norman Studios. Persona candidates: 1) "I am interested in filmmaking." 2) "I am a dancer." ... Knowledge candidates: 1) "Norman Studios was an American film studio in Jacksonville, Florida. Founded by Richard Edward Norman, the studio produced silent films featuring African-American casts from 1919 to 1928." 2) "The roof of this temple was of marble. Many diagrams and reconstructions of this structure show a door in the western side-wall; ..." 3) "Since its opening day, the Staples Center has hosted seven NBA Finals series with the Lakers, the 2012 and 2014 Stanley Cup Finals, three WNBA Finals ..." Machine's answer (BART): "The studio produced silent films from the 1920s to the present. The studio is considered one of the most important silent film studios in the world."]
Jang et al. (2022a) and Lim et al. (2022) adhere to this human-like approach to conversation by referring to persona and knowledge. However, these approaches neglect humans' capability to reconstruct and retrieve semantic concepts, as they require pre-defined candidate sets to ground on, as in Figure 1(b). Since knowledge and persona candidates are not given to agents in usual conversation, this dependency on candidates ultimately limits their applicability in candidate-free situations, as depicted in Figure 1(a).
To build dialogue agents adaptive to the candidate-agnostic situation, two branches of studies have been conducted. In knowledge-grounded conversation, knowledgeable agents employ non-parametric memory-based retrieval to overcome candidate-agnostic situations (Lewis et al., 2020b; Paranjape et al., 2021). Similarly, persona-aware dialogue agents consider out-of-persona situations by extending persona sentences from a few persona concepts (Xu et al., 2020; Liu et al., 2022; Li et al., 2023). Even though both streams of research focus on the candidate-agnostic conversational situation, they leverage only a single source for grounding, rather than utilizing both persona and knowledge simultaneously.
In this paper, we propose a dialogue agent utilizing persona and knowledge that is adaptive to the candidate-free situation. To this end, our method consists of 1) a knowledge retriever, 2) a concept-based persona generator, 3) a dialogue-persona aligner, and 4) a response generator. When the knowledge concept is given, the knowledge retriever finds the relevant knowledge from the knowledge base. Our concept-based persona generator then produces complete sentences from fragmentary persona concepts. The generated persona descriptions are validated by the persona aligner regarding both consistency and relevancy. The validated persona descriptions are then used as the input of the response generator.
Experimental results show that our candidate-free model outperforms other baselines. Also, our ablation studies show that the concept-based persona generator and persona aligner boost the performance of the dialogue agents. We conduct a human evaluation of our model's responses, and the results imply that our method is effective in building a persona-knowledge dialogue agent without candidate sentences. Moreover, we demonstrate that our method is capable of utilizing other dialogue datasets grounded in a single source, such as PersonaChat (Zhang et al., 2018) or Wizard-of-Wikipedia (WoW) (Dinan et al., 2018), showing the adaptiveness of our proposed model. The qualitative results show that the generated responses are comparable to the ground-truth answers without the given candidates.

Knowledge-grounded Dialogue System
For informative dialogue generation, Dinan et al. (2018) and Zhou et al. (2018) introduce open-domain dialogue datasets. Various works directly exploit external knowledge to obtain informative responses in knowledge-grounded conversation (Karpukhin et al., 2020; Lee et al., 2021; Wu et al., 2022a). Other studies augment the language model with a knowledge base through a non-parametric memory-based retriever (Lewis et al., 2020b; Guu et al., 2020; Izacard and Grave, 2021). It has been found that a retrieval-augmented generator also reduces hallucination in knowledge-grounded conversation (Shuster et al., 2021), and a similar approach recently achieves comparable performance in knowledge-grounded conversation (Paranjape et al., 2021).

Persona-grounded Dialogue System
Persona-concentrated datasets have also been proposed for constructing persona-engaging dialogue agents (Zhang et al., 2018; Rashkin et al., 2019; Dinan et al., 2020; Smith et al., 2020). While Song et al. (2020) and Wu et al. (2021) focus on injecting persona with utterance post-editing, Zheng et al. (2020) devise an attention routing mechanism for handling persona dialogue. Furthermore, other research takes into account the consistency and relevancy of the persona by employing a natural language inference-based critic with a consistency score in reinforcement learning. Moreover, to maintain a consistent persona perceived by the dialogue agent, Bae et al. (2022) use iterative feedback between pre-trained language models (PLMs) and human annotators.
Alongside the previous studies, research has emerged that attempts to expand persona sentences to cover candidate-free conversational settings. In other words, when given fragmentary information on the user's persona, this line of work completes the insufficient information using retrieval (Liu et al., 2022; Majumder et al., 2021; Han et al., 2022) or generation (Zhou et al., 2021; Lu et al., 2022). In addition, persona extension approaches leveraging commonsense have been introduced: Majumder et al. (2020) expand the given persona sentences by fetching additional information from a commonsense knowledge graph.

Persona and Knowledge Grounded Dialogue System
Recent studies have investigated fusing persona and knowledge to generate engaging and knowledgeable responses. Shuster et al. (2022) model the persona as the long-term memory of both users and the chatbot. However, none of these systems can be adopted in the candidate-free conversation setting due to their dependency on the given sentence-formed candidates.

Method
We propose an adaptive dialogue agent that generates responses without persona and knowledge candidates. To this end, we assume that only the knowledge and persona concepts are given to the agent for knowledgeable and engaging responses. First, 1) the knowledge retriever retrieves the relevant paragraphs with the knowledge concept, and 2) the concept-based persona generator produces persona descriptions from the given short persona concepts. Then, 3) the persona aligner decides whether the generated persona descriptions are relevant to the dialogue history and whether the sentences are consistent with it. Afterward, 4) the response generator provides knowledgeable and engaging responses with the predicted knowledge paragraphs and persona descriptions.
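As a rough illustration, the four stages can be sketched as below. This is a minimal stand-in, not the actual implementation: word-overlap retrieval, template-based persona "generation", and fixed aligner predicates replace the paper's trained modules, and all function names are illustrative.

```python
# Illustrative sketch of the four-stage pipeline described above.
# Every component here is a simplified stand-in for a learned module.

def retrieve_knowledge(knowledge_concept, knowledge_base, k=2):
    """Stand-in knowledge retriever: rank paragraphs by word overlap."""
    def overlap(p):
        return len(set(knowledge_concept.lower().split()) & set(p.lower().split()))
    return sorted(knowledge_base, key=overlap, reverse=True)[:k]

def generate_personas(persona_concepts):
    """Stand-in concept-based persona generator: expand each concept into
    a full first-person sentence (the real module is a trained BART)."""
    return [f"I am interested in {c}." for c in persona_concepts]

def align_personas(personas, history, is_consistent, is_relevant):
    """Keep only personas that both aligner modules accept."""
    return [p for p in personas
            if is_consistent(p, history) and is_relevant(p, history)]

def build_generator_input(c_k, history, personas, knowledge):
    """Concatenate inputs as I = [C^K ; H ; G^P_CR ; K]."""
    return " <sep> ".join([c_k, " ".join(history), " ".join(personas),
                           " ".join(knowledge)])

# Toy usage:
kb = ["Norman Studios was an American film studio in Jacksonville.",
      "The Staples Center has hosted seven NBA Finals series."]
history = ["What is this studio known for?"]
knowledge = retrieve_knowledge("Norman Studios", kb, k=1)
personas = align_personas(generate_personas(["filmmaking"]), history,
                          is_consistent=lambda p, h: True,
                          is_relevant=lambda p, h: "film" in p)
model_input = build_generator_input("Norman Studios", history, personas, knowledge)
print(model_input)
```

The actual response generator conditions a generative language model on this concatenated sequence.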

Notation
The given dialogue $D$ is denoted as $\{(u^{hm}_1, u^{mc}_1), \ldots, (u^{hm}_n, u^{mc}_n)\}$, where $n$ is the number of rounds, and $u^{hm}$ and $u^{mc}$ denote the utterances of the human and the machine, respectively. The dialogue history $H$ is $\{(u^{hm}_{n-w}, u^{mc}_{n-w}), \ldots, (u^{hm}_{n-1}, u^{mc}_{n-1}), u^{hm}_n\}$, where $w$ is the window size. The set of given persona sentences is $P = \{p_1, p_2, \ldots, p_{|P|}\}$, where $|P|$ is the number of persona sentences. Also, $C^P = \{c^p_1, c^p_2, \ldots, c^p_{|P|}\}$ indicates the persona concepts, whereas $C^K$ is a knowledge concept, which is the title of the knowledge.
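The windowed history can be made concrete with a small sketch; the data layout below (lists of utterance pairs) is an assumption for illustration, not the paper's actual data structure.

```python
# Minimal sketch of the notation: the dialogue D is a list of
# (human, machine) utterance pairs, and the history H keeps the last
# w rounds plus the current human utterance u^hm_n.

def dialogue_history(rounds, current_human_utt, w=2):
    """Return H = [(u^hm_{n-w}, u^mc_{n-w}), ..., (u^hm_n,)]."""
    window = rounds[-w:] if w > 0 else []
    return window + [(current_human_utt,)]

D = [("Hi!", "Hello, how can I help?"),
     ("Tell me about this place.", "It is a historic studio."),
     ("Who founded it?", "Richard Edward Norman founded it.")]
H = dialogue_history(D, "What is it known for?", w=2)
print(H)
```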

Knowledge Retriever
To let the model adapt to situations where knowledge candidates are absent, we use non-parametric memory-based retrieval. We combine a query encoder and a dense vector index, obtained from a pre-trained dense passage retriever (DPR) (Karpukhin et al., 2020), for enhanced semantic search. The retriever refers to the knowledge index built from Wikipedia and served with the FAISS library (Johnson et al., 2019). Our retriever R(·) finds the relevant knowledge from the index with the knowledge concept C^K by using maximum inner-product search (MIPS), following Lewis et al. (2020b). The predicted top-k relevant paragraphs, denoted as K, are then used as input to the model.
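The MIPS step can be sketched as follows. The paper uses DPR encoders and a FAISS index; here random vectors and a brute-force NumPy search stand in for both, purely to illustrate the ranking operation.

```python
import numpy as np

# Sketch of maximum inner-product search (MIPS) over a dense knowledge
# index: score every passage embedding against the concept query
# embedding by inner product and keep the top-k.

rng = np.random.default_rng(0)
num_passages, dim, k = 1000, 8, 3

index = rng.normal(size=(num_passages, dim))   # e(K): passage embeddings
query = rng.normal(size=dim)                   # q(C^K): concept embedding

scores = index @ query                         # inner products
top_k = np.argsort(-scores)[:k]                # MIPS: highest scores first
print(top_k, scores[top_k])
```

In practice FAISS replaces the brute-force `argsort` with an approximate index so the search scales to millions of passages.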

$$R(\hat{K} \mid C^K) \propto \exp\left(e(\hat{K})^\top q(C^K)\right) \qquad (1)$$
where e(·) is an embedding from the context encoder and q(·) is a representation from the query encoder, both implemented with BERT (Kenton and Toutanova, 2019) pre-trained on the Natural Questions dataset (Kwiatkowski et al., 2019).

Concept-based Persona Generator
To let the model exploit the semantic concept in the candidate-free situation, we propose a concept-based persona generator that provides complete persona descriptions from persona concepts alone. In detail, our persona generator is pre-trained to generate plausible full persona descriptions from only persona concepts in a retrieve-and-generate manner, following Hashimoto et al. (2018). We then freeze the persona generator for response generation.
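The retrieve-and-generate idea can be sketched as below. The real retriever is a trained DPR model and the generator a fine-tuned BART; a word-overlap retriever and a template "generator" stand in here, and the persona pool is a toy example.

```python
# Sketch of the concept-based persona generator: retrieve persona
# sentences similar to the concept from a pool, then condition
# generation on them (retrieve-and-generate).

persona_pool = [
    "I am interested in filmmaking.",
    "I am a dancer.",
    "I love visiting historic places.",
]

def retrieve_personas(concept, pool, k=2):
    """Rank pool sentences by word overlap with the concept."""
    concept_words = set(concept.lower().split())
    scored = sorted(pool,
                    key=lambda s: len(concept_words & set(s.lower().rstrip(".").split())),
                    reverse=True)
    return scored[:k]

def generate_persona(concept, retrieved):
    """Stand-in generation: a template conditioned on the concept;
    a real generator would also attend to the retrieved sentences."""
    return f"I am interested in {concept}."

retrieved = retrieve_personas("filmmaking", persona_pool, k=2)
print(generate_persona("filmmaking", retrieved))
```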
For the pre-training process, we first build the persona pool with the collection of unique persona sentences from FoCus (Jang et al., 2022a) and PersonaChat (Zhang et al., 2018). Then, we pre-train the persona retriever using DPR (Karpukhin et al., 2020) and regard highly ranked persona sentences from BM25 (Robertson and Zaragoza, 2009)

Persona Aligner
The persona aligner consists of two modules, i.e., the persona consistency (PC) module and the persona relevancy (PR) module. When the generated persona sentences G^P are obtained, the persona consistency module predicts whether they contradict the previous dialogue history H. However, collecting consistency labels for generated personas is time-consuming and labor-intensive. Therefore, we distill ChatGPT (OpenAI-Blog, 2022), used as a model annotator, into a BERT-base model (Kenton and Toutanova, 2019), inspired by ChatGPT's high reasoning capability on natural language inference (Laskar et al., 2023). We asked ChatGPT to predict whether a single persona sentence contradicts the given dialogue. The prompt for the alignment check is illustrated in Appendix 9. The consistency module then trains on the labels ChatGPT provided in a binary classification manner. The trained persona consistency module is frozen and, at inference time, predicts whether a sentence is consistent with the dialogue history.

$$\hat{G}^P_C = \mathrm{PC}(G^P, H) \qquad (2)$$
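The distillation step can be sketched as below. A rule-based function stands in for the ChatGPT annotator (whose prompt and outputs we do not reproduce), and the "training set" is simply the collected (sentence, label) pairs that the small classifier would be trained on.

```python
# Sketch of consistency-label distillation: a large-model annotator
# labels whether a generated persona sentence contradicts the dialogue
# history, and a small classifier trains on those binary labels.

def annotate_consistency(persona, history):
    """Stand-in annotator: flag a contradiction when the history
    negates the persona (e.g., 'never been' vs. 'have been')."""
    contradiction = ("never been" in " ".join(history).lower()
                     and "have been" in persona.lower())
    return 0 if contradiction else 1   # 1 = consistent

history = ["I have never been to Wales."]
candidates = ["I have been to Wales.", "I am a fan of engineering."]
train_set = [(p, annotate_consistency(p, history)) for p in candidates]
print(train_set)
```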
Different from the persona consistency module, the persona relevancy module takes charge of relevancy by selecting persona sentences that are relevant to the dialogue. Even if a persona description does not conflict with the dialogue history, its level of relevancy remains unknown. For enhanced relevancy prediction, we first separately encode the dialogue and the generated persona sentences with the DPR question encoder and obtain each hidden state from the last layer. Then, we concatenate the embeddings and pass them through two linear layers to predict the relevancy of the persona sentences to the dialogue.

$$\hat{G}^P_R = \mathrm{PR}(G^P, H) \qquad (3)$$
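The relevancy head's forward pass can be sketched as follows; random weights stand in for the trained DPR encoder outputs and classifier parameters, and the hidden size is an arbitrary choice for illustration.

```python
import numpy as np

# Sketch of the relevancy head: encode the dialogue and a persona
# sentence separately, concatenate the embeddings, and pass them
# through two linear layers to predict a relevance probability.

rng = np.random.default_rng(0)
dim, hidden = 8, 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = rng.normal(size=(2 * dim, hidden))   # first linear layer
W2 = rng.normal(size=(hidden, 1))         # second linear layer

dialogue_emb = rng.normal(size=dim)       # encoder output for H
persona_emb = rng.normal(size=dim)        # encoder output for a persona

h = np.maximum(np.concatenate([dialogue_emb, persona_emb]) @ W1, 0.0)  # ReLU
prob_relevant = sigmoid((h @ W2).item())
print(prob_relevant)
```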
If the two modules both predict a sentence as relevant and consistent, we assume the sentence is aligned with the given dialogue.

$$\hat{G}^P_{CR} = \hat{G}^P_C \cap \hat{G}^P_R \qquad (4)$$
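The combination of the two decisions amounts to a set intersection, sketched below with toy sentences.

```python
# Sketch of combining the two aligner decisions: a generated persona
# sentence is kept only if the consistency module and the relevancy
# module both accept it (an intersection of the two accepted sets).

generated = ["I am a fan of engineering.",
             "I have been to Wales.",
             "I love the bay area."]
consistent = {"I am a fan of engineering.", "I love the bay area."}
relevant = {"I am a fan of engineering.", "I have been to Wales."}

aligned = [g for g in generated if g in consistent and g in relevant]
print(aligned)
```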
We compute the loss as Equation 5:

$$\mathcal{L}_P = -\sum_{j} \left[ l_j \log \hat{l}_j + (1 - l_j) \log(1 - \hat{l}_j) \right] \qquad (5)$$

Note that $l_j$ is the ground-truth label and $\hat{l}_j$ the predicted probability for the $j$-th example.

Response Generator
With the predicted relevant knowledge passages and persona descriptions, we concatenate them into one sequence along with the knowledge concept and the dialogue as $I = [C^K; H; \hat{G}^P_{CR}; K]$. Then, we pass this sequence into the generative language model to obtain the responses, and the language modeling loss is computed as Equation 6:

$$\mathcal{L}_{LM} = -\sum_{i=1}^{T} \log \mathrm{Prob}(t_i \mid t_{<i}, I) \qquad (6)$$

where $\mathrm{Prob}(\cdot)$ denotes a probability of the generative language model, $t_i$ is the $i$-th token of the target sentence, and $T$ is the number of tokens. The final loss function $\mathcal{L}_{Final}$ is computed as Equation 7, where $\lambda_P$ and $\lambda_{LM}$ are hyperparameters:

$$\mathcal{L}_{Final} = \lambda_P \mathcal{L}_P + \lambda_{LM} \mathcal{L}_{LM} \qquad (7)$$
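As a toy illustration of the combined objective, the sketch below computes a binary cross-entropy term for the aligner and a token-level language-modeling term, then mixes them with the two hyperparameters. The probabilities are made-up numbers standing in for model outputs.

```python
import math

# Toy computation of L_Final = lambda_P * L_P + lambda_LM * L_LM.

def bce(labels, probs):
    """Mean binary cross-entropy for the aligner (Equation 5)."""
    return -sum(l * math.log(p) + (1 - l) * math.log(1 - p)
                for l, p in zip(labels, probs)) / len(labels)

def lm_loss(token_probs):
    """Mean negative log-likelihood over target tokens (Equation 6)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

L_P = bce(labels=[1, 0, 1], probs=[0.9, 0.2, 0.8])
L_LM = lm_loss(token_probs=[0.5, 0.7, 0.9])
lambda_P, lambda_LM = 1.0, 1.0     # illustrative hyperparameter values
L_final = lambda_P * L_P + lambda_LM * L_LM
print(L_P, L_LM, L_final)
```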
Experimental Setup

In the baselines with candidates, each selector is implemented with a poly-encoder (Humeau et al., 2020), and its output is used as the query for the response generator based on RAG (Lewis et al., 2020b).

Evaluation Metrics
The official automatic evaluation metrics for the FoCus benchmark include BLEU (Papineni et al., 2002), chrF++ (Popović, 2017), ROUGE-1, ROUGE-2, and ROUGE-L (Lin, 2004). These metrics are frequently employed to compare machine-generated responses with gold responses in generation tasks. In our experiments on PersonaChat, we also report performance on the unigram F1 metric.
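The unigram F1 metric mentioned above can be sketched as below: token-level precision and recall between a generated response and the gold response, combined into F1. Tokenization details (here simple whitespace splitting) may differ from the official evaluation script.

```python
from collections import Counter

# Sketch of unigram F1 between a hypothesis and a reference response.

def unigram_f1(hypothesis, reference):
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    common = sum((Counter(hyp) & Counter(ref)).values())  # overlapping tokens
    if common == 0:
        return 0.0
    precision, recall = common / len(hyp), common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(unigram_f1("it was a very large project", "it was a large project"))
```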
For human evaluation, we adopt six metrics on response generation. 1) Informativeness measures the extent of the information conveyed within a response, i.e., the degree to which it provides new, valuable, or relevant details, insights, or facts to the conversation. We also have two criteria for hallucination regarding persona and knowledge. 2) Knowledge hallucination shows the level of hallucination in generated output that contradicts reality. Similarly, 3) Persona hallucination indicates the hallucination level with respect to the given persona descriptions. Along with the persona-related metrics, 4) Persona relevancy denotes how directly the given persona relates to the ongoing conversation. Moreover, 5) Persona consistency refers to how consistently the persona is maintained in a given dialogue. Lastly, 6) Fluency measures the ability to communicate smoothly, effortlessly, and coherently. Details of our experiments are provided in Appendix B.

Automatic Evaluation
We conduct experiments on the FoCus dataset to show our method's effectiveness without the given candidates. Table 1 demonstrates that our method achieves the second-highest score, while the first-ranked model directly exploits the persona and knowledge candidate sets. In addition, our method outperforms Jang et al. (2022a), even though that model utilizes the candidates from the dataset. Furthermore, all models incorporating our method significantly outperform their vanilla backbone models.

Human Evaluation
We also demonstrate the effectiveness of our method through human evaluation. We recruited nine human workers who hold at least a bachelor's degree and are proficient in English. We randomly chose 30 dialogues from each dataset and asked the workers to evaluate the machine-generated responses according to the six criteria described earlier.
The scores are scaled from 1 to 3, and the results are shown in Table 2. The results indicate that our method is effective in achieving both persona consistency and persona relevancy. Moreover, our method shows comparable performance in decreasing both knowledge hallucination and persona hallucination on the FoCus dataset.

Ablation Studies
We also conduct ablation studies on our method with respect to the knowledge retriever, the concept-based persona generator, and the persona aligner.
Knowledge Retriever To demonstrate the effectiveness of the knowledge retriever, we compare our model with the vanilla backbone fine-tuned on the FoCus dataset. As shown in Table 3, incorporating relevant knowledge into the input of the vanilla generative language models enhances response generation performance regardless of the backbone language model. Also, the knowledge retriever enhances performance consistently, even when the persona generator and aligner are combined in our method. The performance decrease of the models without the knowledge retriever suggests that our knowledge retriever is effective.

Persona Generator We also compare the performance of models by ablating the type of persona descriptions. "GT" refers to our model utilizing ground-truth persona sentences, while "random" indicates the models with five random persona descriptions from the persona pool. As shown in Table 4, the models in the random setting exhibit a decrease in performance, regardless of the backbone model. However, the proposed method based on the RAG model outperforms the model that utilizes ground-truth persona descriptions. This suggests that the generated persona descriptions from our concept-based persona generator are comparable to the ground-truth persona sentences and that our concept-based persona generator can replace the labor-intensive human annotation process.

Persona Aligner We evaluate the impact of our consistency and relevancy modules by employing the persona relevancy label. "GT" refers to utilizing the ground-truth persona relevancy label in the FoCus dataset, while "random" indicates the model with randomly assigned persona relevancy labels. As shown in Table 5, our method performs comparably to the GT setting when trained on both BART and RAG. However, performance significantly decreases in the random setting. This indicates that our persona aligner effectively captures the consistency and relevancy of the generated persona with respect to the dialogue context.

Adaptation to Other Dialogue Datasets
To evaluate the adaptiveness of our method, we conduct experiments on the other dialogue datasets, Wizard of Wikipedia (WoW) and PersonaChat.
Since both datasets consist of dialogues grounded in a single source, there are no candidates for the other source. In other words, there are no persona candidates in the WoW dataset, and knowledge candidates are absent in PersonaChat. Therefore, we report the results of applying our method to these single-source datasets. Table 6 demonstrates that the models with our method show comparable performance in ROUGE-L. Also, our BART-based method exceeds FiD (Izacard and Grave, 2021), which shows remarkable performance in knowledge-grounded conversation. Our BART-based model on PersonaChat also surpasses P²Bot (Liu et al., 2020) according to the unigram F1 metric.

Knowledge Concepts
Cardiff Bay Barrage

Ours (BART)
Oh yes, it was very large. You'd be a fan of this engineering work, which was considered one of the largest civil engineering projects ever undertaken in the country.

Ours (RAG)
It was very large in scale, and you'd be a fan of this engineering, as it was one of the largest civil engineering projects in the country.

Ground Truth Knowledge
It was one of the largest civil engineering projects in Europe during construction in the 1990s.

Ground Truth Persona
I love the bay area. I have never been to Wales. I am a fan of engineering. I would like to visit Europe. I am not from the United Kingdom.

Ground Truth Response
Oh yes, very large. With you being a fan of engineering, you'd be interested to hear that this was one of the largest civil engineering projects in Europe during the time.

Table 7 presents the prediction results from the baselines and our models on the FoCus dataset. It is noteworthy that the vanilla BART and RAG models tend to generate shallow responses that revolve around the topic of Cardiff Bay Barrage. Furthermore, these models tend to provide numerical information that lacks factual support regarding the knowledge concept. To sum up, they fail to achieve a deep understanding of human preferences based on the provided persona concepts, leading to less engaging responses that lack any persona-related expressions.
In contrast, our proposed models generate informative and empathetic responses, striking a balance between incorporating external information relevant to the knowledge concept and avoiding any distortion. For instance, our models generate expressions such as "it was one of the largest civil engineering projects in the country," which provide sufficient information. These results suggest that our method is well-suited for scenarios where sentence-formed knowledge and persona candidates are absent.

Conclusions
In this paper, we introduced an adaptive dialogue agent utilizing persona and knowledge without the candidates given from the dataset. Due to the absence of knowledge candidates, the knowledge retriever retrieves the relevant paragraphs from the knowledge base with the knowledge concept. Also, the concept-based persona generator outputs persona descriptions from fragmentary persona concepts with a retrieve-and-generate architecture. The generated persona descriptions are then validated by the persona aligner regarding relevancy and consistency. Our experiments showed that our method is effective even when only the persona and knowledge concepts are given with the dialogue. We also presented ablation studies on each component of our model. Moreover, we conducted a human evaluation showing the improved quality of our models' responses, which is also supported by the qualitative results. To show its applicability and adaptiveness, we reported the experimental results of our method on the FoCus, WoW, and PersonaChat datasets.

Limitations
Our model deals with response generation in the candidate-agnostic conversation setting, which is a limitation of the INFO model (Lim et al., 2022), proving the possibility of application in the real world. Still, hallucinations regarding personas and knowledge are occasionally observed in the generated responses. However, since hallucination is a severe problem even in large language models with enormous parameter sizes, the NLP community needs to continue working on this challenge. Also, although we conducted a human evaluation to validate the diverse capabilities of the proposed model, such as hallucination, consistency, and informativeness in dialogue generation, the number of cases is relatively small for evaluating all aspects of these capabilities. Finally, our model demands high GPU computation resources as it marginalizes the loss at the token level.
For future work, we plan to improve our model by conducting human evaluations with more cases and enhancing the qualitative analysis of the model's hallucinated answers. Improving the model's generator with more computationally efficient components is also a desirable direction for addressing the GPU resource issue.

Ethics Statement
We discuss the main ethical considerations of the proposed model: (1) Privacy. The datasets adopted to construct our model provide factual knowledge and fictional persons' preferences, so our model does not raise privacy issues. (2) Human evaluation. During the human evaluation process, we paid human workers the legal wage determined by the average time of evaluation and local labor compensation standards. We also guided them to take a rest when they were in a state of fatigue during work. (3) Potential problems. Although we take conscientious steps to ensure the quality of our models, there can still be potential problems with the quality of the generated responses, which can lead to incorrect predictions in applications that leverage factual information and human preferences. (4) Model deployment. Our approach employs pre-trained language models (PLMs) for the downstream tasks, which carry the risk of reflecting the bias of their training data. This is a well-known threat in tasks using PLMs, and we should be careful about social impact when using this method since our model aims to handle factual knowledge.

Figure 1 :
Figure 1: Comparison of conversational settings. (a) is a candidate-free conversational setting, and the machine's answer in (a) is a generated response from our model. (b) is a previous conversational setting, and the machine's answer in (b) is the response from the BART-large model trained on the FoCus dataset.
as negative samples. We then train the generator by considering the top-$k$ relevant persona sentences $P'_k = \{p'_1, p'_2, \ldots, p'_k\}$ as positive samples with BART (Lewis et al., 2020a). Our concept-based persona generator provides complete persona sentences $G^P$. The training details are presented in Appendix A.
Acknowledgments

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00368, A Neural-Symbolic Model for Knowledge Acquisition and Inference Techniques). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2022R1A2C1007616).

Table 1 :
FoCus results. Main results on the official validation set. The models are evaluated by generation metrics, including BLEU, chrF++, ROUGE-1, ROUGE-2, and ROUGE-L.

Wizard of Wikipedia (Dinan et al., 2018) contains 22,311 dialogues with 201,999 turns that utilize Wikipedia articles, primarily aimed at facilitating knowledge-based dialogue. In the WoW dataset, the wizard responds to the apprentice based on selected knowledge. We utilized the test split of Wizard of Wikipedia (WoW) for the experiments.

Table 3 :
Ablation study on the knowledge retriever. K-Retr. denotes the knowledge retriever.

Table 4 :
Ablation study on the concept-based persona generator. GT denotes the ground-truth persona sentences given from the dataset.

Table 5 :
Ablation study on the persona aligner. GT indicates the ground-truth label of persona selection from the dataset.

Table 6 :
Automatic evaluation results on the other dialogue datasets. † denotes the vanilla model, and the scores of other models in WoW are imported from the KILT benchmark (Petroni et al., 2021).

Machine: You've never been there before, but this can be found in Wales between the Queen Alexandra Dock and Penarth Head.
Human: Was this a large project?

Table 7 :
Qualitative result. ‡ denotes the vanilla models, and red and blue indicate the parts utilized by Ours from the persona and the knowledge, respectively. All the predicted results are from our models.