Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address this, we propose to decouple knowledge retrieval from response generation and introduce a multi-grained knowledge retriever (MAKER) that includes an entity selector to search for relevant entities and an attribute selector to filter out irrelevant attributes. To train the retriever, we propose a novel distillation objective that derives supervision signals from the response generator. Experiments conducted on three standard benchmarks with both small and large-scale knowledge bases demonstrate that our retriever performs knowledge retrieval more effectively than existing methods. Our code has been made publicly available at https://github.com/18907305772/MAKER.


Introduction
When task-oriented dialog (TOD) systems try to accomplish a task such as restaurant reservation or weather reporting for human users, they generally resort to an external knowledge base (KB) to retrieve relevant entity information for generating an informative system response. Conventional pipeline systems comprise several modules, such as dialogue state tracking and dialogue policy learning, that require annotations for training, where intermediate predictions such as the belief state can be used for retrieval. By contrast, end-to-end task-oriented dialog (E2E-TOD) systems aim to eliminate the dependence on intermediate annotations and generate

Figure 1: Performance of four end-to-end task-oriented dialog systems (CDNET, FG2Seq, EER, DF-Net) on MultiWOZ 2.1 when knowledge bases of different sizes are used. The evaluation metric is the Entity F1 score of entities in generated responses. "Condensed" means that each dialog is associated with a small-sized knowledge base, which is the default setting of many current systems. "In-domain" means that each dialog corresponds to a knowledge base of the same domain, while "Cross-domain" means that all dialogs share the same large-scale cross-domain knowledge base provided in the dataset.
the response end-to-end (Wu et al., 2019). Apparently, knowledge retrieval is at the core of this task, which is non-trivial as no gold labels are available for training a retriever. Arguably, this problem has limited the performance of existing E2E-TOD systems, considering that substantial progress has been made in natural language generation.
Roughly, existing approaches for knowledge retrieval in E2E-TOD systems can be divided into three categories. First, the knowledge base can be embedded into a memory network and queried with the representations of dialogue context (Madotto et al., 2018; Qin et al., 2020; Raghu et al., 2021). Second, the serialized knowledge base records can be encoded together with the dialog context by pre-trained language models (Xie et al., 2022; Wu et al., 2022; Tian et al., 2022). Third, the knowledge base can be embedded into model parameters through data augmentation to support implicit knowledge retrieval (Madotto et al., 2020; Huang et al., 2022). These approaches generally blend knowledge retrieval and response generation and train them with the supervision of reference responses, which has two limitations. First, the system response usually consists of both pure language tokens and KB-related tokens (e.g., hotel names and phone numbers), and it is challenging to train a good retriever from the weak supervision of reference responses. Second, the systems may become inefficient when the scale of the knowledge base grows large. Our preliminary study² in Figure 1 confirms that when a large-scale cross-domain knowledge base is given, existing dialog systems suffer significant performance degradation.
In this paper, we propose a novel Multi-grAined KnowlEdge Retriever (MAKER) for E2E TOD systems to improve the acquisition of knowledge for response generation. The retriever decouples knowledge retrieval from response generation and introduces an entity selector and an attribute selector to select relevant entities and attributes from the knowledge base. Then, the response generator generates a system response based on the dialogue context and the multi-grained retrieval results. The retriever is trained by distilling knowledge from the response generator using the cross-attention scores of KB-related tokens in the response. We train the entity selector, attribute selector, and response generator jointly in an end-to-end manner.
We compare our system with other E2E TOD systems on three benchmark datasets (Eric et al., 2017; Wen et al., 2017; Eric et al., 2020). Empirical results show that our system achieves state-of-the-art performance when either a small or a large-scale knowledge base is used. Through in-depth analysis, we have several findings to report. First, our retriever shows great advantages over baselines when the size of knowledge bases grows large. Second, of the two selectors, the entity selector plays a more important role in the retriever. Third, our system consistently outperforms baselines as different numbers of records are retrieved, and works well even with a small number of retrieval results.

Related Work
2.1 End-to-End Task-Oriented Dialog

Existing approaches for knowledge retrieval in end-to-end task-oriented dialog systems can be divided into three categories. First, the knowledge base (KB) is encoded with memory networks, and KB records are selected using attention weights between dialogue context and memory cells. Mem2Seq (Madotto et al., 2018) uses multi-hop attention over memory cells to select KB tokens during response generation. KB-Retriever (Qin et al., 2019) retrieves the most relevant entity from the KB by means of attention scores to improve entity consistency in the system response. GLMP (Wu et al., 2019) introduces a global-to-local memory pointer network to retrieve relevant triplets to fill in the sketch response. CDNET (Raghu et al., 2021) retrieves relevant KB records by computing a distillation distribution based on the dialog context.

² More details of this study are given in Appendix B.
Second, the concatenation of the knowledge base and dialogue context is taken as input by pre-trained language models. UnifiedSKG (Xie et al., 2022) uses a unified text-to-text framework to generate system responses. DialoKG (Rony et al., 2022) models the structural information of the knowledge base through knowledge graph embedding and performs knowledge attention masking to select relevant triples. Q-TOD (Tian et al., 2022) proposes to rewrite the dialogue context into a natural language query for knowledge retrieval.
Third, the knowledge base is stored in model parameters for implicit retrieval during response generation. GPT-KE (Madotto et al., 2020) proposes to embed the knowledge base into pre-trained model parameters through data augmentation. ECO (Huang et al., 2022) first generates the most relevant entity under a trie constraint to ensure entity consistency in the response. However, these methods generally blend entity retrieval with response generation, which leads to sub-optimal retrieval performance when large-scale knowledge bases are provided.

2.2 Neural Retriever
With the success of deep neural networks in various NLP tasks, they have also been applied to information retrieval. One of the mainstream approaches is to employ a dual-encoder architecture (Yih et al., 2011) to build a retriever. Our work is mostly inspired by the retrieval methods in question answering. To train a retriever with labeled question-document pairs, DPR (Karpukhin et al., 2020) uses in-batch documents corresponding to other questions, together with BM25-retrieved documents, as negative samples for contrastive learning. To train a retriever with only question-answer pairs instead of question-document pairs, which is a weakly supervised learning problem, researchers propose to distill knowledge from the answer generator to train the retriever iteratively (Yang and Seo, 2020; Izacard and Grave, 2020). Other researchers try to train the retriever and generator in an end-to-end manner. REALM (Guu et al., 2020), RAG (Lewis et al., 2020), and EMDR² (Singh et al., 2021) propose to train the retriever end-to-end through maximum marginal likelihood. Sachan et al. (2021) propose to combine unsupervised pre-training and supervised fine-tuning to train the retriever. Motivated by these works, we propose a multi-grained knowledge retriever trained by distilling knowledge from the response generator in E2E-TOD systems.

Figure 2: The overview of our end-to-end task-oriented dialog system, which consists of a knowledge retriever and a response generator. The retriever is further divided into an entity selector and an attribute selector to retrieve multi-grained knowledge, and is optimized by distilling knowledge from the response generator. (The example exchange shown in the figure: User: "I wasn't planning to stay tonight, but I'm going to have to. Can you help me find a pretty cheap room?" System: "The Cambridge Belfry is in the west of town and in the cheap price range.")

Methods
In this section, we first describe the notations and outline our method, and then introduce the knowledge retriever and response generator in detail.

Notations
Given a dialog D = {U_1, R_1, ..., U_T, R_T} of T turns, where U_t and R_t are the user utterance and system response of the t-th turn, respectively, we use C_t = {U_1, R_1, ..., U_{t-1}, R_{t-1}, U_t} to represent the dialog context of the t-th turn. An external knowledge base (KB) is provided in the form of a set of entities, i.e., K = {E_1, E_2, ..., E_B}, where each entity E_i is composed of N attribute-value pairs. End-to-end task-oriented dialog systems take the dialogue context C_t and knowledge base K as input and generate an informative response R_t.
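As an illustration, the notation above can be written out as plain Python structures. This is only a sketch: the attribute names and values below are borrowed from the example in Figure 2 and are not a prescribed schema.

```python
# C_t: the dialog context up to turn t, as (speaker, utterance) pairs.
dialog_context = [
    ("user", "Can you help me find a pretty cheap room?"),
]

# K = {E_1, ..., E_B}: each entity E_i is a set of N attribute-value pairs.
knowledge_base = [
    {"name": "The Cambridge Belfry", "area": "west", "pricerange": "cheap"},
    {"name": "Gonville Hotel", "area": "centre", "pricerange": "expensive"},
]

N = len(knowledge_base[0])  # number of attributes per entity
```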

System Overview
The architecture of our end-to-end task-oriented dialog system is shown in Figure 2. At each turn of conversation, our system resorts to a Multi-grAined KnowlEdge Retriever (MAKER) to retrieve a set of entities from the external knowledge base. Then, the response generator takes as input the retrieved entities together with the dialog context and generates a natural language response. The overall system is optimized in an end-to-end manner without the need for intermediate annotations.
The novelty of MAKER lies in that it decouples knowledge retrieval from response generation and provides multi-grained knowledge retrieval by means of an entity selector and an attribute selector. Specifically, the knowledge base is first encoded with an entity encoder Enc_e at the entity level. Then, the dialogue context is encoded with a context encoder Enc_c and used to retrieve a set of relevant entities from the knowledge base, which is referred to as entity selection. Next, irrelevant attributes are filtered out with an attribute selector based on the interaction of the dialog context and the retrieved entities, where another encoder Enc_a is used. Finally, each retrieved entity is concatenated with the dialog context and passed to a generator encoder Enc_g to obtain their representations, based on which the generator decoder Dec_g produces a system response. To train the retriever, the cross-attention scores from KB-related tokens in the reference response to each retrieved entity are used as supervision signals to update the entity selector, while the attribute selector is trained by using the occurrences of attribute values in the dialogue as pseudo-labels.
To better measure the relationship between entities and the response, the whole training process involves two stages. In the first, warm-up stage, only the attribute selector and the response generator are trained, while the entity selector is not updated. Once this training converges, the second stage starts to update the entity selector together with the other modules using cross-attention scores from the response generator.
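The two-stage schedule above can be sketched as a small helper that reports which loss terms are active at a given step. This is a simplification and not the authors' API: the paper switches stages once warm-up training converges, which is reduced here to a fixed step count.

```python
def active_losses(step, warmup_steps):
    """Loss terms optimized at a given training step under the two-stage
    schedule (simplified: the convergence criterion becomes a step count)."""
    losses = ["attribute_selection", "response_generation"]
    if step >= warmup_steps:
        # Second stage: also update the entity selector by distilling
        # the generator's cross-attention scores.
        losses.append("entity_distillation")
    return losses
```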

Knowledge Retriever
In this section, we introduce the entity selector, attribute selector, and the training of the retriever.
Entity Selector To support large-scale knowledge retrieval, we model the entity selector as a dual-encoder architecture, where one encoder Enc_c is used to encode the dialogue context and another encoder Enc_e is used to encode each entity (row) of the knowledge base, both into a dense vector. To encode an entity, we concatenate the attribute-value pairs of this entity into a sequence and pass it to Enc_e. The selection score s_{t,i} for entity E_i is defined as the dot product between the context vector and the entity vector:

$s_{t,i} = \mathrm{Enc}_c(C_t)^\top \mathrm{Enc}_e(E_i)$. (1)

Then, the top-K entities are obtained by:

$\mathcal{E}_t = \operatorname{top-}\!K\big(\{s_{t,i}\}_{i=1}^{B}\big)$. (2)

Retrieving the top-K entities can be formulated as maximum inner product search (MIPS), which can be accelerated to sub-linear time using efficient similarity search libraries such as FAISS (Johnson et al., 2019). We implement Enc_c and Enc_e with a pre-trained language model and allow them to share weights, where the final "[CLS]" token representation is used as the encoder output. Existing studies suggest that initializing Enc_c and Enc_e with BERT weights may lead to collapsed representations and harm the retrieval performance. Therefore, following KB-Retriever (Qin et al., 2019), we initialize them by pre-training with distant supervision.³ Since the entity selector is updated by knowledge distillation, recalculating the embeddings of all entities after each update would introduce considerable computational cost. Therefore, we follow EMDR² (Singh et al., 2021) and update the embeddings of all entities only after every 100 training steps.

³ More pre-training details are given in Appendix C.
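The dual-encoder scoring and top-K selection can be sketched numerically as follows. Random vectors stand in for the "[CLS]" outputs of Enc_c and Enc_e; exact sorting stands in for the FAISS-accelerated MIPS the text describes.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_entities, K = 8, 100, 4

context_vec = rng.normal(size=dim)                  # stand-in for Enc_c(C_t)
entity_vecs = rng.normal(size=(num_entities, dim))  # stand-in for Enc_e(E_i);
                                                    # precomputed, refreshed
                                                    # every ~100 training steps
scores = entity_vecs @ context_vec                  # s_{t,i}: dot product (Eq. 1)
top_k = np.argsort(-scores)[:K]                     # exact top-K MIPS (Eq. 2);
                                                    # FAISS approximates this
                                                    # at scale
```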
Attribute Selector To remove irrelevant attributes and values from the retrieved entities for finer-grained knowledge, we design an attribute selector as follows. We first concatenate the dialog context C_t with each retrieved entity E_i ∈ E_t and encode them with an attribute encoder Enc_a, which is also a pre-trained language model. Then, the final "[CLS]" token representation of Enc_a is extracted and mapped into an N-dimensional vector by a feed-forward network (FFN) for attribute scoring:

$a_{t,i} = \mathrm{FFN}\big(\mathrm{Enc}_a([C_t; E_i])\big)$, (3)

where each element in a_{t,i} ∈ R^N represents the importance of the corresponding attribute. Note that a_{t,i} only measures the importance of attributes in E_i. To obtain the accumulated importance, we calculate the sum of a_{t,i} over all retrieved entities, weighted by the entity selection score s_{t,i}:

$a_t = \sigma\Big(\sum_{E_i \in \mathcal{E}_t} s_{t,i} \, a_{t,i}\Big)$, (4)

where σ represents the sigmoid function.
Finally, the attributes whose importance scores in a_t are greater than a pre-defined threshold τ are selected to construct an attribute subset. The retrieved entities clipped with these attributes are treated as the multi-grained retrieval results, denoted by Ê_t. Specifically, we obtain Ê_t by masking irrelevant attribute-value pairs in each retrieved entity of E_t.
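The accumulation-and-threshold step can be sketched as below. Real attribute scores come from a BERT encoder plus an FFN; here small fixed arrays stand in, and the function name is illustrative only.

```python
import numpy as np

def select_attributes(attr_scores, entity_scores, tau=0.5):
    """Sum per-entity attribute scores a_{t,i} weighted by entity selection
    scores s_{t,i}, squash with a sigmoid, and keep attributes above tau
    (the accumulation in Eq. 4 plus the threshold step)."""
    accumulated = (entity_scores[:, None] * attr_scores).sum(axis=0)
    a_t = 1.0 / (1.0 + np.exp(-accumulated))  # sigmoid
    return a_t >= tau                          # mask of attributes to keep

# Toy example: K=2 retrieved entities, N=2 attributes.
attr_scores = np.array([[2.0, -3.0],
                        [1.0, -1.0]])
entity_scores = np.array([0.7, 0.3])
mask = select_attributes(attr_scores, entity_scores)  # keeps attr 0 only
```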
To train the attribute selector, we design an auxiliary multi-label classification task. The pseudo-label is an N-dimensional 0-1 vector b_t constructed by checking whether any value of an attribute in Ê_t appears in the dialogue context C_t or system response R_t. Then, we define a binary cross-entropy loss L_att for this classification task:

$\mathcal{L}_{att} = -\sum_{n=1}^{N} \big( b_{t,n} \log a_{t,n} + (1 - b_{t,n}) \log (1 - a_{t,n}) \big)$. (5)

Updating The entity selector is updated by distilling knowledge from the response generator as supervision signals. Specifically, since only KB-related tokens in the response are directly connected to the knowledge base, we regard the cross-attention scores from these tokens to each retrieved entity as the knowledge to distill. The rationale behind this is that the cross-attention scores can usually measure the relevance between each entity and the response. Supposing response R_t contains M KB-related tokens, we denote the cross-attention scores from each KB-related token to entity Ê_i by C_{t,i} ∈ R^{|Ê_i| × M × L}, where |Ê_i| represents the number of tokens in Ê_i and L is the number of decoder layers. Then, we calculate an accumulated score for entity Ê_i as:

$\hat{c}_{t,i} = \sum_{j=1}^{|\hat{E}_i|} \sum_{m=1}^{M} \sum_{l=1}^{L} C_{t,i}[j, m, l]$. (6)

Then, ĉ_{t,i} is softmax-normalized to obtain a cross-attention distribution c_t over the K retrieved entities to reflect their importance for the response:

$c_t = \mathrm{softmax}\big([\hat{c}_{t,1}, \ldots, \hat{c}_{t,K}]\big)$. (7)

Finally, we calculate the KL-divergence between the selection scores s_t of the retrieved entities and the cross-attention distribution c_t as the training loss:

$\mathcal{L}_{ent} = D_{\mathrm{KL}}\big(c_t \,\|\, \mathrm{softmax}(s_t)\big)$. (8)
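The distillation objective can be sketched as follows: the accumulated cross-attention scores act as the teacher and the retriever's selection scores as the student. This is a minimal sketch with plain arrays; in the real system the teacher scores are first summed over entity tokens, KB-related response tokens, and decoder layers.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def entity_distillation_loss(selection_scores, accumulated_cross_attn):
    """KL(c_t || softmax(s_t)): divergence from the generator's
    cross-attention distribution over the K retrieved entities to the
    retriever's selection distribution."""
    c_t = softmax(np.asarray(accumulated_cross_attn, dtype=float))
    s_t = softmax(np.asarray(selection_scores, dtype=float))
    return float(np.sum(c_t * np.log(c_t / s_t)))
```

When the two distributions agree the loss is zero, so gradient updates push the entity selector toward entities the generator actually attends to.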

Response Generator
Inspired by Fusion-in-Decoder (Izacard and Grave, 2020) in open-domain question answering, we employ a modified sequence-to-sequence structure for the response generator to facilitate direct interaction between dialog context and retrieved entities.
Generator Encoder Each entity Ê_i in Ê_t is first concatenated with the dialog context C_t and encoded into a sequence of vector representations H_{t,i}:

$H_{t,i} = \mathrm{Enc}_g([C_t; \hat{E}_i])$, (9)

where Enc_g represents the encoder of the response generator. Then, the representations of all retrieved entities are concatenated into H_t:

$H_t = [H_{t,1}; H_{t,2}; \ldots; H_{t,K}]$. (10)

Generator Decoder Taking H_t as input, the generator decoder Dec_g produces the system response token by token. During this process, the decoder not only attends to the previously generated tokens through self-attention but also attends to the dialogue context and retrieved entities by cross-attention, which facilitates the generation of an informative response. The probability distribution for each response token in R_t is defined as:

$p(R_{t,j} \mid R_{t,<j}, H_t) = \mathrm{Dec}_g(R_{t,<j}, H_t)$. (11)

We train the response generator with the standard cross-entropy loss:

$\mathcal{L}_{gen} = -\sum_{j=1}^{|R_t|} \log p(R_{t,j} \mid R_{t,<j}, H_t)$, (12)

where |R_t| denotes the length of R_t.
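The Fusion-in-Decoder-style encoding above can be sketched as follows. A toy per-token encoder stands in for Enc_g; the point is the shape of the computation: each entity is encoded jointly with the context, and the results are concatenated into one long sequence for the decoder to cross-attend over.

```python
import numpy as np

def fid_encode(context_tokens, entities, encode):
    """Encode [C_t; E_i] independently per retrieved entity (Eq. 9), then
    concatenate the per-entity representations into H_t (Eq. 10).
    `encode` is a stand-in for Enc_g and returns one vector per token."""
    parts = [encode(context_tokens + entity) for entity in entities]
    return np.concatenate(parts, axis=0)

# Toy stand-in encoder: one 4-dim vector per token.
encode = lambda tokens: np.ones((len(tokens), 4))

# 10 context tokens with two entities of 3 and 5 tokens:
H_t = fid_encode(["ctx"] * 10, [["e1"] * 3, ["e2"] * 5], encode)
```

Note that the context is duplicated once per entity, which is exactly the redundancy the Limitations section discusses.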
Lastly, the overall loss of the system is the sum of the entity selection loss L_ent, attribute selection loss L_att, and response generation loss L_gen:

$\mathcal{L} = \mathcal{L}_{ent} + \mathcal{L}_{att} + \mathcal{L}_{gen}$. (13)

Discussions
Although this work derives much inspiration from open-domain question answering (QA) (Izacard and Grave, 2020), where labels for retrieval are likewise unavailable, the scenario is quite different. One major difference is that the answer in open-domain QA comes entirely from the external source of knowledge, while some responses and tokens in dialog systems may not be relevant to the external knowledge base. That means dialog systems need to accommodate both the dialog context and external knowledge and generate a fluent and informative natural language response, making this task thornier than open-domain QA. The main differences between our MAKER and existing knowledge retrieval methods in task-oriented dialog systems are twofold. First, MAKER decouples knowledge retrieval from response generation and provides multi-grained knowledge retrieval over both entities and attributes. The retrieval results are explicitly passed to the generator to produce a system response. Second, MAKER is trained by distilling knowledge from the response generator for supervision, which differs from existing attention-based approaches.

Datasets
We evaluate our system on three multi-turn task-oriented dialogue datasets: MultiWOZ 2.1 (MWOZ) (Eric et al., 2020), Stanford Multi-Domain (SMD) (Eric et al., 2017), and CamRest (Wen et al., 2017). Each dialog in these datasets is associated with a condensed knowledge base, which contains all the entities that meet the user goal of this dialog. For MWOZ, each condensed knowledge base contains 7 entities. For SMD and CamRest, the size of condensed knowledge bases is not fixed: it ranges from 0 to 8 with a mean of 5.95 for SMD, and from 0 to 57 with a mean of 1.93 for CamRest. We follow the same partitions as previous work (Raghu et al., 2021). The statistics of these datasets are shown in Appendix A.
BLEU (Papineni et al., 2002) and Entity F1 (Eric et al., 2017) are used as the evaluation metrics. BLEU measures the fluency of a generated response based on its n-gram overlaps with the gold response. Entity F1 measures whether the generated response contains correct knowledge by micro-averaging the precision and recall scores of attribute values in the generated response.
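The Entity F1 computation can be sketched for a single response as below. This is a simplification: the benchmark implementation canonicalizes surface forms and micro-averages true-positive counts over the whole test set rather than averaging per-response scores.

```python
def entity_f1(pred_values, gold_values):
    """Precision/recall F1 of gold attribute values found in one generated
    response (simplified per-response version of the benchmark metric)."""
    pred, gold = set(pred_values), set(gold_values)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)                       # values correctly produced
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

score = entity_f1({"cambridge belfry", "west"}, {"cambridge belfry", "cheap"})
```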

Implementation Details
We employ BERT (Devlin et al., 2019) as the encoder of our entity selector and attribute selector, and T5 (Raffel et al., 2020) to implement the response generator. All these models are fine-tuned using the AdamW optimizer (Loshchilov and Hutter, 2018) with a batch size of 64. We train the models for 15k gradient steps with a linearly decaying learning rate of 10⁻⁴. We conduct all experiments on a single 24GB NVIDIA RTX 3090 GPU and select the best checkpoint based on model performance on the validation set. More detailed settings can be found in Appendix E.

Baselines
We compare our system with the following baselines, which are organized into three categories according to how they model knowledge retrieval.
Implicit retrieval: These approaches embed the knowledge base into model parameters by data augmentation to provide implicit retrieval during response generation, including GPT-2+KE (Madotto et al., 2020) and ECO (Huang et al., 2022).

Results and Analysis
In this section, we first show the overall performance of the evaluated systems given a condensed knowledge base for each dialog. Then, we compare them in a more practical setting in which a large-scale knowledge base is provided. We also conduct an in-depth analysis of the proposed retriever. More experiments are presented in the appendix.

Overall Results
The overall results are shown in Table 1. We observe that our system with T5-Large as the backbone model achieves state-of-the-art (SOTA) performance on MWOZ and SMD. Specifically, on MWOZ our system surpasses the previous SOTA, namely Q-TOD, by 1.15 points in BLEU and 4.11 points in Entity F1. On SMD, the improvements over Q-TOD are 4.58 points in BLEU and 0.19 points in Entity F1. On CamRest, our system achieves the best performance only in BLEU and slightly underperforms the best-performing DialoKG in Entity F1. The reason behind this is that many dialogues in CamRest have extremely small knowledge bases, with only 1-2 entities, leaving little space for our retriever to show its advantage.
Note that with the same backbone generator (T5-Base/T5-Large), our system surpasses Q-TOD even though Q-TOD relies on human annotations to train a query generator for knowledge retrieval. A possible reason is that while the retriever of Q-TOD is independent of response generation, ours is trained and guided by knowledge distillation from the response generator. Moreover, in addition to retrieving entities from the knowledge base, our retriever also conducts fine-grained attribute selection.

Large-Scale Knowledge Base
The experiments in Section 5.1 are conducted with each dialog corresponding to a condensed knowledge base. Although most previous systems are evaluated in this setting, it is not practical to have such knowledge bases in real scenes, where the systems may need to retrieve knowledge from a large-scale knowledge base. Therefore, we examine the performance of several well-recognized E2E TOD systems by running them on a large-scale cross-domain knowledge base (referred to as the "full knowledge base") on MWOZ and CamRest, respectively, where the knowledge base is constructed by gathering the entities of all dialogs in the original dataset.⁴

The results are shown in Table 2. We observe that our system outperforms the baselines by a large margin when the full knowledge base is used. In addition, there are two other observations. First, comparing the results in Table 1 and Table 2, we note that existing systems suffer a severe performance deterioration when the full knowledge base is used. For example, the Entity F1 score of DF-Net drops by 7.79 points on MWOZ, while our system only drops by 2.81/2.6 points. Second, our system with the full knowledge base still outperforms other systems when they use condensed knowledge bases, which are easier to retrieve from. These observations verify the superiority of our system when applied to a large-scale knowledge base as well as the feasibility of applying it to real scenes.

Table 3: Results of the ablation study on MWOZ with T5-Base, where "w/o" means without, "distillation" denotes distillation from response generation, "attr_selector" denotes the attribute selector, and "ent_selector" denotes the entity selector.

Ablation Study
We conduct an ablation study of our retriever MAKER with both condensed and full knowledge bases on MWOZ, and show the results in the first and the second blocks of Table 3, respectively.
When condensed knowledge bases are used, the system suffers obvious performance drops with the removal of distillation (w/o distillation) or entity selection (w/o ent_selector). This indicates that even though condensed knowledge bases are already small and relevant, our retriever can further learn to distinguish between the entities by distilling knowledge from the response generator. Besides, the performance of the system drops when the attribute selector is removed (w/o attr_selector), showing that attribute selection is also indispensable in the retriever.
When the full knowledge base is used, entity selection becomes essential for the system. Therefore, we only ablate the distillation component and the attribute selector. The results show that the system suffers significant performance degradation when distillation is removed (w/o distillation). Attribute selection is also shown to be important, as the performance drops when it is removed (w/o attr_selector).

Comparison of Retrieval Methods
To further demonstrate the effectiveness of our multi-grained knowledge retriever, we compare different retrieval methods on the full knowledge base of MWOZ. Specifically, we first retrieve the top-K entities with different retrieval methods and employ the same response generator to generate the system response. Moreover, we propose a new metric, Recall@7, to measure whether the suggested entities in the system response appear in the 7 retrieved entities. As shown in Table 4, the proposed retriever achieves the best performance among all methods except Oracle, which uses condensed knowledge bases without retrieval, in both generation metrics (BLEU, Entity F1) and the retrieval metric (Recall@7).
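The Recall@K metric used above can be sketched as follows. The handling of responses that mention no KB entities is our assumption, not specified in the paper.

```python
def recall_at_k(retrieved_ids, response_entity_ids, k=7):
    """Fraction of entities mentioned in the reference response that appear
    among the top-K retrieved entities (K=7 gives Recall@7)."""
    if not response_entity_ids:
        return 1.0  # convention assumed here: nothing to recall
    top = set(retrieved_ids[:k])
    hits = sum(1 for e in response_entity_ids if e in top)
    return hits / len(response_entity_ids)

r = recall_at_k(["e1", "e2", "e3"], ["e2", "e9"], k=2)  # only e2 is in top-2
```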
To investigate the effect of different numbers of retrieved entities on system performance, we report the Entity F1 and Recall@x scores of the above retrieval methods as the number of entities changes; Oracle is not included because we cannot rank its entities. We observe in Figure 3(a) that the Recall@x scores of all methods improve as the number of entities grows, with our retriever consistently achieving the best performance. In Figure 3(b), we observe no positive correlation between the Entity F1 score and the number of entities, suggesting that noisy entities may be introduced as the number of entities increases. We can also observe that the number of entities corresponding to the peak Entity F1 score varies across methods, while our retriever requires only a small number of entities to reach its peak performance.

Attribute Selection Methods
In Section 3.3, we calculate an accumulated importance score for each attribute, weighted by entity selection scores, to determine which attributes are preserved based on a given threshold. In Table 5, we compare different methods for accumulating the attribute scores as well as different approaches for filtering out irrelevant attributes. It can be observed that direct averaging, rather than weighting by entity selection scores, hurts the Entity F1 score. This indicates that the retriever can select attributes more appropriately based on the selection scores of the retrieved entities. We also observe that using top-K instead of a threshold to select attributes leads to a lower Entity F1 score than preserving all attributes. We believe the reason is that the number of attributes to be selected varies for each dialogue context, and therefore simply selecting the top-K attributes results in sub-optimal attributes.

Conclusion
We propose a novel multi-grained knowledge retriever (MAKER) for end-to-end task-oriented dialog systems. It decouples knowledge retrieval from response generation and introduces an entity selector and an attribute selector to acquire multi-grained knowledge from the knowledge base. The retriever is trained by distilling knowledge from the response generator. Empirical results show that our system achieves state-of-the-art performance when either a small or a large-scale knowledge base is provided for each dialog. Through in-depth analysis, our retriever shows great advantages over baselines when the size of the knowledge base grows large. Of the two selectors, the entity selector is shown to be the more prominent in the retriever.

Limitations
Our system employs a modified sequence-to-sequence architecture to implement the response generator. Since the length of the dialogue context increases as the dialogue continues, the generator needs to feed multiple copies of the long dialogue context to the encoder simultaneously, one for each retrieved entity. This may cause redundancy in the input and lower the proportion of KB-related information.
We will explore more efficient architectures for the response generator in future work.

D Domain-Wise Results
We report the domain-wise results with condensed knowledge bases on MWOZ and SMD in Table 9 and Table 10, respectively. The results of baseline models are cited from (Raghu et al., 2021), (Rony et al., 2022), and (Tian et al., 2022).

E More Implementation Details
The hyperparameters of our system with condensed and full knowledge bases are shown in Table 11 and Table 12, respectively. Our method has three contributions: knowledge distillation, entity selection, and attribute selection. We list the application of these contributions with condensed and full knowledge bases in Table 13 and Table 14, respectively.

F Case Study
In Figure 4, we provide a dialogue example from the MWOZ dataset. We can observe that, for a given user utterance, our system can retrieve entities that satisfy the user goal while masking irrelevant attributes. Then, it generates appropriate system responses. Note that when the user goal changes, e.g., in the second turn of this case when the user wants a cheap restaurant, our retriever can retrieve the corresponding entity, with the attribute of price range being preserved.

Figure 3: Performance of different retrieval methods as the number of retrieved entities changes on the full knowledge base, in Recall (a) and Entity F1 (b) scores.

Figure 4: An example dialogue to illustrate our system. Blue font refers to knowledge base-related information.

Table 1: Overall results of E2E TOD systems with condensed knowledge bases on MWOZ, SMD, and CamRest. The best scores are highlighted in bold, and the second-best scores are underlined. †, ‡, §, and * indicate that the results are cited from (Qin et al., 2019), (Qin et al., 2020), (Raghu et al., 2021), and (Tian et al., 2022), respectively.

Table 2: Overall results of E2E TOD systems with a large-scale knowledge base on MWOZ and CamRest, respectively. The best scores are highlighted in bold, and the second-best scores are underlined.

Table 4: Comparison of different retrieval methods on the full knowledge base. Oracle refers to using the condensed knowledge base of each dialog as the retrieval result. Frequency measures relevance by the frequency of attribute values occurring in the dialogue context. BM25 measures relevance using the BM25 score between the dialogue context and each entity.

Table 8: Hyperparameter settings for pre-training our entity selector on the full knowledge bases of the MWOZ and CamRest datasets, respectively.

Table 14: Hyperparameter settings of whether to apply each contribution to our system when the full knowledge base is used on MWOZ and CamRest.

(Text fragment from the Figure 4 example: "...for a restaurant. The restaurant should be in the north and should serve italian food." "Da vinci pizzeria is located at 20 milton road chesterton.")