Multijugate Dual Learning for Low-Resource Task-Oriented Dialogue System

Dialogue data in real scenarios tend to be sparse, leaving data-starved end-to-end dialogue systems inadequately trained. We find that data utilization efficiency in low-resource scenarios can be enhanced by mining the alignment information between uncertain utterances and deterministic dialogue states. We therefore implement dual learning in task-oriented dialogue to exploit the correlation between heterogeneous data. In addition, the one-to-one duality is converted into a multijugate duality to reduce the influence of spurious correlations in dual training and improve generalization. Our method introduces no additional parameters and can be implemented in an arbitrary network. Extensive empirical analyses demonstrate that the proposed method improves the effectiveness of end-to-end task-oriented dialogue systems on multiple benchmarks and obtains state-of-the-art results in low-resource scenarios.


Introduction
With the emergence of dialogue data (Zhang et al., 2020b) and the evolution of pre-trained language models (Qiu et al., 2020), end-to-end task-oriented dialogue (TOD) systems (Su et al., 2022; Lee, 2021; Tian et al., 2022) have gradually replaced the earlier modular cascading dialogue systems (Gao et al., 2018). The end-to-end TOD system adopts a uniform training objective, preventing the error-propagation problem of pipelined dialogue systems (Gao et al., 2018). Nonetheless, the end-to-end paradigm requires more training data to perform well (Su et al., 2022). Meanwhile, TOD data is enormously expensive to annotate (Budzianowski et al., 2018), as it simultaneously contains dialogue state tracking, dialogue action prediction, and response generation. It is also expensive to annotate large amounts of complicated dialogue data for
Figure 1: The TOD training and prediction procedure in the low-resource scenario. When the user utterance is rephrased, the predictions miss some entities.
each emerging domain (Mi et al., 2022). Therefore, improving data utilization efficiency in low-resource scenarios becomes critical for end-to-end TOD.
Previous approaches (Zhang et al., 2020b; Su et al., 2022) improve model transferability on downstream tasks and the capacity to handle small samples by conducting self-supervised or semi-supervised further pre-training (He et al., 2022) on data from additional dialogue domains. However, such further pre-training on million-scale datasets may require hundreds of GPU hours and is thus resource-intensive. On specific downstream dialogue tasks, a unified multi-task generative paradigm (Lee, 2021; Su et al., 2022) has been applied to end-to-end dialogue. Although this generative approach demonstrates better generalization and outcomes, we argue that the heterogeneity and duality between data are ignored. Here, heterogeneity refers to the formal discrepancy between uncertain, unstructured discourse (e.g., user utterances and system responses) and deterministic, structured dialogue states. Accordingly, the underlying alignment information and knowledge contained within the heterogeneous data are not fully exploited by the above approaches.
To address the above challenges, we propose an innovative multijugate dual learning framework for TOD (MDTOD). In contrast to previous work on reconstructing user utterances from belief states (Sun et al., 2022; Chen et al., 2020), we observe that modeling the duality between user utterances and system responses can further uncover alignment information about entities among user utterances, system responses, and dialogue states. Specifically, the model is required to reconstruct the user utterance from the dialogue state and also to infer the user utterance backwards from the system response. Consequently, the model can further learn the mapping between the heterogeneous information and improve the performance of the end-to-end TOD system in low-resource scenarios.
However, naive dual training increases the likelihood of the model learning spurious correlations in the data, as evidenced by the fact that comparable model performance can be attained using only high-frequency phrases as the training set (Yang et al., 2022). As a result, the model does not generalize well to test samples with significant expression variations or domain differences, as illustrated in Figure 1. To address this, we expand the one-to-one dual learning paradigm into multijugate dual learning by capitalizing on the variety of semantic representations. Given a deterministic dialogue state as a constraint (Hokamp and Liu, 2017), a specific user utterance (or system response) is rewritten into multiple utterances (responses) with the same semantics but various expressions, using decoding methods such as beam search or random sampling. The richer representation of information effectively mitigates the spurious correlations of shallow statistical patterns acquired by the model, thereby enhancing generalization (Cui et al., 2019).
Our proposed method exploits the entity alignment information among heterogeneous data by designing a dual learning task; it also mitigates spurious correlations and increases the generalization capacity of models via rephrase-enhanced multijugate dual learning. The method introduces no additional trainable model parameters and can be directly integrated into end-to-end TOD systems in arbitrary low-resource scenarios as a training approach to increase data utilization efficiency. We show the effectiveness of our method on several task-oriented datasets, including MultiWOZ2.0 (Budzianowski et al., 2018), MultiWOZ2.1 (Eric et al., 2020), and KVRET (Eric et al., 2017), and demonstrate the advantages of our approach in low-resource scenarios. All code and parameters will be made public.
Our primary contributions are summarized below:
• A novel, model-independent dual learning technique for low-resource end-to-end TOD systems that can be incorporated directly into the training of any TOD system.
• A paradigm of paraphrase-enhanced multijugate dual learning that addresses the issue of spurious correlations impacting the generalization of models.

Related Work

Task-Oriented Dialogue Systems
TOD aims to complete user-specific goals via multiple turns of dialogue. Prior work focused mainly on TOD subtasks under the pipeline paradigm (Gao et al., 2018), which is prone to error propagation between modules. Therefore, recent research has attempted to model dialogue tasks with an end-to-end generation approach. DAMD (Zhang et al., 2020a) generates the different outputs of a conversation process via multiple decoders and expands multiple dialogue actions dependent on the dialogue state. A portion of the work (Hosseini-Asl et al., 2020; Yang et al., 2020; Peng et al., 2021) models the individual dialogue tasks in TOD as cascading generation tasks using the decoder-only GPT-2 (Radford et al., 2019) as the backbone network. Multi-task approaches (Lin et al., 2020; Su et al., 2022; Lee, 2021) utilizing encoder-decoder architectures such as T5 (Raffel et al., 2020) or BART (Lewis et al., 2020) model the dialogue sub-tasks as sequence-to-sequence generation tasks.
Although the methods mentioned above use a uniform end-to-end approach to model TOD, none performs well in low-resource scenarios. To this end, we devise rephrase-enhanced multijugate dual learning to exploit the entity alignment information more adequately and obtain more robust performance.

Dual Learning for Generation
Dual learning aims to utilize the paired structure of data to acquire effective feedback or regularization information, thus enhancing model training. Dual learning was initially introduced in unsupervised machine translation (He et al., 2016), combined with reinforcement learning to optimize two agents iteratively. DSL (Xia et al., 2017) then extended dual learning to supervised settings to take advantage of the pairwise relationships in parallel corpora. Similar work (Guo et al., 2020) employs cycle training to enable unsupervised mutual generation of structured graphs and text. MPDL (Li et al., 2021) extends the duality in dialogue tasks to stylized dialogue generation without a parallel corpus. A portion of the work (Sun et al., 2022; Chen et al., 2020) integrates the idea of duality into dialogue state tracking. Other work (Zhang et al., 2018; Yang et al., 2018; Cui et al., 2019) introduces dual learning in dialogue generation to enhance the diversity, personality, or coherence of responses. However, each method mentioned above requires multiple models or combines reinforcement learning with dual modeling, considerably increasing the complexity and training difficulty of the task.
In contrast to previous work, our proposed multijugate dual learning objectives share the same model parameters and require no modification of the original maximum-likelihood training objective, making training more straightforward and more readily applicable to other tasks.

End-to-End Task-Oriented Dialogue System
Typically, end-to-end TOD systems consist of subtasks such as dialogue state prediction and response generation (Lee, 2021), and model the several subtasks of the dialogue process as sequence generation tasks to unify the model structure and training objectives (Hosseini-Asl et al., 2020). Denote the TOD dataset as D_TOD = {Dial_i, DB}_{i=1}^{N}, where DB is the database and N is the total number of sessions. In a multi-turn dialogue Dial_i, where the user utterance in the t-th turn is U_t and the system response is R_t, the dialogue history (context) can be expressed as

C_t = [U_1, R_1, ..., U_{t-1}, R_{t-1}, U_t].  (1)

The model then generates the dialogue state B_t based on the previous dialogue context C_t:

L_B = Σ_{i=1}^{N} Σ_{t=1}^{T_i} log P_θ(B_t | C_t),  (2)

where T_i denotes the total number of turns per session and θ denotes an arbitrary generation model. The system then queries the database with the criterion B_t and retrieves the result D_t. The TOD system generates the response R_t for each turn based on the context C_t, the dialogue state B_t, and the database query result D_t:

L_R = Σ_{i=1}^{N} Σ_{t=1}^{T_i} log P_θ(R_t | C_t, B_t, D_t).  (3)

Finally, a human-readable response text containing the entities is obtained by combining the belief state with the database search results.
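As a concrete illustration, the context serialization in Eq. 1 can be sketched in a few lines of Python. The `<sep>` delimiter and the turn-dictionary layout below are illustrative assumptions, not the paper's actual preprocessing.

```python
# Hypothetical sketch of building C_t = [U_1, R_1, ..., U_{t-1}, R_{t-1}, U_t].
# The "<sep>" token and dict layout are assumptions for illustration only.

def build_context(turns, t):
    """Flatten the first t-1 full exchanges plus the t-th user turn."""
    history = []
    for i in range(t - 1):
        history += [turns[i]["user"], turns[i]["resp"]]
    history.append(turns[t - 1]["user"])
    return " <sep> ".join(history)

turns = [
    {"user": "i need a cheap hotel", "resp": "okay , which area ?"},
    {"user": "the centre please", "resp": ""},
]
print(build_context(turns, 2))
# -> "i need a cheap hotel <sep> okay , which area ? <sep> the centre please"
```

The flattened string would then be fed to the generation model as the conditioning input for Eq. 2 and Eq. 3.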

Multijugate Dual Learning
This section describes how we design dual learning objectives in the training process of TOD, and how we construct multijugate dual learning by paraphrasing user utterances and system responses with representational diversity under deterministic dialogue state constraints.

Dual Learning in TOD
We define the deterministic dialogue state S_t = [B_t; D_t] as consisting of two informational components: the belief state B_t and the database query results D_t.
As illustrated in Figure 2, the dialogue state can be viewed as deterministic information with a unique manifestation (Zhang et al., 2020a), irrespective of the order of dialogue actions. With the dialogue state as a constraint, the natural language of the context and response can be viewed as uncertain data with different surface representations. Therefore, we design the dual task in TOD to learn the mapping between utterances in linguistic form and the dialogue state representation.
Let f_cb : C_t → B_t denote the forward learning objective of generating belief states from the context as in Eq. 2, and f_bc : B_t → C_t denote the reverse objective of reconstructing the context from the belief states. The dual learning task between user utterances and dialogue states is then defined as maximizing the following log-probability:

L_cb = Σ_{i=1}^{N} Σ_{t=1}^{T_i} [ log P_θ(B_t | C_t) + log P_θ(C_t | B_t) ].

Similarly, let f_cr : C_t → R_t and f_rc : R_t → C_t denote the dual learning task between the dialogue context C_t and the system response R_t:

L_cr = Σ_{i=1}^{N} Σ_{t=1}^{T_i} [ log P_θ(R_t | C_t) + log P_θ(C_t | R_t) ].

Accordingly, the total dual learning objective is the sum of the two components:

L_dual = L_cb + L_cr.

Furthermore, the two dual learning objectives share a single set of model parameters in a multi-task paradigm, ensuring knowledge transfer between the dual tasks.
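Under these definitions, the per-turn dual objective is simply a sum of forward and reverse log-likelihood terms scored by one shared model. The sketch below is a toy illustration: `toy_log_prob` is a stand-in for a real seq2seq scorer, not the paper's model.

```python
import math

def dual_loss(log_prob, c_t, b_t, r_t):
    """Negative dual objective for one turn: forward terms P(B|C), P(R|C)
    plus reverse terms P(C|B), P(C|R), all scored by the SAME shared
    parameters (here, the same `log_prob` function)."""
    return -(log_prob(b_t, c_t) + log_prob(c_t, b_t)
             + log_prob(r_t, c_t) + log_prob(c_t, r_t))

# Stand-in scorer: pretend every target token is uniform over a vocabulary.
def toy_log_prob(target, source, vocab=100):
    return len(target.split()) * math.log(1.0 / vocab)

loss = dual_loss(toy_log_prob,
                 "i want cheap chinese food",               # C_t (5 tokens)
                 "[restaurant] price=cheap food=chinese",   # B_t (3 tokens)
                 "sure , which area ?")                     # R_t (5 tokens)
print(round(loss, 2))
```

Because one `log_prob` scores all four directions, minimizing this loss trains a single parameter set, which mirrors the shared-parameter multi-task setup described above.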

Construction of Multijugate Relations
Dual learning enhances data utilization efficiency by acquiring additional entity alignment information between heterogeneous data, but it does not lessen the effect of spurious correlations on model generalization. Leveraging the deterministic nature of dialogue states and the uncertainty of linguistic representations, we expand the original one-to-one dual learning to multijugate dual learning via paraphrasing. In principle, several semantically identical but differently expressed contexts or system responses exist for one deterministic dialogue state. Consequently, given (S_t, C_t) or (S_t, R_t), we rephrase the context C_t and the response R_t, restricted by the entities in the dialogue state S_t, with a constrained generation method. Specifically, we utilize an off-the-shelf paraphrasing model with the dialogue context C_t as input, treating the values in the dialogue state S_t as constraints on decoding. Beam search is then employed during generation to obtain K different contexts C̃_t or responses R̃_t as the paraphrase results.
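One simple way to realize the state constraint is post-hoc filtering: generate K-best candidates with beam search and keep only those that preserve every slot value. The sketch below is a hypothetical simplification of that idea, not the authors' actual constrained-decoding implementation (lexically constrained beam search proper is described by Hokamp and Liu, 2017).

```python
# Hypothetical sketch: keep only paraphrase candidates that retain every
# value from the dialogue state S_t (a weak stand-in for true lexically
# constrained beam search over the paraphraser's output).

def keep_constrained(candidates, state_values, k):
    kept = [c for c in candidates
            if all(v.lower() in c.lower() for v in state_values)]
    return kept[:k]

cands = [
    "is there a cheap place to stay with 4 stars and free parking ?",
    "is there an affordable place to stay with four stars ?",  # loses "cheap", "4", "parking"
]
print(keep_constrained(cands, ["cheap", "4", "parking"], k=2))
```

Only the first candidate survives the filter, since the second paraphrase dropped the constrained entity mentions and would break the alignment with S_t.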
Moreover, since the context C_t of the current turn depends on the dialogue history (U_{<t}, R_{<t}) of previous turns, rewriting the context or responses of every turn would result in a combinatorial explosion. Therefore, we adopt a heuristic whereby the dialogue context C_t and the system response R_t are rewritten only once per dialogue turn, selecting M single samples to be rewritten. In practice, as the proportion of training data increases, M decreases.
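The turn-selection heuristic can be sketched as picking M samples and rewriting each exactly once, rather than expanding every turn of every dialogue. The random-sampling policy below is an assumption for illustration; the paper does not specify how the M samples are chosen.

```python
import random

# Hypothetical sketch of the rewriting heuristic: choose M samples and
# rewrite each exactly once, avoiding the combinatorial explosion of
# paraphrasing every turn of every dialogue history.

def select_for_rewrite(samples, m, seed=0):
    rng = random.Random(seed)               # fixed seed for reproducibility
    return rng.sample(samples, min(m, len(samples)))

picked = select_for_rewrite(list(range(100)), m=5)
print(sorted(picked))
```

As the text notes, M would shrink as the proportion of available training data grows, so the rewriting budget stays roughly constant.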
In addition, paraphrasing was preferred over word substitution or addition/deletion-based techniques (Wei and Zou, 2019) because word-level substitution alters words only with a certain probability and may thus fail to modify the phrases carrying spurious correlations. Moreover, as shown in Section 4.4.3, paraphrasing produces more diverse and higher-quality augmented content, alleviating the risk of spurious relevance more effectively.

Multijugate Dual Learning for Training
By acquiring paraphrase-enhanced samples, the original one-to-one dual learning is augmented into multijugate dual learning, allowing the model to fully leverage the entity alignment information between heterogeneous data while maintaining generalization. The overall framework of our method is illustrated in Figure 2. The final loss function for multijugate dual learning of TOD sums the dual learning objectives over all K paraphrased contexts C̃_t and responses R̃_t that share the same dialogue state.

Experiments

In the context of an end-to-end dialogue scenario, we examine the comprehensive performance of multijugate dual learning on several dialogue datasets, including dialogue state tracking and end-to-end task completion. We also conduct evaluations in limited-resource scenarios to assess how effectively dual learning exploits the knowledge contained in the data, and we investigate the impact of the individual dual learning components and rewriting procedures on overall performance.

Datasets and Evaluation Metrics
We evaluate on MultiWOZ2.0 (Budzianowski et al., 2018), MultiWOZ2.1 (Eric et al., 2020), and KVRET (Eric et al., 2017), three of the most extensively studied datasets in the task-oriented dialogue domain. MultiWOZ2.0 is the first large-scale dialogue dataset spanning seven domains, and MultiWOZ2.1 is the version with several MultiWOZ2.0 annotation problems fixed. Following earlier research, we evaluate on both datasets to assess the robustness of the model against mislabeling. KVRET is a multi-turn TOD dataset covering three domains: calendar scheduling, weather query, and navigation. Detailed statistics of the three datasets are given in Table 7.
For the end-to-end dialogue task, we use the standard and widely adopted Inform, Success, BLEU, and Combined Score metrics: Inform measures whether the system's responses refer to the entity requested by the user, Success measures whether the system has answered all of the user's requests, and BLEU measures the quality of the generated text. The Combined Score indicates the overall performance of the task-oriented system and is calculated as Combined Score = (Inform + Success) * 0.5 + BLEU. For the dialogue state tracking task, Joint Goal Accuracy (JGA) quantifies the fraction of turns in which the model predicts all slots correctly.
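These evaluation formulas translate directly into code. The dict-equality check for JGA below assumes each turn's state is represented as a slot-value mapping, which is an illustrative choice rather than the official evaluation script.

```python
def combined_score(inform, success, bleu):
    # Combined Score = (Inform + Success) * 0.5 + BLEU
    return (inform + success) * 0.5 + bleu

def joint_goal_accuracy(pred_states, gold_states):
    # Fraction of turns where every slot-value pair is predicted exactly.
    correct = sum(p == g for p, g in zip(pred_states, gold_states))
    return correct / len(gold_states)

print(combined_score(85.0, 75.0, 18.5))  # -> 98.5

preds = [{"area": "centre"}, {"area": "north", "stars": "4"}]
golds = [{"area": "centre"}, {"area": "north"}]
print(joint_goal_accuracy(preds, golds))  # -> 0.5 (second turn over-predicts "stars")
```

Note that JGA is strict: a single wrong or extra slot in a turn makes the whole turn count as incorrect.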

Baselines
We conducted comparison experiments with the following strong baselines. (1) ... The dual learning objective enables the model to further learn the alignment information between entities and thus improves the task success rate. Meanwhile, T5+DL achieves higher BLEU at different proportions of training data, indicating that the dual learning objective between user utterances and system responses also benefits the quality of text generation. In addition, MDTOD with multijugate dual learning achieves better results, indicating that controlled rephrasing can further enhance the effect of dual learning.

Dual Learning in Dialogue State Tracking
To further investigate the effectiveness of the dual learning task between user utterances and dialogue states within multijugate dual learning, we conducted dialogue state tracking experiments on the MultiWOZ2.0 dataset in low-resource scenarios. We set four training sizes of 1%, 5%, 10%, and 20% to represent different degrees of resource scarcity.
From the experimental results in Table 3, we infer that PPTOD achieves relatively better results in the extremely low-resource scenario because it performs further pre-training on a large amount of additional dialogue data. Conversely, MDTOD performs no additional pre-training but still achieves the highest accuracy at the other three data magnitudes, indicating that multijugate dual learning between user utterances and dialogue states is an essential component of the overall approach.

Dismantling Multijugate Dual Learning
To investigate the effect of the different dual learning components and paraphrase augmentation on the proposed technique, we conducted ablation experiments that omit various components at the 10% data size setting. In Table 4, Para denotes paraphrase augmentation, DU-DL denotes dual learning between context and dialogue state, and RU-DL denotes dual learning between context and system response.
As shown in Table 4, the model's performance decreases slightly when only dual learning is retained and the paraphrase enhancement is removed, indicating that multijugate dual learning partially mitigates the overfitting caused by pairwise learning and thereby improves generalization. Among the dual learning components, removing dual learning between context and system responses results in a 1.87-point performance decrease, indicating that fully exploiting the implicit alignment information between context and system responses is more effective at enhancing overall performance. Deleting both dual learning components results in a 2.02-point decrease in the Combined Score, demonstrating that both dual learning objectives contribute to the strategy.

Mitigating Spurious Correlation for Generalization
This section explores the cross-domain generalizability of dual learning with different numbers of paraphrases, i.e., on a domain that does not appear during training, to examine whether rephrase-enhanced multijugate dual learning mitigates spurious entity correlations and improves generalization. On the In-Car (KVRET) dataset, we test the ability of MDTOD to generalize to the scheduling and weather domains separately. The Goal Score, calculated as (Inform + Success) * 0.5, signifies task accomplishment. As indicated in Table 5, with rephrase-enhanced multijugate dual learning the model improves both task completion rate and text generation quality in both new domains. Furthermore, with two paraphrases, the Goal Score improves by 4.21 points over the setting without the rephrasing mechanism. This improvement indicates that the multijugate relations further alleviate the shallow spurious correlations among entities captured by the model, thus improving the task completion rate.

Effect of Different Paraphrases
To investigate the impact of different rephrasing techniques on the construction of multijugate dual learning, we examined easy data augmentation (EDA) (Wei and Zou, 2019), synonym replacement (SYN), and paraphrasing (PARA) for generating augmented data under limited resources.
As shown in the upper part of Figure 3, both PARA and EDA yield modest improvements as the amount of augmented data increases, with PARA exceeding EDA. The results indicate that PARA generates higher-quality augmented data, whereas SYN mainly introduces noise.
The results in Figure 3 further indicate that increasing the number of PARA paraphrases raises the completion rate of dialogue goals, whereas EDA and SYN provide only a minor boost or even decrease the model's performance. This analysis reveals that a rephrasing strategy enables better utterance rewriting under dialogue state constraints, alleviating the spurious correlation issue and enhancing the model's generalizability.

Conclusion
We propose a novel multijugate dual learning framework for task-oriented dialogue in low-resource scenarios. Exploiting the duality between deterministic dialogue states and uncertain utterances enables the entity alignment information in heterogeneous data to be fully exploited, while paraphrase-enhanced multijugate dual learning alleviates the spurious correlations of shallow pattern statistics. Experiments on several TOD datasets show that the proposed method achieves state-of-the-art results in both end-to-end response generation and dialogue state tracking in low-resource scenarios.

Limitations
Multijugate dual learning improves performance on TOD tasks in low-resource scenarios, but the introduction of the dual training objectives increases the required GPU memory and training steps. In addition, the rephrasing mechanism requires an additional paraphraser to rewrite the training samples, so the number of training samples grows with the number of paraphrases. Nevertheless, we find this higher training cost preferable to employing a large quantity of dialogue data for further pre-training or to manually labeling data.
From a different angle, the scenario described above presents opportunities for future research, such as developing higher-quality rephrasing algorithms to filter the augmented text. Moreover, multijugate dual learning is a learning objective between structured and unstructured text, and may therefore be extended to any task involving heterogeneous data, such as generative information extraction and data-to-text generation.
Table 8: Performance comparison between MDTOD and other generative models on the MultiWOZ 2.0 and 2.1 datasets for dialogue state tracking. †: The results reported in the publications of these approaches could not be reproduced on MultiWOZ2.1 or used an unfair evaluation script, so we corrected these results based on their open-source code.
eters or use a more powerful pre-trained model for dialogue. Despite this, Dual-Dialog achieves the highest results, proving that dual learning can more thoroughly exploit the information contained in the original data and enhance the performance of task-oriented dialogue systems even given a vast amount of data. Our proposed strategy likewise achieves the highest BLEU on MultiWOZ2.0, showing that the quality of the model's generated responses is substantially enhanced.

B.2 Dialogue State Tracking
To further investigate the influence of dual modeling between uncertain user utterances and deterministic belief states on TOD systems, we compared MDTOD with baselines of different generative paradigms on the belief state tracking task. According to Table 8, MDTOD obtains state-of-the-art results on both datasets. On MultiWOZ 2.0 and 2.1, our technique achieves a 0.41-point JGA improvement over the previous best, BORT and MTTOD. Dual learning between dialogue states and user utterances learns entity alignment information in the data, resulting in improved belief state tracking.

C Case Analysis
We present selected paraphrases in Table 10 to demonstrate the effect of the rephraser.

Figure 2: The overall structure of multijugate dual learning. To obtain paraphrase-enhanced multiple contexts C̃_t and responses R̃_t, the contexts and responses in each dialogue turn are paraphrased based on the deterministic dialogue states using an off-the-shelf paraphrase model. Multijugate dual learning is then performed between the paraphrase-enhanced contexts C̃_t and the dialogue states, and between the paraphrase-enhanced responses R̃_t and the dialogue states, respectively.

Figure 3: The impact of various rephrasing strategies on multijugate dual learning.

Table 1: The performance of MDTOD evaluated at 5%, 10%, and 20% of the data size. Comb. denotes Combined Score.

Table 3: DST evaluated at different proportions of low-resource data. Results are means and standard deviations over four runs.

Table 4: Different settings of multijugate dual learning.

Table 5: Outcomes of the cross-domain evaluation. X / * → * denotes that the * domain is excluded from the training set and only the * domain is tested.
Belief State: [restaurant] {food: chinese, area: centre, name: lan hong house, time: 12:30, day: sunday, people: 2}
Generated Belief State: [restaurant] {food: chinese, area: centre, time: 12:30, day: sunday, people: 2}
Oracle Reference: you are booked for 12:30 on sunday for 2 people. your reference number is f1ntkwzo. is there something else i can help you with?
Delexicalized Response: booking was successful. the table will be reserved for 15 minutes. reference number is: [value_reference]. anything else i can help with?
Reference: thank you, if you need anything else let me know. good bye.
Delexicalized Response: thank you for using our system!
Lexicalized Response: thank you for using our system!

Table 11: A dialogue sample generated by MDTOD on MultiWOZ 2.0.