DuRecDial 2.0: A Bilingual Parallel Corpus for Conversational Recommendation

In this paper, we provide a bilingual parallel human-to-human recommendation dialog dataset (DuRecDial 2.0) to enable researchers to explore the challenging task of multilingual and cross-lingual conversational recommendation. The difference between DuRecDial 2.0 and existing conversational recommendation datasets is that each data item (Profile, Goal, Knowledge, Context, Response) in DuRecDial 2.0 is annotated in two languages, English and Chinese, while other datasets are built in a single-language setting. We collect 8.2k dialogs aligned across English and Chinese (16.5k dialogs and 255k utterances in total), annotated by crowdsourced workers with a strict quality-control procedure. We then build monolingual, multilingual, and cross-lingual conversational recommendation baselines on DuRecDial 2.0. Experimental results show that the use of additional English data can bring performance improvement for Chinese conversational recommendation, indicating the benefits of DuRecDial 2.0. Finally, this dataset provides a challenging testbed for future studies of monolingual, multilingual, and cross-lingual conversational recommendation.


Introduction
In recent years, there has been a significant increase in research on conversational recommendation due to the rise of voice-based bots (Kang et al., 2019; Sun and Zhang, 2018; Christakopoulou et al., 2016; Warnestal, 2005). These works focus on how to provide recommendation services in a more user-friendly manner through dialog-based interactions. They fall into two categories: (1) task-oriented dialog-modeling approaches that require pre-defined user intents and slots (Warnestal, 2005; Christakopoulou et al., 2016; Sun and Zhang, 2018); (2) non-task dialog-modeling approaches that can conduct more free-form interactions for recommendation, without pre-defined user intents and slots (e.g., Kang et al., 2019). Recently more and more efforts have been devoted to the second category, and many datasets have been created, including English dialog datasets (Dodge et al., 2016; Kang et al., 2019; Moon et al., 2019; Hayati et al., 2020) and Chinese dialog datasets (Liu et al., 2020b). However, to the best of our knowledge, almost all of these datasets are constructed in a single-language setting, and there is no publicly available multilingual dataset for conversational recommendation. Previous work on other NLP tasks has shown that multilingual corpora can bring performance improvement in comparison with a monolingual setting, such as for the tasks of task-oriented dialog (Schuster et al., 2019b), semantic parsing (Li et al., 2021), QA and reading comprehension (Jing et al., 2019; Lewis et al., 2020; Artetxe et al., 2020; Clark et al., 2020; Hu et al., 2020; Hardalov et al., 2020), machine translation (Johnson et al., 2017), document classification (Lewis et al., 2004; Klementiev et al., 2012; Schwenk and Li, 2018), semantic role labelling (Akbik et al., 2015) and NLI (Conneau et al., 2018).
Therefore it is worthwhile to create a multilingual conversational recommendation dataset: it might enhance model performance compared with a monolingual training setting, and it would provide a new benchmark for the study of multilingual modeling techniques.
To facilitate the study of this challenge, we present a bilingual parallel recommendation dialog dataset, DuRecDial 2.0, for multilingual and cross-lingual conversational recommendation. DuRecDial 2.0 consists of 8.2K dialogs aligned across two languages, English and Chinese (16.5K dialogs and 255K utterances in total).
Figure 1: Illustration of DuRecDial 2.0 with the monolingual, multilingual, and cross-lingual conversational recommendation on the dataset. We use different colors to indicate different goals. G, K, X, and Y stand for dialog goal, knowledge, context, and response respectively.

Table 1 summarizes the difference between DuRecDial 2.0 and existing conversational recommendation datasets. We also analyze DuRecDial 2.0 in depth and find that it offers more diversified prefixes of utterances and thus a more flexible language style, as shown in Figure 2(a) and Figure 2(b).
We define five tasks on this dataset. As shown in Figure 1 (Monolingual), the first two tasks are English or Chinese monolingual conversational recommendation, where the dialog context, knowledge, dialog goal, and response are all in the same language. These tasks aim at investigating the performance variation of the same model across two different languages. As shown in Figure 1 (Multilingual), the third task is multilingual conversational recommendation: we directly mix training instances of the two languages into a single training set and train a single model to handle both English and Chinese conversational recommendation at the same time. As shown in Figure 1 (Crosslingual), the last two tasks are cross-lingual conversational recommendation, where the model input and output are in different languages, e.g., the dialog context is in English (or Chinese) and the generated response is in Chinese (or English).
To address these tasks, we build baselines using XNLG (Chi et al., 2020) and mBART (Liu et al., 2020a). We conduct an empirical study of the baselines on DuRecDial 2.0, and the experimental results indicate that the use of additional English data can bring performance improvement for Chinese conversational recommendation.
In summary, this work makes the following contributions: • To facilitate the study of multilingual and cross-lingual conversational recommendation, we create a novel dataset DuRecDial 2.0, the first publicly available bilingual parallel dataset for conversational recommendation.
• We establish monolingual, multilingual, and cross-lingual conversational recommendation baselines on DuRecDial 2.0. The results of automatic evaluation and human evaluation confirm the benefits of this bilingual dataset for Chinese conversational recommendation.

Table 1: Comparison of DuRecDial 2.0 with other datasets for conversational recommendation. "EN", "ZH", "Dial.", "Utt.", and "Rec." stand for English, Chinese, dialogs, utterances, and recommendation respectively.

Datasets for Conversational Recommendation
To facilitate the study of conversational recommendation, multiple datasets have been created in previous work, as shown in Table 1. The first recommendation dialog dataset was released by Dodge et al. (2016); it is a synthetic dialog dataset built with the classic MovieLens ratings dataset and natural language templates. Later work creates a human-to-human multi-turn recommendation dialog dataset that combines the elements of social chitchat and recommendation dialogs. Kang et al. (2019) provide a recommendation dialog dataset with clear goals, and Moon et al. (2019) collect a parallel Dialog↔KG corpus for recommendation. Liu et al. (2020b) construct a human-to-human conversational recommendation dataset that contains 4 dialog types and 7 domains, with clear goals to achieve during each conversation and user profiles for personalized conversation. Another line of work automatically collects a conversational recommendation dataset built from movie data. Hayati et al. (2020) provide a conversational recommendation dataset with additional annotations for sociable recommendation strategies. Compared with these datasets, each dialog in DuRecDial 2.0, together with its seeker profile, knowledge triples, and goal sequence, is parallel in English and Chinese.
Multilingual and Cross-lingual Datasets for Dialog Modeling

Dialogue systems are categorized as task-oriented and chit-chat. Several multilingual task-oriented dialogue datasets have been published (Mrkšić et al., 2017b; Schuster et al., 2019a), enabling evaluation of approaches for cross-lingual dialogue systems. Mrkšić et al. (2017b) annotated two languages (German and Italian) for the dialogue state tracking dataset WOZ 2.0 (Mrkšić et al., 2017a) and trained a unified framework to cope with multiple languages. Meanwhile, Schuster et al. (2019a) introduced a multilingual NLU dataset and highlighted the need for more sophisticated cross-lingual methods. Those datasets mainly focus on multilingual NLU and DST for task-oriented dialogue and are not parallel. In comparison, DuRecDial 2.0 is a bilingual parallel dataset for conversational recommendation. Multilingual chit-chat datasets are relatively scarce. Lin et al. (2020) propose a multilingual Persona-Chat dataset, XPersona, by extending the Persona-Chat corpora (Dinan et al., 2019) to six languages: Chinese, French, Indonesian, Italian, Korean, and Japanese. In XPersona, the training sets are automatically translated using translation APIs, while the validation and test sets are annotated by humans. XPersona focuses on personalized cross-lingual chit-chat generation, while DuRecDial 2.0 focuses on multilingual and cross-lingual conversational recommendation.

Dataset Collection
DuRecDial 2.0 is designed to collect highly parallel data to facilitate the study of monolingual, multilingual and cross-lingual conversational recommendation.
In this section, we describe the three steps for dataset construction: (1) Constructing the parallel data item; (2) Collecting conversation utterances by crowdsourcing; (3) Collecting knowledge triples by crowdsourcing.

Parallel Data Item Construction
To collect parallel data, we follow the task design in previous work (Liu et al., 2020b) and use the same annotation rules, so parallel data items (e.g., knowledge graph, user profile, task templates, and conversation situation) are essential.
Parallel knowledge graph The domains covered in DuRecDial (Liu et al., 2020b) include star, movie, music, news, food, POI, and weather. As the quality of automatically translated news texts is poor, we remove the news domain and keep the other domains. For the weather domain, we construct its parallel knowledge as follows: 1) decompose Chinese weather information into several aspects of weather (e.g., the highest temperature, the lowest temperature, wind direction, etc.); 2) multiple crowdsourced annotators translate and combine English weather information to generate parallel weather information. For the other domains, the edges of the knowledge graph are translated by multiple crowdsourced annotators, and the nodes are constructed as follows: • We crawl the English names of movies, stars, music, food, and restaurants from several related websites for the movie/star/music/food/POI domains. If the English name from at least two websites is the same, it is used to construct the parallel knowledge graph.
• If the English names are different, crowdsourced annotators choose one of the candidate English names crawled above to construct the parallel knowledge graph.
• Otherwise, multiple crowdsourced annotators translate the Chinese nodes into English.
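The three node-selection rules above can be sketched as a simple voting procedure. This is a minimal illustration under our own assumptions; the function name and data shapes are not from the released pipeline:

```python
from collections import Counter

def align_node(crawled_names, annotator_choice=None, annotator_translation=None):
    """Pick the English name of a Chinese KG node from candidate crawls.

    crawled_names: English names crawled from several websites for this node.
    annotator_choice: name picked by annotators when the crawls disagree.
    annotator_translation: manual translation used when nothing was crawled.
    """
    counts = Counter(n for n in crawled_names if n)
    if counts:
        name, freq = counts.most_common(1)[0]
        if freq >= 2:                      # rule 1: at least two websites agree
            return name
        if annotator_choice in counts:     # rule 2: annotators pick a candidate
            return annotator_choice
    return annotator_translation           # rule 3: fall back to translation

# Agreement between two sites wins:
print(align_node(["The Message", "The Message", "Message"]))  # The Message
```

The same precedence (crawled consensus, then annotator choice, then manual translation) keeps automatically obtained names whenever they are corroborated, and only spends annotator effort on the ambiguous remainder.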
Following these rules, we finally obtain 16,556 bilingual parallel nodes and 254 parallel edges, resulting in about 123,298 parallel knowledge triples, with an accuracy of over 97%. Table 2 provides the statistics of DuRecDial 2.0.
Parallel user profiles The user profile contains personal information (e.g. name, gender, age, residence city, occupation, etc.) and his/her preference on domains and entities. The personal information is translated by multiple crowdsourced annotators directly. The preference on domains and entities is replaced based on the parallel knowledge graph constructed above and then revised by crowdsourced annotators.
Parallel task templates The task templates contain: 1) a goal sequence, where each goal consists of two elements, a dialog type and a dialog topic, corresponding to a sub-dialog, 2) a detailed description about each goal. We create parallel task templates by 1) replacing dialog type and topic based on the parallel knowledge graph constructed above, and 2) translating goal descriptions.
Parallel conversation situation The construction of parallel conversation situation also includes two steps: 1) decompose situation into chat time, place and topic, 2) multiple crowdsourced annotators translate chat time, place and topic to construct parallel conversation situation.

Conversation Utterance Collection
To guarantee the quality of translation, we use a strict quality control procedure.
First, before translation, all entities in all utterances are replaced based on the parallel knowledge graph constructed above to ensure knowledge accuracy.
Then, we randomly sample 100 conversations (about 1,500 utterances) and assign them to more than 100 professional translators. After translation, all translation results are assessed 1-3 times by 3 data specialists with translation experience. Specifically, the data specialists randomly select 20% of each translator's translation results for assessment. The assessment covers the word level, utterance level, and session level. For word-level assessment, they check whether entities are consistent with the knowledge graph, whether the choice of words is appropriate, and whether there are typos. For utterance-level assessment, they check whether the utterance is accurate, colloquial, and free of redundancy. For session-level assessment, they check whether the session is coherent and parallel to DuRecDial (Liu et al., 2020b). If a translator's error rate exceeds 10%, the translator is no longer allowed to translate. If the error rate exceeds 3%, we ask the translator to fix the errors. After this second-round translation, we conduct another assessment: if the error rate is below 2%, the translator passes directly; otherwise, they are assessed a third time. In the third-round assessment, only translators with an error rate below 1% pass. Finally, we pick 23 translators.
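The staged error-rate thresholds above amount to the following decision logic (a schematic re-statement; the function name and interface are ours, not part of the annotation toolkit):

```python
def assess_translator(round1_err, round2_err=None, round3_err=None):
    """Return 'fail', 'pass', or 'retry' for a translator, given the error
    rates (0.0-1.0) measured on sampled output in up to three rounds."""
    if round1_err > 0.10:
        return "fail"                 # dropped from the translator pool
    if round1_err <= 0.03:
        return "pass"
    # 3% < error rate <= 10%: fix errors, then take a second assessment
    if round2_err is None:
        return "retry"
    if round2_err < 0.02:
        return "pass"
    # otherwise a third assessment, with the strictest threshold
    if round3_err is None:
        return "retry"
    return "pass" if round3_err < 0.01 else "fail"
```

Tightening the threshold each round (3%, then 2%, then 1%) means a translator who repeatedly needs corrections must demonstrate progressively cleaner output to stay in the pool.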
Finally, the 23 translators translate about 1,000 utterances at a time based on the parallel user profiles, knowledge graph, task templates, and conversation situations. After data translation, data specialists randomly select 10-20% of each translator's translation results for assessment in the same way as above. The translators can continue to translate only after passing the assessment.

Related Knowledge Triples Annotation
Due to the complexity of this task and the massive number of knowledge triples corresponding to each dialog, knowledge selection and goal planning are very challenging. In addition to translating dialogue utterances, the annotators were therefore also required to record the related knowledge triples whenever an utterance was generated according to some triples.

Data statistics and quality
Table 2 provides statistics of DuRecDial 2.0 and its knowledge graph, indicating rich variability of dialog types and domains. Following the evaluation method in previous work (Liu et al., 2020b), we conduct human evaluations for data quality: a dialog is rated "1" if it wholly follows the instruction in the task templates and the utterances are grammatically correct and fluent, and "0" otherwise; three persons judge 200 randomly sampled dialogs. Finally we obtain an average score of 0.93 on this evaluation set.

Prefixes of utterances
Since REDIAL has been the main benchmark for conversational recommendation, we perform an in-depth comparison between the English part of DuRecDial 2.0 and REDIAL.
As human-bot conversations are very diversified in real-world applications, we expect a richer variability of utterances to mimic real-world application scenarios. Figure 2(a) and Figure 2(b) show the distribution of frequent trigram prefixes. We find that nearly all prefixes of utterances in REDIAL are Hello, Hi, and Hey, while the prefixes of utterances in DuRecDial 2.0 are more diversified. For example, several sectors indicated by the prefixes Do, What, Who, How, Please, Play, and I are frequent in DuRecDial 2.0 but completely absent in REDIAL, indicating that DuRecDial 2.0 has a more flexible language style.

Task Formulation on DuRecDial 2.0

Let D_k = {d_i^k} (i = 0, ..., N_{D_k} - 1) denote the set of dialogs by the seeker s_k (0 ≤ k < N_s), where N_{D_k} is the number of dialogs by the seeker s_k, and N_s is the number of seekers. Recall that we attach each dialog (say d_i^k) with an updated seeker profile (denoted as P_i^{s_k}), a knowledge graph K = {k_j} (j = 0, ..., m), and a goal sequence G = {(g_j^ty, g_j^tp)} (j = 0, ..., m), where k_j is a set of knowledge triples, g_j^ty is a candidate dialog type, and g_j^tp is a candidate dialog topic. Given a context X with utterances {u_j} (j = 0, ..., m-1) from the dialog d_i^k, together with G, P_i^{s_k}, and K, the aim is to produce a proper response Y = u_m for completion of the goal g_c = (g_m^ty, g_m^tp).
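Concretely, one training instance pairs the inputs (P, G, K, X) with the target response Y. The sketch below uses hypothetical field names, for illustration only:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DialogInstance:
    """One training instance; field names are illustrative, not official."""
    profile: dict                          # updated seeker profile P_i^{s_k}
    knowledge: List[Tuple[str, str, str]]  # knowledge triples K
    goals: List[Tuple[str, str]]           # (dialog type, dialog topic) pairs G
    context: List[str]                     # utterances u_0 .. u_{m-1} (X)
    response: str                          # target utterance u_m (Y)
    lang: str                              # "en" or "zh"

example = DialogInstance(
    profile={"name": "Li Ming"},
    knowledge=[("The Message", "stars", "Zhou Xun")],
    goals=[("Movie recommendation", "The Message")],
    context=["Anyway, she's really good."],
    response="Do you want to see her movie The Message?",
    lang="en",
)
```

Because every field is annotated in both languages, the same instance exists in an English and a Chinese version, which is what makes the cross-lingual task variants below possible.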

Monolingual conversational recommendation:
Task 1: (X en , G en , K en , Y en ) or Task 2: (X zh , G zh , K zh , Y zh ). With these two monolingual conversational recommendation forms, we can investigate the performance variation of the same model trained on two separate datasets in different languages. In our experiments, we train two conversational recommendation models respectively for the two monolingual tasks. Then we can evaluate their performance variation across English and Chinese to see how the changes between languages can affect model performance.
Multilingual conversational recommendation: Task 3: (X en , G en , K en , Y en , X zh , G zh , K zh , Y zh ). Similar to multilingual neural machine translation (Johnson et al., 2017) and multilingual reading comprehension (Jing et al., 2019), we directly mix training instances of the two languages into a single training set and train a single model to handle both English and Chinese conversational recommendation at the same time. This task setting can help us investigate if the use of additional training data in another language can bring performance benefits for a model of current language.
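The multilingual training-set construction described above amounts to simple instance mixing, sketched below (the instance representation and function name are our own assumptions):

```python
import random

def build_multilingual_train(en_instances, zh_instances, seed=0):
    """Mix English and Chinese training instances into one training set.

    Each instance is an opaque (inputs, response) pair; shuffling ensures
    that mini-batches interleave the two languages during training.
    """
    mixed = list(en_instances) + list(zh_instances)
    random.Random(seed).shuffle(mixed)
    return mixed
```

A single model trained on the mixed set then serves both languages, letting us measure whether the extra data in the other language helps or hurts each one.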

Cross-lingual conversational recommendation:
The two forms of cross-lingual conversational recommendation are Task 4: (X zh , G en , K en , Y en ) and Task 5: (X en , G zh , K zh , Y zh ), where, given related goals and knowledge (e.g., in English), the model takes dialog context in one language (e.g., in Chinese) as input, and then produces responses in another language (e.g., in English) as output. Understanding mixed-language dialog context is a desirable skill for end-to-end dialog systems. This task setting can help evaluate whether a model has the capability to perform this kind of cross-lingual task.

Experiment Setting
Dataset For the train/development/test sets, we follow the split of Liu et al. (2020b), with one notable difference: we discard the dialogues that include news.

Automatic Evaluation Metrics: For automatic evaluation from the viewpoint of conversation, we follow the setting in previous work (Liu et al., 2020b) and use several common metrics such as F1, BLEU (BLEU1 and BLEU2) (Papineni et al., 2002), and DISTINCT (DIST-1 and DIST-2) (Li et al., 2016) to measure the relevance, fluency, and diversity of generated responses. Moreover, we also evaluate the knowledge-selection capability of each model by calculating knowledge precision/recall/F1 scores as done in Wu et al. (2019) and Liu et al. (2020b). In addition, to evaluate recommendation effectiveness, we design two automatic metrics. First, to measure how well a model can lead the whole dialog toward a recommendation target, we design the dialog-Leading Success rate (LS), which calculates the percentage of times a dialog successfully reaches or mentions the target after a few dialog turns. Second, to measure how well a model can respond to new topics raised by users, we design the User-Topic Consistency rate (UTC), which calculates the percentage of times the model successfully follows new topics mentioned by users.

Human Evaluation Metrics: The human evaluation is conducted at the level of both turns and dialogs.
For turn-level human evaluation, we ask each model to produce a response conditioned on a given context, goal, and related knowledge. The generated responses are evaluated by three persons in terms of fluency, appropriateness, informativeness, proactivity, and knowledge accuracy. For dialogue-level human evaluation, we let each model converse with humans and proactively make recommendations when given goals and reference knowledge. For each model, we collect 30 dialogs. These dialogs are then evaluated by three persons in terms of two metrics: (1) coherence, which examines the fluency, relevancy, and logical consistency of each response given the current goal and context, and (2) recommendation success rate, which measures the percentage of times users finally accept the recommendation at the end of a dialog.
The evaluators rate the dialogs on a scale of 0 (poor) to 2 (good) in terms of each human metric except recommendation success rate (see the supplemental material for more details).
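As one concrete example of the automatic metrics above, DIST-n (Li et al., 2016) is typically computed as the number of distinct n-grams divided by the total number of generated n-grams over all responses. This is the standard formulation; the paper's exact tokenization may differ:

```python
def distinct_n(responses, n):
    """DIST-n: distinct n-grams / total n-grams across generated responses."""
    ngrams, total = set(), 0
    for resp in responses:
        tokens = resp.split()
        for i in range(len(tokens) - n + 1):
            ngrams.add(tuple(tokens[i:i + n]))
            total += 1
    return len(ngrams) / total if total else 0.0

responses = ["i like this movie", "i like this song"]
print(round(distinct_n(responses, 1), 3))  # 5 distinct / 8 total = 0.625
```

Higher DIST-1/DIST-2 scores indicate more lexically diverse responses, which is why the metric penalizes models that fall back on a few generic replies.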

Methods
XNLG (Chi et al., 2020) is a cross-lingual pre-trained model with both monolingual and cross-lingual objectives; it updates the parameters of the encoder and decoder through auto-encoding and auto-regressive tasks to transfer monolingual NLG supervision to other pre-trained languages. When the target language is the same as the language of the training data, we fine-tune the parameters of both the encoder and the decoder. When the target language is different from the language of the training data, we fine-tune only the parameters of the encoder. The encoder is fine-tuned to minimize the masked language modeling losses (L_MLM, defined as in XNLG) over D_p and D_m, where D_p indicates the parallel corpus and D_m is the monolingual corpus. The decoder is fine-tuned to minimize the auto-encoding losses (likewise defined as in XNLG).

mBART (Liu et al., 2020a) is a multilingual sequence-to-sequence (Seq2Seq) denoising auto-encoder pre-trained on a subset of 25 languages (CC25) extracted from the Common Crawl (CC). It provides a set of parameters that can be fine-tuned for any of the language pairs in CC25, including English and Chinese. Loading the mBART initialization can provide performance gains for monolingual/multilingual/cross-lingual tasks and serves as a strong baseline.
We treat our 5 tasks as a machine translation (MT) task. Specifically, the context, knowledge, and goals are concatenated as the source-language input, which could be monolingual, multilingual, or cross-lingual text, and the corresponding response is generated as the target-language output. Since the response could be in different languages, we also concatenate a language identifier of the response to the source input: if the response is in English, the identifier is EN, otherwise ZH, no matter what language the source input is. We finally fine-tune the mBART model on our 5 tasks respectively. Table 3 and Table 4 present automatic evaluation results on the machine-translated parallel corpus and the human-translated parallel corpus (DuRecDial 2.0), respectively. Table 5 provides human evaluation results on DuRecDial 2.0.
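The MT-style input construction described above can be sketched as follows. The separator tokens and field order here are illustrative assumptions, not the exact released serialization format:

```python
def build_mt_example(context, goal, knowledge, response, tgt_lang):
    """Serialize (goal, knowledge, context) into one source string, prefixed
    with a target-language identifier, and pair it with the response."""
    assert tgt_lang in ("EN", "ZH")
    goal_str = " ".join("[G] " + " ".join(g) for g in goal)
    knowledge_str = " ".join("[K] " + " ".join(triple) for triple in knowledge)
    context_str = " [SEP] ".join(context)
    source = f"[{tgt_lang}] {goal_str} {knowledge_str} [CTX] {context_str}"
    return source, response

src, tgt = build_mt_example(
    context=["Do you know Zhou Xun?", "Yes, she is a famous actress."],
    goal=[("Movie recommendation", "The Message")],
    knowledge=[("The Message", "stars", "Zhou Xun")],
    response="Do you want to see her movie The Message?",
    tgt_lang="EN",
)
```

Because the language identifier is attached to the source side, a single Seq2Seq model can be asked for an English or a Chinese response regardless of the input language, which is exactly what the cross-lingual tasks (Task 4 and 5) require.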

Experiment Results
Automatic Translation vs. Human Translation: As shown in Table 3 and Table 4, the models of XNLG (Chi et al., 2020) and mBART (Liu et al., 2020a) trained with human-translated parallel corpus (DuRecDial 2.0) are both better than those trained with machine-translated parallel corpus across almost all the tasks. The possible reason is that automatic translation might contain many translation errors, which increases the difficulty for effective learning by models.
English vs. Chinese: As shown in Table 4 and 5, the results of the Chinese-related tasks (Task 2, 3(ZH->ZH), 5) are better than those of the English-related tasks (Task 1, 3(EN->EN), 4) in terms of almost all the metrics, except for F1 and DIST1/DIST2. The possible reason is that: (1) most entities in this dataset are from the domain of Chinese movies and famous Chinese entertainers, which are quite different from the set of entities in the English pretraining corpora used for XNLG or mBART; (2) the pretrained models therefore perform poorly when modeling these entities in utterances, resulting in knowledge errors in responses (e.g., the agent might mention incorrect entities that are not relevant to the current topic), since some entities might never appear in the English pretraining corpora. The accuracy of generated entities in responses is crucial to model performance in terms of Knowledge P/R/F1, LS, UTC, Know. Acc., Coherence, and Rec. success rate. Therefore incorrect entities in generated responses deteriorate model performance on the above metrics for the English-related tasks.

Table 3: Automatic evaluation results on the parallel corpus of automatic translation. Task 1-5 represent the 5 different tasks on DuRecDial 2.0: (X en , G en , K en , Y en ), (X zh , G zh , K zh , Y zh ), (X en , G en , K en , Y en , X zh , G zh , K zh , Y zh ), (X zh , G en , K en , Y en ), and (X en , G zh , K zh , Y zh ). Task 1 and 2 are monolingual, task 3 is multilingual, and task 4 and 5 are cross-lingual. "EN" and "ZH" stand for English and Chinese respectively.

Table 4: Automatic evaluation results on DuRecDial 2.0. Task 1-5 are defined as in Table 3.

Table 5: Human evaluation results on DuRecDial 2.0 at the level of turns and dialogs. "Appro.", "Infor.", "Know. Acc.", "Rec.", "EN", and "ZH" stand for appropriateness, informativeness, knowledge accuracy, recommendation, English, and Chinese respectively. Task 1-5 are defined as in Table 3.

Monolingual vs. Multilingual: Based on the results in Table 4 and 5, the model for the multilingual Chinese task (Task 3(ZH->ZH)) is better than the monolingual Chinese model (Task 2) in terms of almost all the metrics (except for DISTINCT and Knowledge Accuracy). This indicates that the use of additional English corpora can slightly improve model performance for Chinese conversational recommendation. The possible reason is that the use of additional English data implicitly expands the training data size for the Chinese-related tasks through the bilingual training paradigm of XNLG or mBART, which strengthens the capability of generating correct entities for a given dialog context. Then the Chinese-related task models can generate correct entities in responses more frequently, leading to better model performance.
But the model for the multilingual English task (Task 3(EN->EN)) cannot outperform the monolingual English model (Task 1). The possible reason is that the pretrained models cannot model the entities in dialog utterances well, resulting in poor model performance.
Monolingual vs. Cross-lingual: According to the results in Table 4 and 5, the model of the EN->ZH cross-lingual task (Task 5) performs surprisingly better than the monolingual Chinese model (Task 2) in terms of all the automatic and human metrics (except for Fluency) (sign test, p-value < 0.05). This indicates that the use of bilingual corpora can consistently bring performance improvement for Chinese conversational recommendation. One possible reason is that XNLG or mBART can fully exploit the bilingual dataset, which strengthens the capability of generating correct entities in responses for the Chinese-related tasks. Moreover, we notice that the model performance is further improved from the multilingual setting to the cross-lingual setting; we will investigate the reason for this result in future work.
But the ZH->EN cross-lingual model (Task 4) cannot outperform the monolingual English model (Task 1), which is consistent with the results of the multilingual setting.
XNLG vs. mBART: According to the evaluation results in Table 3, Table 4, and Table 5, mBART (Liu et al., 2020a) outperforms XNLG (Chi et al., 2020) across almost all the tasks and metrics. The main reason is that mBART employs more model parameters and uses more parallel corpora for training when compared with XNLG.

Conclusion
To facilitate the study of multilingual and cross-lingual conversational recommendation, we create a bilingual parallel dataset, DuRecDial 2.0, and define 5 tasks on it. We further establish baselines for monolingual, multilingual, and cross-lingual conversational recommendation. Automatic evaluation and human evaluation results show that our bilingual dataset, DuRecDial 2.0, can bring performance improvement for Chinese conversational recommendation. Besides, DuRecDial 2.0 provides a challenging testbed for future studies of monolingual, multilingual, and cross-lingual conversational recommendation. In future work, we will investigate the possibility of combining multilinguality and few-shot (or zero-shot) learning to see if it can help dialog tasks in low-resource languages.

Ethical Considerations
We make sure that DuRecDial 2.0 was collected in a manner consistent with the terms of use of any sources and the intellectual property and privacy rights of the original authors of the texts. Crowd workers were treated fairly. This includes, but is not limited to, compensating them fairly, ensuring that they were able to give informed consent, and ensuring that they were voluntary participants who were aware of any risks of harm associated with their participation. Please see Sections 3 and 4 for more details on the characteristics and collection process of DuRecDial 2.0.