Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation

Conversational Recommendation Systems (CRSs) form a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of data generated by language models and of data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on the ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases.


Introduction
Conversational Recommendation Systems (CRSs) are a growing research topic and application area, propelled by recent advances in Natural Language Processing (NLP) and conversational techniques. In contrast to traditional recommendation approaches, which provide item suggestions in a non-interactive manner, conversational recommendation involves multi-turn and mixed-initiative interactions between users and the system (Jannach and Chen, 2022). Hence, a growing body of research (Zhang et al., 2018b; Chen et al., 2019; Li et al., 2022) has introduced diverse conversational recommendation models, which address natural language understanding of user utterances, user intent estimation, user preference estimation and the generation of appropriate responses that encapsulate the recommended items. In the development of conversational recommenders, we identify two primary streams of research: optimising natural responses and improving recommendation results. While recent advancements in language models have proven effective in generating natural utterances (Wang et al., 2022), enhancing recommendation results based on user preferences inferred from their utterances remains challenging. Obstacles include skewed data interactions with limited coverage and the absence of evaluation reflecting the multi-turn interaction nature of CRSs. In addition, due to the nascent nature of conversational recommendation techniques, there has been limited investigation into uncovering the potential biases within the feedback loop of their development. Left unchecked, these biases could lead to unfair information presentation, ineffective exploration, and poor decisions.
To address this research gap, we first characterise various potential biases in conversational recommendation systems, including selection bias in data and popularity-oriented biases based on the multi-turn interaction nature of CRSs. Through a detailed analysis of CRS models, we identify these biases and their impact. Subsequently, by leveraging the advancements in Large Language Models (LLMs) and in-context learning approaches, we introduce novel data augmentation techniques, namely Once-Aug and PopNudge, to enhance the recommendation results and mitigate the effects of biases. Figure 1 provides an overview of the data augmentation pipeline applied in this study, which illustrates the generation of synthetic data followed by the application of our novel augmentation strategies. These strategies enrich the training data, leading to improved performance of CRS models. To validate the effectiveness of our data augmentation strategies, we conduct extensive experiments using popular CRS baselines, such as KGSF (Zhou et al., 2020a) and KBRD (Chen et al., 2019), on two benchmark datasets: ReDial (Li et al., 2018) and TG-ReDial (Zhou et al., 2020b). The experimental results consistently demonstrate the performance improvement achieved by our data augmentation strategies, particularly PopNudge. Specifically, we evaluate the improvement in terms of recommendation accuracy, selection bias, and various popularity-oriented biases and effects.
The main contributions of this paper are threefold: (1) a comprehensive investigation into potential biases and effects affecting the performance of CRS models, (2) novel data augmentation strategies for improving CRS models from multiple perspectives, and (3) extensive experiments demonstrating the effectiveness of the proposed data augmentation strategies and providing additional insights with discussion on novel biases and effects.

Related Work
This section offers an overview of biases, from data collection and the development of conventional recommendation systems to conversational recommendation systems. In addition, we discuss existing bias mitigation strategies that leverage generative data, also known as data imputation.

Biases in Recommendation System
A recommendation model is designed to suggest items of high interest to a given user based on preferences learned from historical interactions and associated information in the collected data. However, when examining the feedback loop involving the interactions between the user, data, and the model, it becomes apparent that recommender systems can suffer from various biases (Chen et al., 2022; Mehrabi et al., 2021; Wang et al., 2023; Salutari et al., 2023; Liu, 2023). For example, biased interactions between users and items, known as selection bias (Hernández-Lobato et al., 2014; Steck, 2013), or the biased representation of items (i.e., exposure bias (Liu et al., 2020)) can result in skewed data distributions that make it challenging to accurately capture user interests and ensure fair item representation. Of particular note is the commonly investigated popularity bias (Zhao et al., 2022; Naghiaei et al., 2022a), which arises from concerns about the growing application of recommendation systems and the Matthew effect (cf. Wang et al., 2018), which reinforces interactions with popular items. Consequently, several strategies have been proposed to mitigate or alleviate bias effects when developing recommendation techniques (Wang et al., 2021; Liu et al., 2021; Naghiaei et al., 2022b). For example, Wang et al. (2021) address selection bias by validating the learned model on a small set of unbiased item ratings to enforce unbiased rating predictions.
In the context of Conversational Recommendation Systems (CRSs), which are the focus of this study, there is growing concern about biases within the feedback loop, but limited research has been conducted to explore and mitigate these biases (Gao et al., 2021). Similar to the concerns regarding popularity bias in conventional recommendation systems, Lin et al. (2022a) and Fu et al. (2021) have investigated and contributed to mitigating popularity bias in CRSs. In addition to popularity bias, Shen et al. (2023) have explored the impact of unintended biases, such as racial and gender biases, on CRS recommendations. However, these studies have relied on semi-synthetic conversations generated from user-posted reviews and developed with the system-ask-user-respond method (Zhang et al., 2018b), rather than on real recommendation-oriented conversations, to examine and evaluate bias effects. As a result, research on the effect of potential biases on CRS development remains underexplored, lacking the use of real user-involved conversation data.

Generative Data for Bias Mitigation
In previous literature, one commonly applied solution to biases caused by missing data is data imputation (Hernández-Lobato et al., 2014; Steck, 2013). Data imputation estimates and generates pseudo-labels or ratings for users' unseen items, aiming to mitigate the selection bias resulting from unfair data collection strategies. However, the performance of imputation-based bias mitigation strategies is hindered by imputation inaccuracy, which can introduce even larger biases (Wang et al., 2019). Meanwhile, in the field of machine learning, generative neural models have advanced significantly, and the use of generative data has become widespread for constructing debiased training data (Wu et al., 2022; Lee et al., 2021). Consequently, generative data holds the potential to mitigate biases in CRS models, but this potential has not been fully explored in previous work.

Bias Analysis, Formulation and Mitigation for CRS
Here, we formally define the conversational recommendation task and its associated components, which are susceptible to various biases originating in the feedback loop. We then propose to consider the impact of multiple biases, accounting for the unique characteristics of CRSs. Finally, we introduce two novel data augmentation strategies, Once-Aug and PopNudge, to mitigate the impact of biases.

Problem Statement
A Conversational Recommendation System (CRS) is an extension of conventional recommendation systems that enables interactive and personalised recommendations through multi-turn interactions and user feedback collection. The goal of a CRS is to effectively understand users' personal preferences and immediate intent for item exploration, in order to provide natural responses that include accurate and personalised recommendations. Formally, at a specific turn t, for a given user u among M users (i.e., U = {u_0, u_1, ..., u_M}), a CRS model utilises information from the previous conversational utterances (i.e., d = {c_0, c_1, c_2, ..., c_{t-1}}) from a dialogue corpus D to generate a ranked list of items r_t from an item corpus I = {i_0, i_1, ..., i_N} and formulates a natural language response a_t. In this study, inspired by reinforcement learning, we introduce the concept of a "conversational episode" d_e, which represents the conversation from when a user initiates it until the recommendation is accepted by the user or the maximum number of interaction turns of the agent is reached (Sun and Zhang, 2018; Chu et al., 2023). The index e denotes a specific episode. Consequently, we investigate bias effects on conversational recommendations at different levels, including individual turns of recommendations (r_t), recommendations based on the context of a conversational episode (r_e), recommendations considering historical conversations (r_{t|<t}), previous episodes (r_{e|<e}) and corpus-level biases (r_D). By examining bias effects from these perspectives, we analyse the advantages and potential impacts of the dynamic nature of CRSs.

Selection Bias for CRS
In the existing literature on conventional recommendation systems, selection bias is a widely examined phenomenon (e.g., Ovaisi et al., 2020; Wang et al., 2021; Liu et al., 2022). It denotes that users' behaviour in observed user-item interactions, as captured in collected model-development datasets, is often skewed, with users primarily interacting with only a small subset of items (Chen et al., 2022). Consequently, the items interacted with in these datasets do not represent the full spectrum of available items, which limits the ability of the learned model to make promising recommendations and can also exaggerate the impact of other biases, such as popularity bias. However, selection bias in CRS datasets has not been thoroughly examined and discussed.
In this study, we address this gap by examining and statistically revealing the presence of selection bias in conversational recommendation datasets, focusing on their Initial Item Coverage (IIC), formulated as follows:

IIC = |I_{D_train}| / |I|,

where I_{D_train} denotes the set of items mentioned in the training dialogues and | · | represents the number of unique items in the corresponding set. By calculating the IIC, we can assess the extent to which the training dataset covers the entire item space, revealing the presence and magnitude of selection bias in CRS datasets.
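Concretely, the IIC is the fraction of the item corpus that appears in training dialogues. A minimal sketch, where the dialogue and item-ID structures are illustrative assumptions rather than the paper's data format:

```python
def initial_item_coverage(train_dialogues, item_corpus):
    """Fraction of the item corpus mentioned in training dialogues.

    train_dialogues: iterable of dialogues, each a list of mentioned item IDs.
    item_corpus: collection of all recommendable item IDs.
    """
    corpus = set(item_corpus)
    mentioned = {item for dialogue in train_dialogues for item in dialogue}
    return len(mentioned & corpus) / len(corpus)

# Toy example: a 10-item corpus whose training dialogues mention 4 distinct items
corpus = list(range(10))
dialogues = [[0, 1], [1, 2], [3]]
print(initial_item_coverage(dialogues, corpus))  # 0.4
```

A low IIC (e.g., the 34.82% reported later for TG-ReDial) signals that a large share of the catalogue never appears during training.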

Popularity Bias
Besides selection bias, popularity bias is also widely studied in the evaluation of recommendation systems (Chen et al., 2023). Popularity bias arises from the tendency of users to interact more frequently with popular items, resulting in a skewed distribution in which popular items receive most of the attention. This bias can lead to reduced coverage, diversity, and potentially lower user satisfaction with personalised recommendation results (Zhao et al., 2022; Abdollahpouri et al., 2017). Over time, it can also contribute to the Matthew effect, further amplifying the popularity gap between items.
A recent formalisation of popularity bias in the context of CRSs considers the recommended item list r_{d,t} at turn t in a dialogue d (Lin et al., 2022a). The bias effect is quantified as the product of two components: a ranking utility π(r_{d,t}), which assigns weights to recommended items based on their ranking positions, and a popularity coverage measure P_{r_{d,t}}, which indicates the proportion of popular items included in the recommendation:

P_{r_{d,t}} = card(r_{d,t} ∩ r_u^{pop}(η)) / card(r_{d,t}),

where r_u^{pop}(η) represents the set of popular items according to a popularity threshold η determined by the items' interaction frequency, and card(·) indicates the cardinality of a set. This formulation, as well as the existing approaches in the literature (Fu et al., 2021; Lin et al., 2022b), overlooks the influence of contextual user-system interactions and the varying user intent during a conversation. Considering the multi-turn interaction nature of CRSs is crucial for enhancing the user experience. In this study, we therefore extend the formalisation of popularity bias by analysing novel factors: the cross-episode popularity and user intent-oriented popularity effects.
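The popularity-coverage component can be sketched as below; the threshold-based popular-item set and function names are our assumptions for illustration, not the exact implementation of Lin et al. (2022a):

```python
def popular_item_set(interaction_counts, eta):
    """Items whose interaction frequency exceeds the popularity threshold eta."""
    return {item for item, count in interaction_counts.items() if count > eta}

def popularity_coverage(recommended, pop_items):
    """Proportion of popular items within a recommended list, P_{r_{d,t}}."""
    if not recommended:
        return 0.0
    return len(set(recommended) & pop_items) / len(recommended)

counts = {"a": 50, "b": 30, "c": 2, "d": 1}
pop = popular_item_set(counts, eta=10)  # {"a", "b"}
print(popularity_coverage(["a", "c", "d", "b"], pop))  # 0.5
```

The full bias score would then multiply this coverage by a position-discounting ranking utility over the same list.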

Cross-Episode Popularity (CEP)
Regarding the CEP effect, we assume that a rational CRS model should provide relatively independent suggestions within each individual episode of conversational recommendations. This assumption is supported by an example from an existing CRS benchmark dataset, ReDial, provided in the Appendix (see Figure 7). In essence, during multi-turn conversations, users often seek multiple recommendations, and we observe that the preference modelling required for accurate recommendations tends to be relatively independent across episodes. Therefore, considering the dynamic preferences of users, it becomes crucial to capture their immediate interest within each episode, which may involve both popular and less popular items, to make appropriate suggestions and enhance users' satisfaction levels. To explicitly model the CEP effect, we extend the formulation of popularity bias as follows:

CEP = π(r^e_{d,t}) P_{r^e_{d,t}} |ρ(pop(r^e_{d,t}), pop(r^{e-1}_d))|,

where |ρ(·, ·)| represents the absolute value of the Pearson correlation coefficient between the popularity levels (denoted pop(·), a normalised score based on item frequency) of the current recommendations and those from the previous episode. Diverse popularity distributions among recommended items are expected due to the different target items in each episode. A lower coefficient score indicates a weak correlation between previous episodes and current recommendations, i.e., highly diversified recommendations within a conversational interaction.
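The correlation component of the CEP effect can be sketched with a plain-Python Pearson correlation over the popularity scores of two episodes' recommendations; pairing the lists position-wise and assuming non-constant, equal-length inputs are our simplifying assumptions:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences
    (assumes neither sequence is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cep_correlation(pop_current, pop_previous):
    """|rho| between popularity profiles of current- and previous-episode
    recommendations; lower values indicate more diversified suggestions."""
    return abs(pearson(pop_current, pop_previous))

# Identical popularity profiles across episodes -> |rho| = 1 (maximally repetitive)
print(cep_correlation([0.9, 0.5, 0.1], [0.9, 0.5, 0.1]))
```

In the full CEP score, this factor is multiplied by the ranking utility and popularity coverage of the current episode's list.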

User Intent-Oriented Popularity (UIOP)
In addition to the CEP effect discussed above, we emphasise the importance of a CRS model effectively capturing user intent, whether it leans towards popular items or specific preferences. By identifying and addressing user intent, CRSs can offer tailored suggestions that increase user satisfaction. To quantify the user intent-oriented popularity (UIOP) effect, we leverage the popularity of the target item, pop(i_{d,t}), and formulate it as follows:

UIOP = π(r^e_{d,t}) P_{r^e_{d,t}} |pop(i_{d,t}) − pop(r^e_{d,t})|, (4)

which measures the proximity between the popularity of the user's target item and that of the recommended items. A higher score is obtained only if the popularity of the recommendations deviates from that of the target item.
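The popularity-gap term of the UIOP effect can be sketched as follows, assuming (our assumption) that pop(r^e_{d,t}) aggregates the recommended list's popularity as a mean:

```python
def uiop_gap(target_pop, recommended_pops):
    """Absolute gap between the target item's popularity and the mean
    popularity of the recommended list; lower means the recommendations
    match the popularity level implied by the user's target."""
    mean_rec_pop = sum(recommended_pops) / len(recommended_pops)
    return abs(target_pop - mean_rec_pop)

# Unpopular target (0.2) but popular recommendations -> large gap
print(uiop_gap(0.2, [0.8, 0.6, 0.7]))  # ~0.5
```

As with CEP, the full UIOP score multiplies this gap by the ranking utility and popularity coverage of the recommended list.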
By evaluating multiple potential biases and effects in the development of CRSs, we enable a comprehensive assessment of conversational recommendation results. These measures promote the advancement of CRSs that prioritise fairness and unbiased recommendations, as well as an improved user experience.

Bias Mitigation via Generative Recommendation Dialogues
In the preceding discussion, we identified several biases that can impact the performance of learned CRSs and lead to an imbalanced training corpus. To mitigate these bias effects, we propose novel data augmentation approaches built on the generation of synthetic dialogues.
Synthetic Data Generation. In our generation strategy, inspired by the recent success of in-context learning with LLMs, we develop effective prompts as input to the recent GPT-3.5 model (cf. Brown et al., 2020) to generate synthetic movie recommendation dialogues. To control the frequency of item mentions, we use specific movie names as variables and ask the language model to generate complete conversations suggesting those movies. The generated dialogues are then reformatted to align with the input instances of the chosen benchmark.
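The movie-as-variable prompting step can be sketched as below; the template wording is an illustrative assumption, not the paper's actual prompt, and the call to a GPT-3.5 chat-completion endpoint is left out:

```python
def build_prompt(movie_title):
    """Fill a movie title into a dialogue-generation prompt for an LLM.

    The wording here is a hypothetical example of such a prompt.
    """
    return (
        "Generate a short conversation between a user seeking movie "
        f'recommendations and a system that recommends "{movie_title}". '
        "Alternate 'User:' and 'System:' turns, and end with the user "
        "accepting the recommendation."
    )

prompt = build_prompt("Inception")
# The prompt would then be sent to a chat-completion endpoint (e.g. GPT-3.5),
# and the returned dialogue reformatted to match the benchmark's instances.
print(prompt)
```

Varying the movie title passed in directly controls how often each item appears in the synthetic corpus.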
Data Augmentation Strategies. We propose two data augmentation strategies that use the available synthetic dialogues to enhance model performance and mitigate multiple biases when training CRS models. Previous studies, as discussed in Section 2.1, have not examined the bias effect in publicly available conversational recommendation datasets. Additionally, these studies face challenges related to generalisability, either due to an attribute-based definition of popularity (Li et al., 2018) or a specific formulation of the Bayesian Pairwise Ranking loss (Lin et al., 2022a). Our first strategy, 'Once-Aug' (OA), adds all synthetic dialogues to the training data, evenly increasing the exposure of items in the corpus. However, Once-Aug does not consider item popularity or allow control over the presence of different items. To address these issues and model a wider range of user preferences, we introduce the 'PopNudge' strategy.
PopNudge (PN) is inspired by the data-agnostic augmentation strategy MixUp (Zhang et al., 2018a), which augments training data with 'vicinity' instances to improve the model's ability to handle unseen inputs. Similarly, PopNudge augments training batches with dialogues recommending similar but less popular items, aiming to 'nudge' the model towards bias mitigation while avoiding frequent use of lower-rated items, under the assumption of a potential relationship between popularity and ratings (Zhao et al., 2022). Algorithm 1 outlines the procedure for applying PopNudge, which augments the model's training batches with dialogues featuring less popular items. For every batch, we initialise an empty augmentation set (line 4). Next, for each dialogue instance, we identify its included item and prepare the set of available dialogues featuring items of lower popularity (line 6). Then, we perform weighted sampling, based on item popularity, to select k items to add to the augmentation set (line 7). The sampled dialogues are finally added to the training batch for model learning (line 11).
PopNudge offers several advantages despite its simplicity. First, it maintains the long-tail distribution of item mentions, consistent with natural human behaviour, thus preserving the overall item distribution pattern. Second, it effectively increases the exposure of less popular items, leading to a more balanced item distribution. Importantly, PopNudge integrates seamlessly with existing methodologies without requiring any model updates: it selectively incorporates augmented data during training, making it applicable in various contexts for bias mitigation purposes.
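The batch-augmentation step can be sketched compactly as follows; representing each dialogue as a (dialogue, item-popularity) pair is our simplifying assumption about the data structures:

```python
import random

def popnudge_batch(batch, aug_pool, k, seed=None):
    """Augment a training batch with synthetic dialogues about less popular items.

    batch:    list of (dialogue, item_popularity) training instances.
    aug_pool: list of (synthetic_dialogue, item_popularity) candidates.
    k:        number of synthetic dialogues sampled per training instance.
    """
    rng = random.Random(seed)
    augmented = []
    for _, pop_i in batch:
        # Candidate synthetic dialogues whose item is less popular (line 6)
        candidates = [(d, p) for d, p in aug_pool if p < pop_i]
        if not candidates:
            continue
        # Popularity-weighted sampling of k candidates (line 7)
        weights = [p for _, p in candidates]
        augmented.extend(rng.choices(candidates, weights=weights, k=k))
    # The augmented batch is then used for the model update (lines 11-12)
    return batch + augmented
```

Because sampling is weighted by popularity, moderately popular tail items are drawn more often than the rarest ones, which is what preserves the long-tail shape of the mention distribution.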

Experimental Setup
Our objective is to improve the evaluation of CRS techniques with respect to various bias effects while validating the performance of our proposed data augmentation approaches. To this end, we conduct a series of experiments using recent CRS models on two benchmark datasets: ReDial (Li et al., 2018) and TG-ReDial (Zhou et al., 2020b). These experiments allow us to assess the recommendation performance and bias effects of existing approaches and determine the effectiveness of our proposed methods. Table 1 presents a statistical summary showing that both the ReDial and TG-ReDial datasets are subject to selection bias, as indicated by their relatively low Initial Item Coverage (IIC) values: only 72.75% and 34.82% of items are mentioned in the training dialogues of ReDial and TG-ReDial, respectively. Furthermore, compared with non-conversational recommendation datasets, the benchmark conversational recommendation datasets contain significantly fewer interactions. This further emphasises the importance of data augmentation in addressing these limitations.
In our analysis, we consider several commonly used CRS models that serve as baselines in the literature, summarising the recommendation component of each: 1) ReDial (Li et al., 2018), an autoencoder-based conversational recommender that uses sentiment-analysed user opinions as input; 2) KBRD (Chen et al., 2019), which links entities in the dialogue content to an external knowledge graph (DBpedia) and employs a self-attention mechanism to learn entity representations for recommendation; 3) TG-ReDial (Zhou et al., 2020b), which uses pre-trained BERT (Kenton and Toutanova, 2019) and the sequential recommender SASRec (Kang and McAuley, 2018) to model historical utterances and user interactions, respectively; and 4) KGSF (Zhou et al., 2020a), which leverages both entity-oriented and word-oriented knowledge graphs to enrich data representations and enhance item recommendations. The implementation and evaluation of these models are supported by the open-source CRSLab toolkit (Zhou et al., 2021). For recommendation accuracy, we follow Zhou et al. (2020b) and apply a set of commonly used evaluation metrics, including Hit Ratio (HIT), NDCG and MRR, with cutoffs at 10 and 50. To ensure fair performance comparisons, our data augmentation strategies solely enrich the training data and do not modify the validation and test data. For the PopNudge strategy, we explore augmentation effects with the number of sampled dialogues k varying over {1, 5, 10, 50}.
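For a single test instance with one target item, the cutoff-based metrics reduce to the standard binary-relevance forms sketched below (averaging over the test set is omitted):

```python
import math

def hit_at_k(ranked, target, k):
    """1 if the target item appears in the top-k of the ranked list, else 0."""
    return int(target in ranked[:k])

def mrr_at_k(ranked, target, k):
    """Reciprocal rank of the target within the top-k (0 if absent)."""
    for pos, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / pos
    return 0.0

def ndcg_at_k(ranked, target, k):
    """Binary-relevance NDCG: the target's discounted gain over the ideal
    gain, which is 1/log2(2) = 1 for a single relevant item."""
    for pos, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(pos + 1)
    return 0.0

ranked = ["b", "a", "c", "d"]
print(hit_at_k(ranked, "a", 10), mrr_at_k(ranked, "a", 10))  # 1 0.5
print(ndcg_at_k(ranked, "a", 10))  # 1/log2(3), approx. 0.631
```

Corpus-level HIT@k, MRR@k and NDCG@k are the means of these per-instance scores over all test dialogues.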

Result Analysis
In this section, we present and discuss the experimental results to illustrate the effectiveness of applying Once-Aug and PopNudge to the baseline models, examining recommendation accuracy and the effects on selection and popularity biases.

Recommendation Accuracy Evaluation
For conversational recommendation models, accurately estimating users' preferences and providing appropriate recommendations are critical. Therefore, as discussed in Section 4 and in line with the existing literature, we evaluate recommendation performance using the Hit ratio, NDCG and MRR. Figures 2 and 3 depict the Hit@10 and Hit@50 results on the ReDial and TG-ReDial datasets. They show that our data augmentation strategies generally improve the Hit ratio scores, indicating enhanced recommendation accuracy. However, Once-Aug improves accuracy on the ReDial dataset but does not guarantee improvement on TG-ReDial. In contrast, PopNudge consistently enhances the performance of the baseline CRS models on both datasets. This conclusion is also supported by the complete experimental results, together with the NDCG and MRR evaluations, in the Appendix (Tables 4 and 5). Thus, we conclude that PopNudge is an effective data augmentation strategy that consistently enhances recommendation accuracy, with increased exposure of available items, across all four CRS models on both benchmark datasets.

Analysis on Selection Bias
According to the statistical summary of the benchmark datasets in Table 1, CRS models are affected by selection bias in the training data: the initial CRS datasets (ReDial and TG-ReDial) cover only 72.75% and 34.82% of the available items, respectively, which hinders the accurate estimation of users' preferences. Therefore, we comparatively evaluate the effectiveness of our data augmentation strategies in mitigating selection bias. From the IIC scores reported in Table 2, we observe that Once-Aug trivially achieves 100% coverage of all available items with equal additional exposure, while PopNudge gradually increases item coverage as the number of sampled dialogues grows. Furthermore, Figure 4 shows the frequency of items mentioned in the training dialogues before and after applying PopNudge. Varying the number of sampled dialogues (k) aligns with our objective discussed in Section 3.4: adjusting item exposure without disrupting the long-tail distribution of natural user-item interactions. Specifically, increasing the number of sampled dialogues adds items with a higher frequency for popular items and a gradual inclusion of less popular ones. Hence, both data augmentation strategies effectively address selection bias, and PopNudge performs particularly well in gradually mitigating selection bias while retaining the long-tail distribution of user-item interactions.

Analysis on Popularity Bias
Next, we discuss the value of investigating popularity bias and use the proposed Cross-Episode Popularity (CEP) and User Intent-Oriented Popularity (UIOP) measures to further examine the effectiveness of CRS models from the corresponding perspectives. First, Figures 2 and 3 depict the popularity bias scores of the CRS models before and after applying our data augmentation strategies on the ReDial and TG-ReDial datasets; the exact scores are also available in the Appendix (Tables 6 and 7). According to the experimental results, a lower popularity bias score does not necessarily lead to higher recommendation accuracy. For example, on the ReDial dataset, KGSF is the best-performing of the four CRS models, with Hit@10 and Hit@50 scores of 0.0414 and 0.1496, respectively, yet it exhibits a higher popularity bias score (0.1382) than both KBRD (0.1240) and TG-ReDial (0.1312). Therefore, a marginal increase in recommending popular items can potentially benefit the recommendation performance of CRS models, as has also been observed in the literature (Zhao et al., 2022). As for the performance of our data augmentation strategies, we make two main observations: (1) PopNudge can improve a model's performance when it suffers severely from popularity bias, as seen in the improved recommendation accuracy and significantly lower popularity bias of the ReDial model on the ReDial dataset; and (2) increasing the number of sampled dialogues does not significantly increase the impact of popularity bias and can further improve recommendation accuracy. In contrast, Once-Aug is rather unstable, as shown, for example, by the performance of the ReDial model on the TG-ReDial dataset.
We also extend the evaluation of popularity bias with the two additional measures, CEP and UIOP. Table 3 reports the full experimental results for the CEP and UIOP scores. We observe that our data augmentation strategies lower the CEP scores in most cases, apart from KBRD on the ReDial dataset. As discussed in Section 3.3.1, the CEP score also reflects the diversity of recommendation results across conversational episodes; hence, the lower CEP scores indicate the value of our data augmentation strategies in diversifying recommendations, which is promising for improving users' experience. Specifically, according to the experimental results in Figure 2, the initial ReDial model does not perform well on the ReDial dataset in terms of accuracy, which is significantly improved by our PopNudge approach. Comparing the Hit ratio and CEP scores, the ReDial model likely suffers from repetitive recommendations of similar items, which not only lowers recommendation accuracy but can also harm users' experience.
Finally, we examine the UIOP scores, which assess whether the popularity of predicted items is close to that of the target items. Similar to the findings for CEP, we observe a consistent move towards lower UIOP scores after applying our bias mitigation strategies. In addition, UIOP shows a certain connection with recommendation accuracy. For example, KGSF boosted by PopNudge with 5 sampled dialogues is the best-performing variant in terms of recommendation accuracy among the basic and data-augmented models, and we also observe the lowest UIOP score for this same variant. Therefore, UIOP has the potential to serve as a metric for evaluating recommendation accuracy in conjunction with item popularity.
In summary, when examined by various popularity-based measures and effects, our data augmentation strategies, especially PopNudge, effectively mitigate various biases and boost the models' ability to serve diversified and accurate recommendations. In addition, the introduction of the CEP and UIOP effects adds insight to the performance evaluation of CRS techniques and can be leveraged in the future development of CRS models.

Conclusions
Our study investigates several biases within the feedback loop of developing a conversational recommendation system, including selection bias, popularity bias, the cross-episode popularity (CEP) effect and the user intent-oriented popularity (UIOP) effect. These proposed biases and effects shed light on CRS model performance and provide valuable insights for effective model development. Additionally, we leverage recent advances in large language models to generate effective synthetic data and propose two novel and effective data augmentation strategies, Once-Aug and PopNudge, to enhance model performance. Our experimental results demonstrate the effectiveness of our data augmentation strategies in improving CRS model performance, considering recommendation accuracy as well as the mitigation of selection and popularity biases.

Limitations
In this paper, we make significant contributions by utilising generative dialogues to augment the training data and mitigate biases. However, we acknowledge two main limitations of our approach. First, each of our generative dialogues involves only a single item, which deviates from the majority of dialogues in the initial training corpus and may impact cross-episode recommendations. Second, we primarily focus on improving recommendation performance and do not evaluate the generated responses, an essential aspect of conversational techniques. Addressing these limitations will require dedicated efforts in future work.

C Collection of Full Experimental Results
Figure 1: Data Augmentation Pipeline

Algorithm 1
PopNudge Data Augmentation
Require: D_train, D_aug, k
1: POP_aug ← pop(D_aug) ▷ Item popularity
2: for batches do
3:   b_train ← batch-sample(D_train)
4:   B_aug ← ∅ ▷ Initialise augmentation set
5:   for each dialogue d_i ∈ b_train with item popularity pop_i do
6:     D_cand ← D_aug − {d_j | pop_j > pop_i} ▷ Candidates with less popular items
7:     B_s ← weighted-sample(D_cand, k, POP_aug) ▷ Sample k dialogues by popularity
8:     B_aug ← B_aug ∪ B_s
9:   end for
10:  reformat B_aug to match the training instances
11:  b_train ← b_train ∪ B_aug ▷ Augment training batch
12:  update the model with b_train ▷ Model update
13: end for

Figure 2: Performance comparison between conversational recommendation models using Hit@10 and Hit@50, together with popularity bias scores, on the ReDial dataset. OA and PN refer to the Once-Aug and PopNudge data augmentation strategies.

Figure 3: Performance comparison between conversational recommendation models using Hit@10 and Hit@50, together with popularity bias scores, on the TG-ReDial dataset.

Table 1 :
A statistical comparison among conversational and popular non-conversational datasets. The numbers of dialogues and ratings are counted for the conversational and non-conversational datasets, respectively. 'Popular items' denotes the ratio of items with more than 5 interactions.

Table 2 :
Evaluation of Initial Item Coverage on two datasets and after applying data augmentation.
Table 2 presents the calculated IIC scores after applying various data augmentation strategies.

Table 3 :
Evaluation of Cross-Episode Popularity (CEP) and User Intent-Oriented Popularity (UIOP) effects on models and after applying data augmentation.

Table 4 :
Initial Experimental results of conversational recommenders on the ReDial dataset.

Table 5 :
Experimental results of conversational recommenders on the TGReDial dataset.