DebiasGAN: Eliminating Position Bias in News Recommendation with Adversarial Learning

Click behaviors are widely used for training news recommendation models, but they are heavily affected by the biases introduced by news display positions. It is important to remove position biases in order to train unbiased recommendation models and capture unbiased user interests. In this paper, we propose a news recommendation method named DebiasGAN that effectively alleviates position biases via adversarial learning. The core idea is to model the personalized effect of position bias on click behaviors in a candidate-aware way, and to learn debiased candidate-aware user embeddings from which the position information cannot be discriminated. More specifically, we use a bias-aware click model to capture the effect of position bias on click behaviors, and a bias-invariant click model with random candidate positions to estimate the ideally unbiased click scores. We apply adversarial learning to the embeddings learned by the two models to help the bias-invariant click model capture debiased user interests. Experimental results on two real-world datasets show that DebiasGAN effectively improves news recommendation by eliminating position biases.


Introduction
Accurate news recommendation is critical for improving users' online news reading experience (Wu et al., 2020b). Existing news recommendation methods mainly use users' news click behaviors for interest inference and model training (Wang et al., 2018a; Wu et al., 2019a; Ge et al., 2020; Hu et al., 2020; Wu et al., 2020a, 2021a; Qi et al., 2021b,a). However, news click behaviors are heavily influenced by position biases, i.e., the positions of news when displayed on a webpage (Chen et al., 2020; Bhadani, 2021; Xun et al., 2021). Since top-ranked news are more likely to be clicked than those displayed inconspicuously (Baeza-Yates, 2018), models directly learned on click data may be inaccurate in targeting user interest (Yi et al., 2021).
Most existing studies on eliminating position biases in recommendation follow two popular approaches (Sato et al., 2020; Liu et al., 2020; Chen et al., 2021; Wu et al., 2021b). The first is propensity weighting, a canonical approach that reweights the training data according to the influence of position bias (Agarwal et al., 2019), which can be estimated from randomly ranked data (Joachims et al., 2017) or regression-based models (Wang et al., 2018b). The second is the click model, which aims to simulate the generative process of click behaviors (Craswell et al., 2008). It is combined with deep recommendation models by taking positional information as input to disentangle position bias from position-independent user interest (Guo et al., 2019; Huang et al., 2021). However, existing methods usually model position bias in a non-personalized way without considering user characteristics. In fact, the same position may have different impacts on different users due to their diverse click preferences. For example, some users prefer to skip top-displayed content (Benway, 1999). Thus, personalized bias modeling can help better model and eliminate the effects of position bias.
In this paper, we propose a news recommendation method named DebiasGAN, which effectively eliminates position biases in news recommendation via adversarial learning. The core idea of our method is two-fold: using candidate-aware click models to capture the personalized influence of position biases, and using adversarial learning to enforce the candidate-aware user representations learned on real positions to be indistinguishable from those learned on random positions. More specifically, we use a bias-aware click model to capture position biases and a bias-invariant click model with randomized candidate news positions to estimate the ideally unbiased click scores. Both models consider the interactions between candidate news positions and user behaviors to model the personalized effect of position bias.

DebiasGAN
The architecture of DebiasGAN is shown in Fig. 1. It contains a bias-aware click model that captures the effect of position bias on click behaviors and a bias-invariant click model that models bias-invariant user interest in candidate news. Both models consider the interactions between user behaviors and candidate news to learn candidate-aware user representations. Their details are introduced below.

Bias-aware Click Model
Denote the N historical clicked news of a user as [D_1, D_2, ..., D_N] and the candidate news as D_c. We use a news encoder to learn semantic representations of news from their texts. Motivated by (Wu et al., 2019c), we use the Transformer (Vaswani et al., 2017) model as the news encoder. We denote the hidden representations of the clicked news D_i and the candidate news D_c as r_i and r_c, respectively. To model the impact of position biases on user interest modeling, we incorporate the embeddings of the displayed positions of clicked news. We denote the position of the clicked news D_i as p_i (starting from 1). To reduce the sparsity of positions, we quantize each position p_i by p̂_i = ⌈√(p_i − 1)⌉. We convert the quantized position p̂_i into its embedding e_i, and add it to the semantic news embedding r_i to obtain a bias-aware news representation. We also add the position embedding of the candidate news (denoted as e_c) to r_c to obtain a bias-aware candidate news representation d_c. The bias-aware representations of clicked news are further processed by a behavior Transformer, which captures the relations between click behaviors to help better model user interests. We denote its output as [h_1, h_2, ..., h_N]. Since the same position may have different impacts on different users, we incorporate user preferences into modeling the effect of position bias on click behaviors. More specifically, we use a candidate-aware attention network to select important click behaviors to accurately model user interest in the candidate news displayed at a given position. We denote the bias-aware user interest representation with respect to the candidate news as u, which is formulated as follows:

a_i = exp(w_c^⊤ tanh(W_c (h_i ⊙ d_c))) / Σ_j exp(w_c^⊤ tanh(W_c (h_j ⊙ d_c))), u = Σ_i a_i h_i,

where W_c and w_c are parameters, a is the vector of candidate-aware attention weights, and ⊙ means element-wise product. We further predict a bias-aware click score ŷ based on u, formulated as ŷ = w^⊤ u, where w is a parameter vector. This score indicates the predicted probability that a user clicks on a candidate news given the position of the candidate news and the user's personal preference for the content and position of this news.
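As a small illustration, the position quantization step above can be sketched in a few lines of Python (a minimal sketch; the function name is ours):

```python
import math

def quantize_position(p: int) -> int:
    """Map a 1-based display position p to its quantized bucket
    ceil(sqrt(p - 1)), so nearby positions share one embedding."""
    return math.ceil(math.sqrt(p - 1))

# Positions 1..17 collapse into only five buckets: 0, 1, 2, 3, 4.
buckets = [quantize_position(p) for p in range(1, 18)]
```

The square-root quantization shrinks the number of position embeddings to learn from N to roughly √N, which mitigates the sparsity of rarely observed deep positions.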

Bias-invariant Click Model
The bias-invariant click model shares the news encoder and behavior Transformer with the bias-aware click model, but the displayed position of the candidate news is replaced with a uniformly sampled random position. A candidate-aware attention network then computes the bias-invariant user interest representation u′ analogously to the bias-aware model:

a′_i = exp(w′_c^⊤ tanh(W′_c (h_i ⊙ d′_c))) / Σ_j exp(w′_c^⊤ tanh(W′_c (h_j ⊙ d′_c))), u′ = Σ_i a′_i h_i,

where W′_c and w′_c are parameters, d′_c is the candidate representation with the random position embedding, and a′ is the attention weight vector. Since the candidate news position is randomly generated, the bias-invariant click model is encouraged to model user interests that are independent of the displayed positions of news. We predict the bias-invariant click score by ỹ = w^⊤ u′, where the parameter w is shared with the bias-aware click model.
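The two click models can be sketched with plain NumPy (a simplified sketch under the attention form we assume; the helper names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 8, 5                        # embedding size, history length

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def candidate_aware_user(H, d_c, W_c, w_c):
    """Attend over behavior representations H (N x d) conditioned on
    a candidate representation d_c, returning a user vector u."""
    scores = np.tanh((H * d_c) @ W_c.T) @ w_c  # candidate-aware scores
    a = softmax(scores)                        # attention weights
    return a @ H                               # weighted sum of behaviors

H = rng.normal(size=(N, d))        # behavior Transformer outputs
r_c = rng.normal(size=d)           # candidate news semantics
pos_emb = rng.normal(size=(6, d))  # embeddings of quantized positions
W_c, w_c = rng.normal(size=(d, d)), rng.normal(size=d)
w = rng.normal(size=d)             # scoring vector, shared by both models

# Bias-aware model: real (quantized) candidate position.
u = candidate_aware_user(H, r_c + pos_emb[2], W_c, w_c)
y_hat = w @ u

# Bias-invariant model: a uniformly sampled random position.
u_prime = candidate_aware_user(H, r_c + pos_emb[rng.integers(6)], W_c, w_c)
y_tilde = w @ u_prime
```

Note that both models score candidates with the same vector w; only the candidate position fed into the attention differs, which is what lets the adversarial objective in the next section compare u and u′ directly.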

Debiasing with Adversarial Learning
Since the position information of candidate news is coupled with the fine-grained user interest information encoded by different user behaviors, it is difficult to apply existing coarse-grained debiasing methods such as propensity weighting and click models to reduce the bias effects. Thus, to eliminate position bias in the bias-invariant click model and help it estimate unbiased click scores, we apply adversarial learning to the bias-aware and bias-invariant user interest representations. We use a discriminator to classify whether u or u′ is learned by the bias-aware model. The adversarial label z is predicted by the discriminator, in a form similar to BPR (Rendle et al., 2012), as follows:

z = σ(w_p^⊤ u − w_n^⊤ u′),

where w_p and w_n are parameters, and σ is the sigmoid function. The adversarial loss function is L_A = − log(z). By propagating the negative gradients of the adversarial loss, both the bias-aware and bias-invariant click models are encouraged to learn similar representations of user interests in candidate news. Thus, the distributions of their predicted click scores are expected to be similar, and the effects of position biases on click prediction can thereby be effectively mitigated. A careful reader may notice that the model could simply learn the same embedding for all positions to minimize the adversarial loss. In fact, the click prediction task based on the bias-aware click scores enforces the position embeddings to be informative. Thus, this collapsed case can be avoided by assigning proper loss weights to balance the intensity of the click prediction loss and the adversarial loss. Following (Wu et al., 2019b, 2020b), we construct training samples via negative sampling and use cross-entropy as the loss function for click prediction (Wu et al., 2019c). We denote the click prediction losses of the bias-aware and bias-invariant click models as L_B and L_D, respectively. The unified loss L is formulated as follows:

L = L_B + L_D + α · L_A,

where α is a hyperparameter that controls the relative intensity of adversarial training for debiasing.
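Under the BPR-style discriminator we assume above, the adversarial and unified losses can be sketched as follows (a conceptual NumPy sketch; in practice the negative adversarial gradient would be propagated through a gradient-reversal layer in an autodiff framework):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adversarial_loss(u, u_prime, w_p, w_n):
    """Discriminator tries to tell the bias-aware u apart from the
    bias-invariant u'; L_A = -log(sigmoid(w_p.u - w_n.u'))."""
    z = sigmoid(w_p @ u - w_n @ u_prime)
    return -np.log(z)

def unified_loss(L_B, L_D, L_A, alpha=0.5):
    """L = L_B + L_D + alpha * L_A; the click models receive the
    reversed (negative) gradient of the alpha-weighted L_A term."""
    return L_B + L_D + alpha * L_A

rng = np.random.default_rng(1)
u, u_p = rng.normal(size=4), rng.normal(size=4)
w_p, w_n = rng.normal(size=4), rng.normal(size=4)
L_A = adversarial_loss(u, u_p, w_p, w_n)
L = unified_loss(0.9, 1.1, L_A, alpha=0.5)
```

Because z ∈ (0, 1), L_A is always positive; scaling it by α is what balances debiasing against the two click prediction losses and prevents the collapsed-embedding case discussed above.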
Experiments

Following many prior works (Wu et al., 2019b,c; Wang et al., 2020), we use GloVe (Pennington et al., 2014) embeddings in the news encoder. The model is optimized with Adam (Kingma and Ba, 2015). The coefficient α is set to 0.5. Hyperparameters are tuned on the validation sets. We use AUC, MRR, nDCG@5, and nDCG@10 to evaluate model performance. We repeat each experiment 5 times and report the average scores with standard deviations.
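For completeness, the nDCG@k metric used above can be computed per impression as follows (a standard textbook formulation, not code from the paper):

```python
import math

def ndcg_at_k(labels, scores, k):
    """nDCG@k for one impression: labels are 0/1 clicks, scores are
    model outputs; gains are discounted by log2(rank + 1)."""
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    dcg = sum(labels[i] / math.log2(r + 2) for r, i in enumerate(order[:k]))
    ideal = sorted(labels, reverse=True)
    idcg = sum(g / math.log2(r + 2) for r, g in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0
```

A perfect ranking of all clicked news at the top yields nDCG@k = 1.0; per-impression scores are then averaged over the test set.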

Performance Comparison
We compare DebiasGAN with many baseline news recommendation methods, including EBNR (Okura et al., 2017), DKN (Wang et al., 2018a), NAML (Wu et al., 2019a), NPA (Wu et al., 2019b), LSTUR (An et al., 2019), and FIM (Wang et al., 2020). We also compare several methods for eliminating position biases, including (1) IPW (Joachims et al., 2017; Wu et al., 2021b), which uses inverse propensity weighting in model learning; (2) Reg-EM (Wang et al., 2018b), a regression-based EM method to estimate the propensity weight; (3) PAL (Guo et al., 2019), a position bias-aware learning method for CTR prediction; and (4) DPIN (Huang et al., 2021), a deep position interaction network for CTR prediction. For fair comparison, in these methods we use the same bias-aware click model as in our approach. The results on the two datasets are shown in Tables 2 and 3, respectively. We find that debiased methods outperform those directly learned on biased click data. This is because removing position biases helps learn a more accurate click prediction model. In addition, our approach outperforms other debiasing methods, and the improvement on the Uniform dataset is greater. This is because our approach considers the user preference for position biases to better model and eliminate them. Moreover, our approach uses adversarial learning to help model unbiased user interests. Thus, our model yields larger performance gains when the test data is unbiased.

Ablation Study
We verify the effectiveness of several core techniques in our approach, including candidate-aware attention networks, adversarial learning, and position quantization. The results of DebiasGAN and its variants without one of these components are shown in Fig. 4. We find that both candidate-aware attention and adversarial learning can improve the model performance, especially on the Uniform dataset. This may be because candidate-aware attention helps model the user interest in candidate news, and adversarial learning helps eliminate the effect of position biases on click prediction. In addition, quantizing the position can improve the performance. This may be because the bias effects of adjacent positions are usually similar, and quantizing positions also reduces their sparsity, which helps learn accurate position embeddings.

Hyperparameter Analysis
We study the impact of the loss coefficient α on the model performance. The performance in terms of AUC w.r.t. different α is shown in Fig. 5. We find that as α increases, the performance first increases and then decreases. This may be because position biases cannot be effectively removed without a sufficient intensity of adversarial gradients, while the click model may not receive adequate supervision from the main recommendation task if α is too large. Thus, we choose α = 0.5, which yields good performance on both datasets.

Conclusion
In this paper, we propose a news recommendation method named DebiasGAN that can eliminate position biases via adversarial learning. We propose to model the interactions between user behaviors and position biases to achieve personalized bias modeling in click prediction. In addition, we propose an adversarial debiasing method to help infer unbiased user interests in candidate news. Experiments on two real-world datasets validate that DebiasGAN can effectively improve news recommendation performance via position debiasing.

Limitation
The major limitation of the DebiasGAN method is its high sensitivity to the selection of the adversarial loss coefficient α, which is mainly due to the intrinsic instability of adversarial learning (Vondrick and Torralba, 2017). We plan to address this issue in future work to make DebiasGAN easier to tune and deploy.

Figure 3: The click-through rate of news displayed at different positions in the NewsApp dataset.

Figure 4: Effect of core components in DebiasGAN.

Figure 5: Influence of the loss coefficient α.

Table 1: Statistics of the NewsApp and Uniform datasets.

Table 2: Results of different methods on NewsApp.

Table 3: Results of different methods on Uniform.