Two Birds with One Stone: Unified Model Learning for Both Recall and Ranking in News Recommendation

Recall and ranking are two critical steps in personalized news recommendation. Most existing news recommender systems conduct personalized news recall and ranking separately with different models. However, maintaining multiple models leads to high computational cost and makes it difficult to meet the online latency requirements of news recommender systems. To handle this problem, in this paper we propose UniRec, a unified method for recall and ranking in news recommendation. In our method, we first infer a user embedding for ranking from a user's historical news click behaviors using a user encoder model. Then we derive the user embedding for recall from the user embedding for ranking: we use it as the attention query to select a set of basis user embeddings that encode different general user interests, and synthesize them into a user embedding for recall. Extensive experiments on a benchmark dataset demonstrate that our method can improve both the efficiency and the effectiveness of recall and ranking in news recommendation.


INTRODUCTION
Personalized news recommendation techniques are widely used by many online news websites and apps to provide personalized news services [1,14,23]. Recall and ranking are two critical steps in personalized news recommender systems [9]. As shown in Fig. 1, when a user visits a news platform, the recommender system first recalls a set of candidate news from a large-scale news pool and then ranks the candidate news for personalized display [23]. Both news recall and ranking have been widely studied [1, 6, 10-12, 14, 18-21]. In online news recommender systems, recall and ranking are usually conducted separately with different models, as shown in Fig. 1. However, maintaining separate models for news recall and ranking in large-scale news recommender systems usually leads to heavy computation and memory costs, and it may be difficult to meet the latency requirements of online news services. Learning a unified model for personalized news recall and ranking would greatly help alleviate the computational load of news recommender systems. However, this is a non-trivial task because the goals of recall and ranking differ [4,13,25,26]. Ranking usually aims to accurately rank candidates based on their relevance to user interests [21], while recall mainly aims to form a candidate pool that comprehensively covers user interests [11]. Thus, the model needs to adapt to the different goals of recall and ranking without hurting the performance of either.
In this paper, we propose a news recommendation method named UniRec, which learns a unified user model for personalized news recall and ranking. In our method, we first encode news into embeddings with a news encoder and learn a user embedding for ranking from the embeddings of historical clicked news. We further derive the user embedding for recall by using the user embedding for ranking as the attention query to select a set of basis user embeddings that encode different general user interest aspects, and synthesizing them into a user embedding for recall. In the test phase, we use only the basis user embeddings with the top attention weights to compose the user embedding for recall, which filters out noisy user interests. Extensive experiments on a real-world dataset demonstrate that our method can conduct personalized news recall and ranking with a unified model while achieving promising recall and ranking performance.

METHODOLOGY
In this section, we introduce our UniRec approach in detail. Its overall framework is shown in Fig. 2. We first learn a user embedding for ranking from the user's historical clicked news. We then derive a user embedding for recall from the user embedding for ranking and a set of basis user embeddings that encode different general user interests. The ranking and recall details of UniRec are introduced as follows.

Figure 2: The framework of UniRec.

Ranking for News Recommendation
The ranking part aims to rank the news in a small candidate list according to user interests. Following [23], UniRec uses a news encoder that learns news embeddings from news texts and a user encoder that learns a user interest embedding for ranking from the embeddings of clicked news. The candidate news embedding and the user embedding for ranking are used to compute a click score for personalized news ranking. More specifically, we denote the historical clicked news of a user as $[D_1, D_2, \dots, D_N]$, where $N$ is the number of clicked news. These clicked news are encoded into a sequence of news embeddings, denoted as $[\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N]$. The user encoder takes this sequence as input and outputs a user embedding $\mathbf{u}_{\text{rank}}$ for ranking. For a candidate news $D^c$, we use the news encoder to obtain its embedding $\mathbf{r}^c$. Following [14], we compute the probability score of the user clicking on the candidate news via the inner product, i.e., $\hat{y} = \mathbf{u}_{\text{rank}} \cdot \mathbf{r}^c$. The click scores of the news in a candidate list are used for personalized ranking. Following [22], we use multi-head self-attention networks to implement both the news and user encoders, so that the contexts of words and of click behaviors can be modeled to learn accurate news and user embeddings, respectively. In addition, following [5], we add position embeddings to capture the order of words and behaviors.
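The scoring step can be illustrated with a minimal plain-Python sketch. It assumes the news and user encoders have already produced fixed-length embeddings (the multi-head self-attention encoders themselves are not reproduced), and the function names `dot` and `rank_candidates` are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(u_rank: Sequence[float],
                    candidates: Sequence[Sequence[float]]) -> List[int]:
    """Indices of candidate news sorted by click score u_rank . r_c, highest first."""
    scores = [dot(u_rank, r_c) for r_c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy 2-d embeddings: the user leans towards the first latent dimension.
print(rank_candidates([1.0, 0.0], [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]))  # → [1, 2, 0]
```

In a real system the candidate embeddings would be scored in batch; the sketch scores them one at a time for clarity.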

Recall for News Recommendation
The recall part aims to retrieve candidate news from a large news pool based on their relevance to user interests. To exploit users' interest information for personalized news recall efficiently, we take the user embedding for ranking as input instead of rebuilding user interest representations from the original click behaviors. However, since the goals of ranking and recall are not the same [8], the user embedding for ranking may not be directly suitable for news recall. Thus, we propose a method to distill a user embedding for recall from the user embedding for ranking. More specifically, we maintain a basis user embedding memory that encodes different general interest aspects of users. We denote the basis user embeddings in the memory as $[\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_P]$, where $P$ is the number of basis user embeddings. We use the user embedding for ranking $\mathbf{u}_{\text{rank}}$ as the attention query to select basis user embeddings. The attention weight $\alpha_i$ of the $i$-th basis user embedding is computed as follows:
$$\alpha_i = \frac{\exp(\mathbf{u}_{\text{rank}} \cdot \mathbf{w}_i)}{\sum_{j=1}^{P} \exp(\mathbf{u}_{\text{rank}} \cdot \mathbf{w}_j)},$$
where the parameter vectors $\mathbf{w}_i$ serve as the attention keys. Unlike many attention networks in which the keys and values are the same, in our approach the keys (i.e., the parameters $\mathbf{w}_i$) differ from the values (i.e., the basis user embeddings $\mathbf{v}_i$). This is because we expect the basis user embeddings to lie in a different space from the user embeddings for ranking, so that they better adapt to the characteristics of the recall task. The basis user embeddings are then synthesized into a unified user embedding $\mathbf{u}_{\text{recall}}$ for recall via their sum weighted by the attention weights, i.e., $\mathbf{u}_{\text{recall}} = \sum_{i=1}^{P} \alpha_i \mathbf{v}_i$. We use a news encoder shared with the ranking part to obtain the embedding $\mathbf{r}^c$ of each candidate news $D^c$ in the news pool. The final recall relevance score $\hat{s}$ between the user's interests and the candidate news is computed by $\hat{s} = \mathbf{u}_{\text{recall}} \cdot \mathbf{r}^c$.
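A minimal sketch of the recall-embedding derivation follows, assuming the attention weight is a softmax over the dot products between the user embedding for ranking and the key vectors (a standard dot-product attention form). The keys and the basis user embeddings are kept as separate lists to mirror the key/value separation described above; all names are hypothetical, and plain Python lists stand in for learned parameters.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (the max logit is subtracted before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recall_user_embedding(
    u_rank: Sequence[float],
    keys: Sequence[Sequence[float]],   # P attention keys w_i (same size as u_rank)
    basis: Sequence[Sequence[float]],  # P basis user embeddings v_i (the values)
) -> Tuple[List[float], List[float]]:
    """Return (u_recall, alphas), where u_recall = sum_i alpha_i * v_i."""
    if len(keys) != len(basis):
        raise ValueError("each basis embedding needs exactly one key")
    # Query the key memory with the ranking embedding: alpha_i = softmax_i(u_rank . w_i).
    alphas = softmax([dot(u_rank, w) for w in keys])
    # Weighted sum of the values; keys and values are separate parameters.
    dim = len(basis[0])
    u_recall = [sum(a * v[d] for a, v in zip(alphas, basis)) for d in range(dim)]
    return u_recall, alphas
```

With `u_rank = [0.0, 0.0]` every key gets the same weight, so `recall_user_embedding([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [2.0, 0.0]])` returns the plain average `[1.5, 0.5]` together with the weights `[0.5, 0.5]`. The recall relevance score of a candidate would then be `dot(u_recall, r_c)`, reusing the output of the shared news encoder.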

Model Training
In this section, we introduce the training details of UniRec. We use a two-stage training strategy that first learns the ranking part and then learns the recall part. Following prior works [7,21,22], we use negative sampling to construct samples for contrastive model learning [15]. For learning the ranking part, we use the clicked news in each impression as positive samples, and we randomly sample non-clicked news displayed in the same impression as negative samples. The loss function for training the ranking part is formulated as follows:
$$\mathcal{L}_{\text{rank}} = -\sum_{\mathcal{S}} \log \frac{\exp(\hat{y}^{+})}{\exp(\hat{y}^{+}) + \sum_{i=1}^{M} \exp(\hat{y}^{-}_{i})},$$
where $\mathcal{S}$ is the set of training samples, $M$ is the number of negative samples per positive sample, and $\hat{y}^{+}$ and $\hat{y}^{-}_{i}$ denote the predicted click scores of a positive sample and its $i$-th negative sample, respectively. By optimizing this loss function, the parameters of the news and user encoders are tuned. Motivated by [24], we fix the news encoder after the model converges. Then, to learn the recall part, we again use the clicked news of each user as positive samples, while we randomly select non-clicked news from the entire news set as negative samples to simulate the news recall scenario. The loss function for training the recall part is as follows:
$$\mathcal{L}_{\text{recall}} = -\sum_{\mathcal{S}} \log \frac{\exp(\hat{s}^{+})}{\exp(\hat{s}^{+}) + \sum_{i=1}^{M} \exp(\hat{s}^{-}_{i})},$$
where $M$ now denotes the number of negatives sampled from the news set, and $\hat{s}^{+}$ and $\hat{s}^{-}_{i}$ represent the predicted recall relevance scores of a positive sample and its $i$-th negative sample, respectively. However, not all basis user embeddings are relevant to the interests of a given user. Thus, motivated by the idea of Principal Component Analysis (PCA), in the test phase we propose to use only the top $K$ basis user embeddings with the highest attention weights to compose the user embedding for recall. We denote these basis user embeddings as $[\mathbf{v}_{t_1}, \mathbf{v}_{t_2}, \dots, \mathbf{v}_{t_K}]$ and re-normalize their attention weights as follows:
$$\beta_i = \frac{\alpha_{t_i}}{\sum_{j=1}^{K} \alpha_{t_j}}.$$
The user embedding for recall is then built as $\mathbf{u}_{\text{recall}} = \sum_{i=1}^{K} \beta_i \mathbf{v}_{t_i}$. In this way, the user embedding for recall attends more to the major interests of a user and filters out noisy basis user embeddings, which benefits news recall.
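The two training losses and the test-phase top-K composition can be sketched as follows. This assumes the common sampled-softmax form of the contrastive loss (the negative log-likelihood of the positive sample among itself and its sampled negatives); the function names are hypothetical, and gradient computation is left to whatever framework trains the model.

```python
import math
from typing import List, Sequence, Tuple


def contrastive_loss(pos_score: float, neg_scores: Sequence[float]) -> float:
    """-log(exp(s+) / (exp(s+) + sum_i exp(s-_i))) for one positive and M negatives."""
    logits = [pos_score, *neg_scores]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_normalizer = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_normalizer - pos_score


def top_k_renormalize(alphas: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Keep the k largest attention weights and rescale them so they sum to 1."""
    if not 1 <= k <= len(alphas):
        raise ValueError("k must be between 1 and the number of basis embeddings")
    top = sorted(range(len(alphas)), key=lambda i: alphas[i], reverse=True)[:k]
    total = sum(alphas[i] for i in top)
    return top, [alphas[i] / total for i in top]


def compose_recall_embedding(alphas: Sequence[float],
                             basis: Sequence[Sequence[float]],
                             k: int) -> List[float]:
    """Test-phase user embedding for recall, built from the top-k basis embeddings only."""
    top, betas = top_k_renormalize(alphas, k)
    dim = len(basis[0])
    return [sum(b * basis[i][d] for b, i in zip(betas, top)) for d in range(dim)]
```

The same `contrastive_loss` serves both stages; only the scores and the negatives change (4 negatives from the same impression for ranking, 200 negatives from the whole news set for recall). If the attention weights come from a softmax, rescaling the top-K weights by their sum gives the same result as applying a softmax to the top-K attention logits alone.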

Complexity Analysis
In this section, we discuss the computational complexity of UniRec. In existing news recommendation methods that conduct recall and ranking with separate models, the computational complexity of learning the user embedding for recall and that of learning the user embedding for ranking are each at least $O(N)$, and the exact cost depends on the architecture of the user encoder.¹ UniRec has the same complexity in learning the user embedding for ranking, but the complexity of deriving the user embedding for recall is reduced to $O(P)$, where $P$ is usually much smaller than $N$. In addition, the attention network used to synthesize the user embedding for recall can be lighter-weight than a full user encoder. Thus, the total computational complexity of recall and ranking can be effectively reduced.
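To make the complexity argument concrete, the rough arithmetic below counts multiplications, assuming 256-dimensional embeddings (16 heads × 16 dimensions per head, as in the experimental settings), 20 basis user embeddings with the top 5 kept at test time, and a hypothetical history of 50 clicked news. It counts only the N × N attention-score matrix of a single self-attention layer on one side and the key dot products plus the weighted sum on the other, so it ignores projections, softmax, and multi-layer stacks; treat it as an order-of-magnitude illustration, not a measured cost.

```python
def self_attention_score_mults(n_clicks: int, dim: int) -> int:
    """Multiplications for one N x N attention-score matrix (Q K^T): N * N * d."""
    return n_clicks * n_clicks * dim


def recall_embedding_mults(num_basis: int, top_k: int, dim: int) -> int:
    """P key dot products (P * d) plus a weighted sum of K basis vectors (K * d)."""
    return num_basis * dim + top_k * dim


DIM = 16 * 16   # 16 heads x 16 dims per head, from the experimental settings
N_CLICKS = 50   # hypothetical history length, for illustration only
P, K = 20, 5    # basis user embeddings / embeddings kept at test time

separate = self_attention_score_mults(N_CLICKS, DIM)
derived = recall_embedding_mults(P, K, DIM)
print(separate, derived, separate // derived)  # → 640000 6400 100
```

Under these assumptions the recall embedding costs about 1% of a single self-attention score matrix, and the gap widens quadratically as the click history grows.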

EXPERIMENTS
Dataset and Experimental Settings
We conduct experiments on MIND [23], a large-scale public dataset for news recommendation. It contains news impression logs of 1 million users on Microsoft News over 6 weeks. The logs of the first five weeks are used for training and validation, and the logs of the last week are used for testing. The detailed statistics of MIND are shown in Table 1.
In our experiments, following [23], we use news titles to learn news embeddings. The number of attention heads in the news and user encoders is 16, and the hidden dimension of each head is 16. The number of basis user embeddings $P$ is 20, and the hyperparameter $K$ that controls how many basis user embeddings compose the user embedding for recall in the test phase is 5. The number of negative samples associated with each positive sample is 4 for the ranking task and 200 for the recall task. Adam [2] is used as the optimizer with a learning rate of 1e-4. The dropout [17] rate is 20%, and the batch size is 32. These hyperparameters are selected on the validation set. Following [23], we use AUC, MRR, nDCG@5 and nDCG@10 to evaluate news ranking performance. In addition, we use the recall rate of the top 100, 200, 500 and 1000 ranked news to evaluate news recall performance. We repeat every experiment 5 times and report the average results.

¹ In NRMS [22], the complexity is $O(N^2)$.
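The ranking metrics and the recall rate named in the settings can be computed with the sketch below. It uses common textbook definitions with binary click labels (1 = clicked) and is not a reproduction of any official evaluation script; the per-impression (or per-user) values would then be averaged over the test set, which is an assumption about the aggregation.

```python
import math
from typing import List, Sequence, Set


def _labels_by_score(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Reorder binary click labels (1 = clicked) by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a clicked item outscores a non-clicked one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mrr(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average reciprocal rank of the clicked items within one impression."""
    ranked = _labels_by_score(scores, labels)
    n_clicked = sum(ranked)
    if n_clicked == 0:
        raise ValueError("MRR needs at least one clicked item")
    return sum(y / (rank + 1) for rank, y in enumerate(ranked)) / n_clicked


def ndcg_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """nDCG@k with binary gains and a log2(rank + 1) discount (ranks start at 1)."""
    def dcg(ys: Sequence[int]) -> float:
        return sum(y / math.log2(i + 2) for i, y in enumerate(ys[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(_labels_by_score(scores, labels)) / ideal if ideal > 0 else 0.0


def recall_at_k(recalled_ids: Sequence[str], clicked_ids: Set[str], k: int) -> float:
    """Fraction of a user's clicked news that appear among the top-k recalled news."""
    if not clicked_ids:
        raise ValueError("recall needs at least one clicked item")
    return len(set(recalled_ids[:k]) & clicked_ids) / len(clicked_ids)
```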

Performance Evaluation
We first compare the ranking performance of UniRec with several baseline methods, including: (1) EBNR [14], which uses a GRU [3] network for user interest modeling in news recommendation; (2) DKN [19], a deep knowledge-aware network for news recommendation; (3) NPA [21], news recommendation with personalized attention; (4) NAML [20], news recommendation with attentive multi-view learning; and (5) NRMS [22], news recommendation with multi-head self-attention. The ranking performance of the different methods is shown in Table 2. From the results, we find that UniRec outperforms several baseline methods such as NAML and NPA. This may be because self-attention has a stronger ability to model news content and user interests. In addition, UniRec also slightly outperforms its basic model NRMS. This may be because UniRec captures the order of words and behaviors via position embeddings.

In the news recall task, we compare the performance of UniRec with the following baseline methods: (1) YoutubeNet [4], which uses the average of clicked news embeddings for recall; (2) PinnerSage [16], an item recall method based on hierarchical clustering; (3) Octopus [11], which learns an elastic number of user embeddings for item recall; and (4) UniRec(all), a variant of UniRec that uses all basis user embeddings to compose the user embedding for recall. We summarize the recall performance of the different methods in Table 3, from which we have several findings. First, compared with YoutubeNet, other recall methods such as PinnerSage and UniRec usually perform better. This may be because different user behaviors have different importance for user interest modeling, and simply averaging their embeddings may be suboptimal. Second, both UniRec and UniRec(all) outperform the other baseline methods. This may be because our approach can exploit the user interest information inferred by the ranking module to enhance news recall.
In addition, our approach uses a unified model for both recall and ranking, which is more efficient in online systems than the other methods. Third, UniRec outperforms its variant UniRec(all). This may be because selecting the basis user embeddings with the top attention weights helps learn more accurate user interest embeddings by attending to major user interests and filtering out noisy ones. The above results validate the effectiveness of UniRec in both news ranking and recall.

Hyperparameter Analysis
In this section, we study the influence of two important hyperparameters in UniRec: the total number $P$ of basis user embeddings and the number $K$ of basis user embeddings used to compose the user embedding for recall. We first set $K = P$ and tune the value of $P$. The recall performance is shown in Fig. 3. We find that the performance is suboptimal when $P$ is too small, which may be because diverse user interests cannot be covered by a few basis user embeddings. However, the performance also declines when $P$ is large. This may be because it is difficult to accurately select informative basis user embeddings for user interest modeling. In addition, the computation and memory costs also increase. Thus, we set $P$ to a medium value (i.e., 20), which yields the best performance. We then tune the value of $K$ under $P = 20$. The results are shown in Fig. 4. We find that the performance is suboptimal when $K$ is very small. This is intuitive because the user interests cannot be fully covered. However, the performance also declines when $K$ is relatively large. This may be because basis user embeddings with relatively low attention weights are redundant or even noisy for user interest modeling. Thus, we choose to use 5 basis user embeddings to compose the user embedding for recall.

Case Study
We also validate the effectiveness of UniRec in news recall via a case study. Fig. 5 shows the clicked news of a randomly selected user and several top news recalled by UniRec. From the user's clicked news, we can infer that this user may be interested in finance, sports and TV shows. We find that the recall results of UniRec cover all of these interests. These results show that UniRec can effectively model user interests for personalized news recall.

CONCLUSION
In this paper, we present a unified approach for recall and ranking in news recommendation. In our method, we first infer a user embedding for ranking from historical news click behaviors via a user encoder model. Then we derive a user embedding for recall from the obtained user embedding for ranking by using it as the attention query to select a set of basis user embeddings that encode different general user interests, and synthesizing them into a user embedding for recall. Extensive experiments on a benchmark dataset validate the effectiveness of our approach in both news ranking and recall.