HieRec: Hierarchical User Interest Modeling for Personalized News Recommendation

User interest modeling is critical for personalized news recommendation. Existing news recommendation methods usually learn a single user embedding for each user from their previous behaviors to represent their overall interest. However, user interest is usually diverse and multi-grained, and is difficult to model accurately with a single user embedding. In this paper, we propose a news recommendation method with hierarchical user interest modeling, named HieRec. Instead of a single user embedding, our method represents each user with a hierarchical interest tree to better capture their diverse and multi-grained interest in news. We use a three-level hierarchy to represent 1) overall user interest; 2) user interest in coarse-grained topics like sports; and 3) user interest in fine-grained topics like football. Moreover, we propose a hierarchical user interest matching framework to match candidate news with different levels of user interest for more accurate user interest targeting. Extensive experiments on two real-world datasets show that our method can effectively improve the performance of user modeling for personalized news recommendation.


Introduction
Recently, a massive number of people have become accustomed to reading news articles on online news platforms, such as Google News and Microsoft News (Das et al., 2007). To help users efficiently find news they are interested in, personalized news recommendation, which aims to recommend news according to user interests, is widely used by these platforms (Wu et al., 2020a; Liu et al., 2010; Lin et al., 2014).
User interest modeling is a critical step for personalized news recommendation (Wu et al., 2021; Zheng et al., 2018; Wu et al., 2020c). Existing methods usually learn a single representation vector to model overall user interests from users' clicked news (Okura et al., 2017; Wu et al., 2020b). For example, Okura et al. (2017) used a GRU network to model user interests from clicked news, taking the latest hidden state of the GRU as the user interest representation. Wu et al. (2019e) used a multi-head self-attention network to capture user interests, and used an attentive pooling network to obtain a unified user representation. However, user interest is usually diverse and multi-grained. For example, as shown in Fig. 1, a user may have interest in movies, sports, finance and health at the same time. In addition, among users who are interested in sports, some may have a general interest in this area, while others, like the example user in Fig. 1, may only be interested in a specific sport like football. It is therefore difficult for these methods to accurately model the diverse and multi-grained user interest for news recommendation via a single user embedding.
In this paper, we propose a personalized news recommendation approach with hierarchical user interest modeling, named HieRec, which can effectively capture the diverse and multi-grained user interest. Our approach contains three levels of user interest representations to model user interests in different aspects and granularities. The first one is subtopic-level, which contains multiple interest representations to model fine-grained user interests in different news subtopics (e.g., interest in football and golf). They are learned from embeddings of subtopics and the clicked news in the corresponding subtopics. The second one is topic-level, which contains multiple interest representations to capture coarse-grained user interests in major news topics (e.g., interest in sports and finance). They are learned from embeddings of news topics and their subordinate subtopic-level interest representations. The third one is user-level, which contains an interest representation to model overall user interests. It is learned from topic-level interest representations. Besides, we propose a hierarchical user interest matching framework to match candidate news with different levels of interest representations to target user interests more accurately. Extensive experiments on two real-world datasets show that HieRec can effectively improve the accuracy of user interest modeling and news recommendation.

Related Work
Personalized news recommendation is an important intelligent application and has been widely studied in recent years (Bansal et al., 2015; Wu et al., 2019c; Ge et al., 2020). Existing methods usually model news from its content, model user interest from the user's clicked news, and recommend candidate news based on their relevance to user interests (Okura et al., 2017). For example, Okura et al. (2017) utilized an auto-encoder to learn news representations from news bodies. They applied a GRU network to capture user interests from the sequence of users' historical clicks and used the last hidden state vector of the GRU as the user interest representation. Besides, they proposed to model relevance between user interest and candidate news based on the dot product of their representations. Wu et al. (2019a) learned news representations from news titles, bodies, categories, and subcategories based on an attentive multi-view learning framework. They built the user interest representation based on the attentive aggregation of clicked news representations. Another method used a CNN network to learn news representations from news titles and categories. It applied a GRU network to the user's clicked news to build a short-term user interest representation and applied user ID embedding to learn a long-term user interest representation. It further learned a unified user interest representation based on the aggregation of the short- and long-term user interest representations. Liu et al. (2020) proposed to learn news representations from news titles and entities via a knowledge graph attention network. They also obtained the user interest representation from representations of clicked news via an attention network. Besides, all three of these methods adopted the inner product for matching candidate news. Most existing methods learn a single user embedding to represent the overall user interests (Wang et al., 2018; Wu et al., 2019e,b).
However, user interests are usually very diverse and multi-grained, which are difficult to be accurately modeled by a single user embedding. Different from these methods, we propose a hierarchical user interest modeling framework to model user interests in different aspects and granularities. In addition, we propose a hierarchical user interest matching framework to understand user interest in candidate news from different interest granularities for more accurate user interest targeting.

HieRec
In this section, we first give a problem formulation of personalized news recommendation. Then we introduce our HieRec method in detail.

Problem Formulation
Given a candidate news $n_c$ and a target user $u$, the goal is to calculate an interest score $o$ that measures the interest of this user in the candidate news. Each news $n$ has a title, a topic $t$ and a subtopic $s$. The title is composed of a text sequence $T = [w_1, w_2, ..., w_T]$ and an entity sequence $E = [e_1, e_2, ..., e_E]$, where $w_i$ and $e_i$ respectively denote the $i$-th word and entity in the news title, and $T$ and $E$ respectively denote the number of words and entities. We assume the user has $M$ clicked news. In HieRec, we further divide these clicks based on their topics and subtopics for hierarchical user interest modeling. More specifically, we build a clicked topic set $\{t_i | i = 1, ..., m\}$ from the topics of the user's clicks, where $t_i$ is the $i$-th clicked topic and $m$ is the number of clicked topics. We can further obtain a clicked subtopic set $\{s^i_j | j = 1, ..., d\}$ subordinate to each clicked topic $t_i$, where $s^i_j$ is the $j$-th clicked subtopic subordinate to topic $t_i$ and $d$ is the size of the set. Finally, the user's clicked news in topic $t_i$ and subtopic $s^i_j$ are placed in the same click group $N^i_j = \{n^{i,j}_k | k = 1, ..., l\}$, where $n^{i,j}_k$ denotes the $k$-th clicked news in this group and $l$ is the number of clicked news in the group.
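The grouping of clicks into topic, subtopic, and click-group levels can be sketched as follows. This is a minimal illustration, not the authors' implementation; the `(news_id, topic, subtopic)` record layout and the function name `build_click_groups` are assumptions made for the example.

```python
from collections import defaultdict


def build_click_groups(clicked_news):
    """Group clicked news by topic and then by subtopic.

    clicked_news: iterable of (news_id, topic, subtopic) tuples
    (a hypothetical record layout chosen for illustration).
    Returns {topic: {subtopic: [news_id, ...]}}, preserving click order.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for news_id, topic, subtopic in clicked_news:
        groups[topic][subtopic].append(news_id)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {topic: dict(subs) for topic, subs in groups.items()}


clicks = [
    ("n1", "sports", "football"),
    ("n2", "finance", "markets"),
    ("n3", "sports", "football"),
    ("n4", "sports", "golf"),
]
groups = build_click_groups(clicks)
# groups["sports"]["football"] plays the role of one click group N^i_j.
```

Here the outer keys correspond to the clicked topic set, the inner keys to the clicked subtopic set of each topic, and each list to one click group.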

Hierarchical User Interest Modeling
In general, user interest is usually very diverse and multi-grained. For example, according to Fig. 1, the example user has interests in many different aspects at the same time, such as sports, movies, and finance. Besides, among users who are interested in sports, some may have general interests in this area and may read news on different kinds of sports, such as basketball, football, and golf, while other users (like the example user in Fig. 1) may only have interest in a specific sport like football. Understanding user interest in different aspects and granularities has the potential to model user interests more accurately. Thus, we propose a hierarchical user interest modeling framework, which learns a hierarchical interest tree to capture diverse and multi-grained user interest. As shown in Fig. 2, HieRec represents user interests via a three-level hierarchy.
First, we learn multiple subtopic-level interest representations to model fine-grained user interests in different news subtopics (e.g., football and golf). The subtopic-level interest representation for subtopic $s^i_j$ is learned from $N^i_j$, which is composed of the user's clicked news in subtopic $s^i_j$. Since clicked news may differ in how informative they are for modeling user interest, we adopt a subtopic-level attention network to select informative clicked news for modeling user interest in subtopic $s^i_j$:
$$\mathbf{c}^i_j = \sum_{k=1}^{l} \gamma_k \mathbf{n}^{i,j}_k, \qquad \gamma_k = \frac{\exp(\phi_s(\mathbf{n}^{i,j}_k))}{\sum_{p=1}^{l} \exp(\phi_s(\mathbf{n}^{i,j}_p))},$$
where $\gamma_k$ denotes the attention weight of the $k$-th clicked news $n^{i,j}_k$ in $N^i_j$, $\mathbf{n}^{i,j}_k$ is the representation of news $n^{i,j}_k$ (Section 3.4 introduces how to obtain it) and $\phi_s(\cdot)$ denotes a dense network. Besides, we also adopt a subtopic embedding layer to capture semantic information of different subtopics, from which we can obtain the embedding vector $\mathbf{s}^i_j$ of subtopic $s^i_j$. Finally, we learn the subtopic-level user interest representation $\mathbf{u}^s_{i,j}$ based on the combination of $\mathbf{c}^i_j$ and $\mathbf{s}^i_j$, i.e., $\mathbf{u}^s_{i,j} = \mathbf{c}^i_j + \mathbf{s}^i_j$.
Similarly, we also learn subtopic-level interest representations for other subtopics clicked by the user.
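The subtopic-level step described above (attention over the clicked news in one click group, followed by adding the subtopic embedding) can be sketched in plain Python. This is an illustrative sketch only: `dense_score` is a hypothetical linear scorer standing in for the dense network of the subtopic-level attention, and the tiny 2-dimensional vectors are made up for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_score(vec, weights, bias=0.0):
    """Hypothetical single-layer linear scorer standing in for phi_s."""
    return sum(w * x for w, x in zip(weights, vec)) + bias


def subtopic_interest(news_vecs, subtopic_emb, att_weights):
    """Attention-pool news vectors of one click group, then add the subtopic embedding."""
    gammas = softmax([dense_score(v, att_weights) for v in news_vecs])
    dim = len(subtopic_emb)
    pooled = [sum(g * v[d] for g, v in zip(gammas, news_vecs)) for d in range(dim)]
    interest = [c + s for c, s in zip(pooled, subtopic_emb)]
    return interest, gammas


# Two clicked news in one subtopic; zero attention weights give uniform attention.
u_s, gammas = subtopic_interest(
    news_vecs=[[1.0, 0.0], [0.0, 1.0]],
    subtopic_emb=[0.1, 0.2],
    att_weights=[0.0, 0.0],
)
```

With a trained scorer the attention weights would differ across news, so more informative clicks contribute more to the pooled vector.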
Second, we learn multiple topic-level interest representations to model coarse-grained user interests in major news topics (e.g., sports and finance). The topic-level interest representation for a clicked topic $t_i$ is learned from the subtopic-level interest representations $\{\mathbf{u}^s_{i,j} | j = 1, ..., d\}$ of the subtopics $\{s^i_j | j = 1, ..., d\}$ subordinate to the topic $t_i$. More specifically, user interests in different subtopics may have different importance for modeling user interest in a specific topic. Besides, the number of clicked news on a subtopic may also reflect its importance for modeling topic-level user interest. Thus, we utilize a topic-level attention network to select important subtopic-level user interest representations to model user interest in topic $t_i$:
$$\mathbf{z}_i = \sum_{j=1}^{d} \beta_j \mathbf{u}^s_{i,j}, \qquad \beta_j = \frac{\exp(\phi_t(\mathbf{v}^s_{i,j}))}{\sum_{k=1}^{d} \exp(\phi_t(\mathbf{v}^s_{i,k}))},$$
where $\mathbf{v}^s_{i,j} = [\mathbf{u}^s_{i,j}; \mathbf{r}^i_j]$, $\mathbf{r}^i_j$ is the embedding vector for the number of clicked news on subtopic $s^i_j$, $[\cdot; \cdot]$ is the concatenation operation, $\beta_j$ is the attention weight of $\mathbf{u}^s_{i,j}$, and $\phi_t(\cdot)$ is a dense network. Besides, we also use a topic embedding layer to model semantic information of different topics and derive the embedding vector $\mathbf{t}_i$ for topic $t_i$. Finally, we aggregate $\mathbf{z}_i$ and $\mathbf{t}_i$ to learn the topic-level user interest representation $\mathbf{u}^t_i$ in topic $t_i$: $\mathbf{u}^t_i = \mathbf{z}_i + \mathbf{t}_i$. Similarly, we also learn topic-level interest representations for other clicked topics.
Third, we learn a user-level interest representation $\mathbf{u}^g$ to model overall user interests. It is learned from topic-level interest representations. Similarly, we adopt a user-level attention network to model the relative importance of topic-level user interests to learn the user-level interest representation:
$$\mathbf{u}^g = \sum_{i=1}^{m} \alpha_i \mathbf{u}^t_i, \qquad \alpha_i = \frac{\exp(\phi_g(\mathbf{v}^t_i))}{\sum_{j=1}^{m} \exp(\phi_g(\mathbf{v}^t_j))},$$
where $\mathbf{v}^t_i = [\mathbf{u}^t_i; \mathbf{r}_i]$, $\mathbf{r}_i$ is the embedding vector for the number of the user's clicked news on topic $t_i$, $\alpha_i$ denotes the attention weight of the $i$-th topic-level interest representation, and $\phi_g(\cdot)$ denotes a dense network for calculating attention scores.
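The topic-level and user-level steps share one pattern: score each lower-level interest vector concatenated with a click-count embedding, normalize the scores, and take the weighted sum. A minimal sketch of that pattern is given below, assuming a linear scorer in place of the dense networks and hypothetical 1-dimensional count embeddings; none of the concrete values come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(reprs, count_embs, att_weights):
    """Pool `reprs`, with attention scores computed on [repr; count_emb]."""
    scores = []
    for u, r in zip(reprs, count_embs):
        v = u + r  # list concatenation, i.e. [u; r]
        scores.append(sum(w * x for w, x in zip(att_weights, v)))
    weights = softmax(scores)
    dim = len(reprs[0])
    pooled = [sum(a * u[d] for a, u in zip(weights, reprs)) for d in range(dim)]
    return pooled, weights


def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


# Topic level: pool the subtopic-level interests of one topic, then add the topic embedding.
sub_interests = [[1.0, 0.0], [0.0, 1.0]]
sub_count_embs = [[2.0], [0.0]]   # hypothetical count embeddings (more clicks on the first subtopic)
phi_t = [0.0, 0.0, 1.0]           # this toy scorer looks only at the count embedding
z_i, betas = attentive_pool(sub_interests, sub_count_embs, phi_t)
u_t = add(z_i, [0.5, 0.5])        # [0.5, 0.5] plays the role of the topic embedding

# User level: pool topic-level interests in the same way (no embedding is added here).
topic_interests = [u_t, [0.0, 0.0]]
topic_count_embs = [[1.0], [1.0]]
phi_g = [0.0, 0.0, 1.0]
u_g, alphas = attentive_pool(topic_interests, topic_count_embs, phi_g)
```

In this toy setup the first subtopic receives the larger attention weight because its count embedding scores higher, while the two topics are weighted equally.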

Hierarchical User Interest Matching
Matching between candidate news and user interests at different granularities can provide various clues for user interest targeting. For example, according to Fig. 1, although the 3rd, 4th, and 5th news are all about sports, the user only clicks the 3rd news, probably because of her fine-grained interest in football rather than basketball and golf. This implies that the matching between candidate news and fine-grained user interests is useful for personalized news recommendation. Besides, not all candidate news can match with fine-grained user interests. For instance, a news on the subtopic baseball cannot match any fine-grained interests of the example user in Fig. 1. Fortunately, the coarse-grained user interests (i.e., interest in sports) and overall user interests can match with this candidate news. This implies that matching candidate news with coarse-grained user interests and overall user interests is also important. Thus, we propose a hierarchical user interest matching framework, which models user interest in candidate news at different interest granularities. As shown in Fig. 3, it takes the candidate news (including its representation $\mathbf{n}_c$, topic $t_c$ and subtopic $s_c$) and the hierarchical user interest representation as input. First, we match the candidate news with overall user interests and calculate a user-level interest score $o_g$ based on the relevance between $\mathbf{n}_c$ and $\mathbf{u}^g$: $o_g = \mathbf{n}_c \cdot \mathbf{u}^g$.
Second, the topic-level interest representation $\mathbf{u}^t_{t_c}$ models coarse-grained user interests in the topic $t_c$ of the candidate news. It can provide coarse-grained information to understand user interest in the candidate news. Thus, we match the topic-level interest representation $\mathbf{u}^t_{t_c}$ with the candidate news $\mathbf{n}_c$ as: $\hat{o}_t = \mathbf{n}_c \cdot \mathbf{u}^t_{t_c}$. Besides, we can infer that users may be more interested in topics that they have clicked more. Thus, we weight $\hat{o}_t$ by the ratio $w_{t_c}$ of topic $t_c$ in the historical clicked news and obtain the topic-level interest score $o_t$: $o_t = \hat{o}_t \cdot w_{t_c}$. If the candidate news does not belong to any of the user's clicked topics, we directly set $o_t$ to zero.
Third, the subtopic-level interest representation $\mathbf{u}^s_{s_c}$ models fine-grained user interest in the subtopic $s_c$ of the candidate news and can be used to capture fine-grained user interest in the candidate news. Thus, we match the subtopic-level interest representation $\mathbf{u}^s_{s_c}$ with the candidate news $\mathbf{n}_c$ as: $\hat{o}_s = \mathbf{n}_c \cdot \mathbf{u}^s_{s_c}$. Similarly, we weight $\hat{o}_s$ by the ratio $w_{s_c}$ of subtopic $s_c$ in the user's clicked news and obtain the subtopic-level interest score: $o_s = \hat{o}_s \cdot w_{s_c}$.
Finally, the interest scores of the three levels are aggregated into an overall interest score $o$:
$$o = \lambda_s o_s + \lambda_t o_t + (1 - \lambda_s - \lambda_t)\, o_g,$$
where $\lambda_t, \lambda_s \in \mathbb{R}^+$ are hyper-parameters controlling the relative importance of the interest scores of different levels, with $\lambda_t + \lambda_s < 1$.
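The three matching scores and their aggregation can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the user-level score is assumed to receive the remaining weight $1 - \lambda_s - \lambda_t$, the subtopic-level score is assumed to be zero when the candidate's subtopic was never clicked (the text states this only for topics), and the dictionary-based data layout and all vectors are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def interest_score(n_c, topic_c, subtopic_c, u_g, topic_reprs, subtopic_reprs,
                   topic_ratio, subtopic_ratio, lam_t=0.15, lam_s=0.7):
    """Combine user-, topic-, and subtopic-level matching scores."""
    o_g = dot(n_c, u_g)

    # Topic-level score, weighted by the share of this topic in past clicks;
    # zero if the user never clicked this topic.
    if topic_c in topic_reprs:
        o_t = dot(n_c, topic_reprs[topic_c]) * topic_ratio[topic_c]
    else:
        o_t = 0.0

    # Subtopic-level score; the zero fallback for unseen subtopics is an assumption.
    key = (topic_c, subtopic_c)
    if key in subtopic_reprs:
        o_s = dot(n_c, subtopic_reprs[key]) * subtopic_ratio[key]
    else:
        o_s = 0.0

    return lam_s * o_s + lam_t * o_t + (1.0 - lam_s - lam_t) * o_g


u_g = [1.0, 1.0]
topic_reprs = {"sports": [2.0, 0.0]}
topic_ratio = {"sports": 0.5}
subtopic_reprs = {("sports", "football"): [4.0, 0.0]}
subtopic_ratio = {("sports", "football"): 0.25}
args = (u_g, topic_reprs, subtopic_reprs, topic_ratio, subtopic_ratio)

score_football = interest_score([1.0, 0.0], "sports", "football", *args)
score_baseball = interest_score([1.0, 0.0], "sports", "baseball", *args)
score_politics = interest_score([1.0, 0.0], "politics", "elections", *args)
```

With identical candidate vectors, the football candidate scores highest because all three levels match, the baseball candidate falls back to the topic and user levels, and the politics candidate is scored by the user level alone.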

News Representation
We now describe how to obtain news representations from the texts and entities of news titles. As shown in Fig. 4, we first use a text encoder to model news texts. It first applies a word embedding layer to enrich semantic information of the model. Next, it adopts a text self-attention network (Vaswani et al., 2017) to learn word representations from the contexts of news texts. Then, it uses a text attention network to learn the text representation $\mathbf{n}_t$ by aggregating word representations. Besides texts, knowledge graphs can also provide rich information for understanding news content via entities in news (Wang et al., 2018). Thus, we apply an entity encoder to learn the entity representation of news. We first use an entity embedding layer to incorporate information from knowledge graphs into our model. We further apply an entity self-attention network to capture relatedness among entities. Next, we utilize an entity attention network to learn the entity representation $\mathbf{n}_e$ of news by aggregating entities. Finally, we build the representation $\mathbf{n}$ of news as: $\mathbf{n} = \mathbf{W}_t \mathbf{n}_t + \mathbf{W}_e \mathbf{n}_e$, where $\mathbf{W}_t$ and $\mathbf{W}_e$ are parameters.
Figure 4: News representation learning framework.

Model Training
Following Wu et al. (2019d), we utilize the NCE loss for model optimization. Given a positive sample $n^+_i$ (a clicked news) in the training dataset $\mathcal{O}$, we randomly select $K$ negative samples $[n^1_i, ..., n^K_i]$ (non-clicked news) for it from the same news impression displayed to the user $u$. The NCE loss $\mathcal{L}$ requires that the positive sample be assigned a higher interest score $o^+_i$ than the scores $[o^1_i, ..., o^K_i]$ of the negative samples, and is formulated as:
$$\mathcal{L} = -\sum_{i=1}^{|\mathcal{O}|} \log \frac{\exp(o^+_i)}{\exp(o^+_i) + \sum_{j=1}^{K} \exp(o^j_i)}$$

Experiment

Experimental Datasets and Settings
We conduct extensive experiments on two real-world datasets to evaluate the effectiveness of HieRec. The first one is the public MIND dataset (Wu et al., 2020d). It is constructed from user behavior data collected from Microsoft News from October 12 to November 22, 2019 (six weeks), where user data in the first four weeks was used to construct users' reading history, user data in the penultimate week was used for model training and user data in the last week was used for evaluation. Besides, MIND contains off-the-shelf topic and subtopic labels for each news. The second one (named Feeds) is constructed from user behavior data sampled from a commercial news feeds app in Microsoft from January 23 to April 01, 2020 (13 weeks). We randomly sample 100,000 and 10,000 impressions from the first ten weeks to construct the training and validation sets, and 100,000 impressions from the last three weeks to construct the test data. Since Feeds only contains topic labels of news, we implement a simplified version of HieRec with only user- and topic-level interest representations on Feeds. Besides, following Wu et al. (2020d), users in Feeds were anonymized via hash algorithms and de-linked from the production system to protect user privacy. Detailed information is summarized in Table 1.
Next, we introduce the experimental settings and hyper-parameters of HieRec. We use the first 30 words and 5 entities of news titles and users' 50 most recent clicked news in experiments. We adopt pre-trained GloVe word embeddings (Pennington et al., 2014) and TransE entity embeddings (Bordes et al., 2013) for initialization. In HieRec, the word and entity self-attention networks output 400- and 100-dimensional vectors, respectively. Besides, the unified news representation is 400-dimensional. Attention networks (i.e., $\phi_s(\cdot)$, $\phi_t(\cdot)$, and $\phi_g(\cdot)$) are implemented by single-layer dense networks. Besides, the dimensions of topic and subtopic embeddings are 400, both of which are randomly initialized and fine-tuned.
The hyper-parameters for combining different interest scores, i.e., $\lambda_t$ and $\lambda_s$, are set to 0.15 and 0.7 respectively. Moreover, we utilize the dropout technique (Srivastava et al., 2014) and the Adam optimizer (Kingma and Ba, 2015) for training. HieRec is trained for 5 epochs with 0.0001

Main Results
We first introduce the baseline methods we compared in experiments: (1) EBNR (Okura et al., 2017): learning user representations from the sequence of the user's clicked news via a GRU network. Experimental results are reported in Table 2, from which we have several observations. First, HieRec significantly outperforms baseline methods that learn a single user embedding to model overall user interests, such as NRMS, NPA, and NAML. This is because user interests are usually diverse and multi-grained, and it is difficult for a single representation vector to model user interests in different aspects and granularities, which may be suboptimal for personalized news recommendation. Different from these methods, we propose a hierarchical user interest modeling framework, which can represent diverse and multi-grained user interests via a three-level hierarchy. Besides, we also propose a hierarchical user interest matching framework to match user interest with candidate news at different granularities, which can better target user interests. Second, HieRec significantly outperforms FIM, which directly models user interest in candidate news from the semantic relevance of the candidate news and the user's clicked news. This may be because FIM does not consider user interests at different granularities for matching candidate news.

Effectiveness in User Modeling
To fairly compare different methods with HieRec on the performance of interest modeling, we compare them based on the same news modeling method (the news modeling method introduced in Section 3.4). Experimental results are summarized in Table 3, and we only show experimental results on MIND in the following sections. Table 3 shows that HieRec significantly outperforms existing interest modeling methods. This is because user interests are usually diverse and multi-grained. It is difficult for existing methods with a single user embedding to capture user interests in different aspects and granularities. Different from these methods, HieRec learns a three-level hierarchy to represent diverse and multi-grained user interests.

Ablation Study
We evaluate the effectiveness of user interest representations of different levels by removing the corresponding interest matching scores from Eq. 4. Results are shown in Fig. 5 and we have several findings. First, HieRec with user-level plus either topic- or subtopic-level interest representations significantly outperforms HieRec with only the user-level interest representation. This is because matching candidate news with fine-grained user interests has the potential to improve the accuracy of news recommendation. Topic- and subtopic-level interest representations can model finer-grained user interests than the user-level interest representation. Thus, they can provide additional information for matching candidate news beyond the user-level interest representation. Second, HieRec with interest representations of all three levels also outperforms HieRec with user-level plus either topic- or subtopic-level interest representations. This may be because matching candidate news with user interests of different granularities can help perform more accurate interest matching. Since topic- and subtopic-level interest representations capture user interests at different granularities, incorporating both of them can further improve the recommendation performance.

Performance on Recall and Diversity
Next, we compare different user interest modeling methods on the news recall task. Since methods that model user interests with candidate news information, e.g., DKN and GNewsRec, cannot be applied in the news recall task due to efficiency issues (Pal et al., 2020), we do not compare them in experiments. We evaluate the accuracy and diversity of the top K recalled candidate news. Following existing works (Pal et al., 2020; Chen et al., 2018), the former is measured by recall rates, and the latter is measured by intra-list average distance (ILAD). For HieRec, we employ subtopic-level interest representations to perform multi-channel news recall and equally integrate news recalled by different interest channels. Experimental results are summarized in Fig. 6 and Fig. 7, which show that HieRec significantly outperforms other methods in terms of both recall rates and diversity. This is because user interests are usually very diverse and multi-grained, and are difficult to model comprehensively with a single representation vector. Different from these methods, HieRec hierarchically represents user interests and can better model user interests in different aspects and granularities. Besides, this also implies that compared to existing personalized methods, HieRec can help users explore more diverse information and alleviate filter bubble issues (Nguyen et al., 2014) to some extent.
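For readers unfamiliar with the two recall-task metrics, the sketch below shows one common way to compute them. It is a hedged illustration: the text does not specify the distance function used for ILAD, so cosine distance between item vectors is assumed here, and the function names and toy inputs are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ilad(item_vecs):
    """Intra-list average distance: mean pairwise distance within one recalled list."""
    pairs = list(combinations(item_vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


def recall_at_k(recalled, clicked, k):
    """Fraction of the user's clicked news that appear in the top-k recalled list."""
    relevant = set(clicked)
    if not relevant:
        return 0.0
    return len(set(recalled[:k]) & relevant) / len(relevant)


diverse = ilad([[1.0, 0.0], [0.0, 1.0]])      # orthogonal items
redundant = ilad([[1.0, 0.0], [1.0, 0.0]])    # identical items
hit_rate = recall_at_k(["a", "b", "c"], ["b", "d"], k=2)
```

A higher ILAD indicates that the recalled news are more dissimilar to each other, which is the sense in which diversity is compared across methods.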

Hyper-parameters Analysis
As shown in Fig. 9, we analyze the influence of two important hyper-parameters of HieRec (i.e., $\lambda_t$ and $\lambda_s$) used for combining different levels of interest scores. First, when $\lambda_t$ is fixed, the performance of HieRec first gets better as $\lambda_s$ increases. This is because $\lambda_s$ controls the importance of $o_s$. Besides, $o_s$ measures the relevance between candidate news and fine-grained user interests, which can provide accurate information to understand user interest in the candidate news. When $\lambda_s$ is too small, HieRec cannot effectively exploit the information in $o_s$. Second, a large value of $\lambda_s$ also hurts the performance of HieRec. This is because when $\lambda_s$ is too large, HieRec cannot effectively exploit the user- and topic-level matching scores to recommend candidate news. However, matching candidate news with both overall and coarse-grained user interests is important for personalized news recommendation. Thus, a moderate $\lambda_s$, i.e., 0.65 or 0.7, is suitable for HieRec. Third, when $\lambda_s$ is fixed, the performance of HieRec also first gets better as $\lambda_t$ increases and gets worse when $\lambda_t$ is too large. This is because HieRec cannot effectively utilize the information of $o_t$ when $\lambda_t$ is too small. Besides, HieRec cannot effectively utilize the information of $o_g$ and $o_s$ when $\lambda_t$ is too large. Thus, a moderate $\lambda_t$, i.e., 0.12 or 0.15, is suitable for HieRec.

Case Study
We conduct a case study to show the superior performance of HieRec. We compare HieRec with GNewsRec since GNewsRec achieves the best AUC score in Table 2 among baseline methods. In Fig. 8, we show the top 5 news recommended by HieRec and GNewsRec in a randomly sampled impression. Besides, we also show the historical clicks of the target user in this impression. We can find that the top 5 news recommended by GNewsRec are dominated by news on politics, which cannot comprehensively cover different user interests. This is because user interests are usually diverse and multi-grained. However, it is difficult for GNewsRec, which learns a single representation to model overall user interests, to effectively capture user interests in different aspects and granularities. Different from GNewsRec, the top 5 news recommended by HieRec are diverse and can cover topics that the user may be interested in. Besides, the user clicked a news recommended by HieRec. This is because HieRec learns a hierarchical user interest representation which can effectively model user interests in different aspects and granularities. With the help of the hierarchical user interest representation, HieRec can match candidate news with user interests in different aspects and granularities.

Conclusion
In this paper, we propose a personalized news recommendation method named HieRec for hierarchical user interest modeling, which can effectively model diverse and multi-grained user interests. HieRec learns a three-level hierarchy to represent user interest in different aspects and granularities. First, we learn multiple subtopic-level interest representations to model fine-grained user interests in different news subtopics. Second, we learn multiple topic-level interest representations to model coarse-grained user interests in several major news topics. Third, we learn a user-level interest representation to model overall user interests. Besides, we propose a hierarchical user interest matching framework to match candidate news with user interest at different granularities for more accurate user interest targeting. Extensive experiments on two real-world datasets show the effectiveness of HieRec in user interest modeling.

Ethics and Impact Statement
In this paper, we present HieRec to model diverse and multi-grained user interest. HieRec can be applied to online news platforms for personalized news recommendation, which can help platforms improve user experience and help users find news they are interested in. Although HieRec can bring many benefits, it may also have several potential risks, which we discuss in detail below.
Accuracy Although HieRec outperforms baseline methods in terms of recommendation accuracy (Table 2), it may still produce some inaccurate recommendations that users are not interested in. Users usually just ignore such news and do not click to read them. However, the user experience may be harmed, and users may use the online news service less in the future or turn to other online news platforms.
Privacy In HieRec, we rely on user behavior data centrally stored on the news platform for model training and online services. User behavior data is usually privacy-sensitive, and its centralized storage may lead to privacy concerns and risks. In the future, we will explore training and deploying HieRec in a more privacy-preserving way based on effective privacy protection techniques like Federated Learning.
Diversity Filter bubbles and echo chambers are common problems for many recommender systems (Nguyen et al., 2014), and they harm user experience. Improving recommendation diversity has the potential to alleviate the problem of filter bubbles and echo chambers. Through the experiments in Fig. 7, we find that HieRec can outperform many news recommendation methods in terms of recommendation diversity. Thus, compared with existing methods, HieRec has the potential to alleviate the filter bubble problem to some extent. Besides, in order to further improve recommendation diversity, HieRec can be combined with existing methods in this field like DPP (Chen et al., 2018).
Fake News and Clickbait There may be some fake news and clickbait on some online platforms. To handle the negative social impact and the harm to user experience brought by fake news and clickbait, online news platforms can use existing fake news detection and clickbait detection techniques (Shu et al., 2019) to filter these kinds of news before applying HieRec for personalized recommendation.
Fairness Like many other recommender systems, HieRec relies on user behavior data for model training and online service. Bias in user behavior data may lead to some specific groups of users not being able to receive news information with sufficient accuracy and diversity, and the recommendation results may be more suitable for some major populations. Recently, some fairness-aware recommendation methods like FairRec (Wu et al., 2021) have been proposed to eliminate bias and unfairness in recommender systems. We can combine HieRec with these methods to improve the fairness of the recommendation results and mitigate the harms for marginalized populations.
Misuse The proposed HieRec method works in a data-driven way. It trains the model from the user logs and makes personalized recommendations to users based on their interest inferred from their clicked news. However, in some extreme cases, the recommendation results may be maliciously manipulated to influence users. To avoid the potential misuse, the usage of HieRec should comply with the regulations and laws, and intentional manipulation should be prohibited.