DIGAT: Modeling News Recommendation with Dual-Graph Interaction

News recommendation (NR) is essential for online news services. Existing NR methods typically adopt a news-user representation learning framework and face two potential limitations. First, in the news encoder, single candidate news encoding suffers from an insufficient semantic information problem. Second, existing graph-based NR methods are promising but lack effective news-user feature interaction, rendering graph-based recommendation suboptimal. To overcome these limitations, we propose dual-interactive graph attention networks (DIGAT), consisting of news- and user-graph channels. In the news-graph channel, we enrich the semantics of the single candidate news by incorporating semantically relevant news information with a semantic-augmented graph (SAG). In the user-graph channel, multi-level user interests are represented with a news-topic graph. Most notably, we design a dual-graph interaction process to perform effective feature interaction between the news and user graphs, which facilitates accurate news-user representation matching. Experimental results on the benchmark dataset MIND show that DIGAT outperforms existing news recommendation methods. Further ablation studies and analyses validate the effectiveness of (1) semantic-augmented news graph modeling and (2) dual-graph interaction.


Introduction
News recommendation is an important technique to provide people with news that satisfies their personalized reading interests (Okura et al., 2017; Wu et al., 2020). Effective news recommender systems require both accurate textual modeling of news content (Wang et al., 2018; Wu et al., 2019d; Wang et al., 2020) and personal-interest modeling of user behavior (Hu et al., 2020b; Qi et al., 2021c). Hence, most news recommendation methods (An et al., 2019; Wu et al., 2019a,b,c,d; Ge et al., 2020) adopt a news-user representation learning framework.¹

Though promising, there are still two potential limitations in the existing news recommendation framework. First, in the news encoder, single candidate news encoding suffers from an insufficient semantic information problem. Unlike long-term items in common recommendation (e.g., E-commerce product recommendation), candidate news items are short-term and suffer from the cold-start problem. In the real-world setting, news recommender systems usually handle the latest news, for which existing user-click interactions are often unavailable². Hence, it is intractable to use existing user-click records to enrich the information of candidate news. On the other hand, compared to the abundant historical clicked news in the user encoder, the single candidate news may not contain sufficient semantic information for accurate news-user representation matching in the click prediction stage. Prior studies (Wu et al., 2019a,c; Qi et al., 2021c) pointed out that users are usually interested in specific news topics (e.g., Sports). Empirically, the text of a single candidate news does not contain enough syntactic and semantic information to accurately represent a genre of news topic and match user interests.

¹ Our code is available at https://github.com/Veasonsilverbullet/DIGAT. Jian Li is the corresponding author.
Second, previous studies generally follow two research directions to model user history, i.e., sequence and graph modeling. Formulating user history as a sequence of the user's clicked news is the more prevalent direction, based on which time-sequential models (Okura et al., 2017; An et al., 2019; Qi et al., 2021b) and attentive models (Zhu et al., 2019; Wu et al., 2019a,b,d; Qi et al., 2021a,c) have been proposed. Besides, graph modeling has proved effective for recommender systems (Chen et al., 2020). Ge et al. (2020) and Hu et al. (2020b) formulate news and users jointly in a bipartite graph to model news-user interaction. However, since most candidate news in test data has no existing interaction with users (i.e., cold news), the isolated cold-news nodes cause this bipartite graph modeling to degenerate. Recent works formulate user history as heterogeneous graphs and employ advanced graph learning methods to extract user-graph representations (Hu et al., 2020a; Mao et al., 2021; Wu et al., 2021). These works focus on extracting fine-grained representations from the user-graph side but neglect the necessary feature interaction between the candidate news and user graphs.
In this work, we propose Dual-Interactive Graph ATtention networks (DIGAT) to address the aforementioned limitations. DIGAT consists of news- and user-graph channels to encode the candidate news and user history, respectively. In the news-graph channel, we introduce semantic-augmented graph (SAG) modeling to enrich the semantic representation of the single candidate news. In SAG, the original candidate news is regarded as the root node, while semantically relevant news documents are represented as extended nodes that augment the semantics of the candidate news. We integrate the local and global contexts of SAG as the semantic-augmented candidate news representations.
In the user-graph channel, motivated by Mao et al. (2021) and Wu et al. (2021), we model user history with a news-topic graph to represent multiple levels of user interests. Most notably, we design a dual-graph interaction process to learn news- and user-graph representations with effective feature interaction. Different from the individual graph attention network (Veličković et al., 2018), DIGAT updates news and user graph embeddings with an interactive attention mechanism. Particularly, in each layer of the dual-graph, the user (news) graph context is incorporated into its dual news (user) node embedding learning iteratively.
Extensive experiments on the benchmark dataset MIND (Wu et al., 2020) show that DIGAT significantly outperforms existing news recommendation methods. Further ablation studies and analyses confirm that semantic-augmented news graph modeling and dual-graph interaction can substantially improve news recommendation performance.

Related Work
Personalized news recommendation is important to online news services (Okura et al., 2017; Yi et al., 2021). Existing neural news recommendation methods typically aim to learn informative news and user representations (Wang et al., 2018; Zhu et al., 2019; An et al., 2019; Wu et al., 2019a,b,d; Liu et al., 2020; Wang et al., 2020; Qi et al., 2021a,b,c; Wu et al., 2021; Li et al., 2022). For example, An et al. (2019) used a CNN network to extract textual representations from news titles and a GRU network to learn short-term user interests combined with long-term user embeddings. The matching probabilities between candidate news and users are computed over the learned news and user representations. Wu et al. (2019d) utilized multi-head self-attention networks to learn informative news and user representations from news titles and user click history. These methods regard the single candidate news as the input to the news encoder, which may not contain sufficient semantics to represent a user-interested news topic. Different from these methods, we encode the candidate news with semantic-augmented graphs to enrich its semantic representations. More recently, graph-based methods were proposed for news recommendation (Ge et al., 2020; Hu et al., 2020a,b; Mao et al., 2021; Wu et al., 2021). For example, Wu et al.
(2021) proposed a heterogeneous graph pooling method to learn fine-grained user representations. However, feature interaction between candidate news and users is inadequate or neglected in these methods. In contrast, our approach models effective feature interaction between news and user graphs for accurate news-user representation matching.

Semantic News Encoder

Given the word sequence of a news text, we use the multi-head self-attention network MSA(Q, K, V) of the Transformer encoder (Vaswani et al., 2017) to learn the contextual representations H_n ∈ R^{|T|×d} (where d is the feature dimension). Finally, we employ an attention network f_att(·) to aggregate the news semantic representation h ∈ R^d. The attentive aggregation function f_att(·) is implemented by a feed-forward network in our experiments. It is worth noting that the semantic news encoder in our framework is plug-and-play and can be easily replaced by any other textual encoder or pretrained language model, e.g., BERT (Devlin et al., 2019) or DeBERTa (He et al., 2021).
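To make the encoder pipeline concrete, the following is a minimal numpy sketch of the semantic news encoder described above. It uses a single-head stand-in for MSA(Q, K, V) and a simple attentive pooling for f_att(·); the toy sizes, random weights, and tanh scoring are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, Wq, Wk, Wv):
    # Single-head stand-in for MSA(Q, K, V): scaled dot-product
    # self-attention over the word representations H of shape (|T|, d).
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    return softmax(Q @ K.T / np.sqrt(H.shape[-1])) @ V

def attentive_pool(H, w):
    # f_att: attentive pooling of contextual word vectors into
    # one news semantic representation h of size d.
    alpha = softmax(np.tanh(H) @ w)
    return alpha @ H

rng = np.random.default_rng(0)
T, d = 8, 16                                  # toy sizes: 8 words, dim 16
H = rng.normal(size=(T, d))                   # word embeddings of one title
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
H_n = self_attention(H, Wq, Wk, Wv)           # contextual representations H_n
h = attentive_pool(H_n, rng.normal(size=d))   # news representation h
```

As in the paper, this encoder is plug-and-play: the `self_attention` step could be swapped for any pretrained text encoder producing a (|T|, d) matrix.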

News Graph Encoding Channel
In this section, we explain the news semantic-augmented graph (SAG) construction and graph context learning. Our motivation is to retrieve semantic-relevant news from the training corpus and construct a semantic-augmented graph to enrich the semantics of the original single candidate news.

News Graph Construction
Semantic-relevant News Retrieval. Pretrained language models (PLM) have achieved remarkable performance (Reimers and Gurevych, 2019, 2020) on semantic textual similarity (STS) benchmarks. Motivated by Lewis et al. (2020), we utilize a PLM ϕ(·) to retrieve semantic-relevant news from the training news corpus³ to augment the semantic information of the original single candidate news. In the retrieval process, the semantic similarity score s_{i,j} of news n_i and n_j (with corresponding texts T_i and T_j) is computed by the similarity function sim(·, ·):

s_{i,j} = sim(ϕ(T_i), ϕ(T_j))

Semantic-augmented Graph (SAG). For the original candidate news n_can, we initialize it as the root node v_0 of the semantic-augmented news graph G_n. We build G_n by repeatedly extending semantic-relevant neighboring nodes from existing nodes of G_n.
In each graph construction step, for an existing node v_i (corresponding to news N_i) of G_n, the M news documents {N_j}_{j=1}^{M} with the highest semantic similarity scores {s_{i,j}}_{j=1}^{M} are retrieved from the news corpus N_C. We extend the nodes {v_j}_{j=1}^{M} as neighboring nodes of the node v_i by adding bidirectional edges {e_{i,j}}_{j=1}^{M} between them. To heuristically discover semantic-relevant news in higher-order relations, we repeatedly extend the semantic-relevant news nodes within K hops from the root node. The scale of the news graph G_n is approximately O(M^K). Detailed SAG construction and qualitative analysis are provided in Appendix A.
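The construction procedure above can be sketched as a simple hop-by-hop expansion. Here `sim` is a stand-in for the PLM similarity scores and the integer corpus is purely a toy illustration; the real system retrieves actual news documents.

```python
def build_sag(root, corpus, sim, M=5, K=2):
    # Sketch of SAG construction: starting from the candidate news (root),
    # repeatedly attach the M most similar unseen news as neighbours of each
    # frontier node, out to K hops, with bidirectional edges.
    nodes, edges, frontier = [root], set(), [root]
    for _ in range(K):
        next_frontier = []
        for v in frontier:
            ranked = sorted((n for n in corpus if n not in nodes),
                            key=lambda n: sim(v, n), reverse=True)
            for u in ranked[:M]:
                nodes.append(u)
                edges.add((v, u))
                edges.add((u, v))        # bidirectional edge e_{v,u}
                next_frontier.append(u)
        frontier = next_frontier
    return nodes, edges

# Toy corpus of news ids; similarity = closeness of ids (stand-in for PLM scores).
nodes, edges = build_sag(root=50, corpus=range(100),
                         sim=lambda i, j: -abs(i - j), M=2, K=2)
```

With M neighbours per node and K hops, the graph holds roughly 1 + M + ... + M^K nodes, matching the O(M^K) scale noted above.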

News Graph Context Extraction
Given an SAG G_n generated from the candidate news node v_0 with N semantic-relevant news nodes {v_i}_{i=1}^{N}, we use the semantic news encoder (introduced in Section 3.1) to extract their semantic representations h_{n,0} ∈ R^d and {h_{n,i}}_{i=1}^{N} ∈ R^{N×d}. We aim to extract the graph context c_n ∈ R^d, which augments the semantics of the candidate news n_can by aggregating the information of G_n. Since the original semantics of the candidate news are preserved in the root node v_0, we regard the local graph context as h^L_n = h_{n,0} ∈ R^d. Besides, we employ an attention module to aggregate the global graph context h^G_n ∈ R^d from the semantic-relevant news nodes to encode the overall semantic information of G_n. In the attention module, we regard the root node embedding h_{n,0} as the query and the semantic-relevant news node embeddings {h_{n,i}}_{i=1}^{N} as the key-value pairs:

h^G_n = Σ_{i=1}^{N} α_i h_{n,i},  α_i = softmax_i((W^Q_n h_{n,0})^T (W^K_n h_{n,i}))    (3)

, where W^Q_n ∈ R^{d×d} and W^K_n ∈ R^{d×d} are parameter matrices. We integrate the local and global graph contexts by a simple feed-forward gating network FFN_g(·) to derive the news graph context c_n:

g = σ(FFN_g([h^L_n; h^G_n])),  c_n = g ⊙ h^L_n + (1 − g) ⊙ h^G_n    (4)

The parameters of the news graph context extractor are shared among different graph layers of DIGAT (the user graph context extractor in Section 3.3.2 also shares parameters likewise).
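The local/global context extraction and the gating fusion described above can be sketched in numpy as follows. The exact form of the gate (a sigmoid feed-forward layer mixing the local and global contexts) is an assumption for illustration; weight shapes and sizes are toy values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def news_graph_context(h_root, H_rel, Wq, Wk, W_gate):
    # Local context h_L: the root (candidate news) node embedding itself.
    h_local = h_root
    # Global context h_G: attention over the N semantic-relevant node
    # embeddings, with the projected root embedding as the query.
    scores = (H_rel @ Wk) @ (Wq @ h_root)
    h_global = softmax(scores) @ H_rel
    # Feed-forward gate fusing local and global contexts into c_n
    # (a plausible form of FFN_g; the exact gate is an assumption).
    g = sigmoid(W_gate @ np.concatenate([h_local, h_global]))
    return g * h_local + (1.0 - g) * h_global

rng = np.random.default_rng(1)
d, N = 16, 6
c_n = news_graph_context(rng.normal(size=d), rng.normal(size=(N, d)),
                         rng.normal(size=(d, d)) * 0.1,
                         rng.normal(size=(d, d)) * 0.1,
                         rng.normal(size=(d, 2 * d)) * 0.1)
```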

User Graph Construction
Motivated by Mao et al. (2021) and Wu et al. (2021), we model user history with a graph structure to encode multiple levels of user interests. We build a user graph G_u containing news nodes and topic nodes: (1) We treat a user's clicked news as a set of news nodes for news-level user interest representation.
(2) Each clicked news n_j pertains to a specific news topic⁴. We treat the clicked news topics as topic nodes for topic-level user interest representation. To capture the interaction among news and topics, we introduce three types of edges:

News-News Edge. News nodes with the same topic category (e.g., Sports) are fully connected. In this way, we can capture the relatedness among clicked news with news-level interaction.
News-Topic Edge. We model the interaction between clicked news and topics by connecting news nodes to their pertaining topic nodes.
Topic-Topic Edge. Topic nodes are fully connected. In this way, we can capture the overall user interests with topic-level interaction.
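The three edge types above can be sketched as a small graph-construction routine. The node labels `('n', id)` and `('t', topic)` are hypothetical identifiers used only for illustration.

```python
def build_user_graph(clicked):
    # clicked: list of (news_id, topic) pairs from the user's click history.
    # Returns a set of directed edges over news nodes ('n', id) and
    # topic nodes ('t', topic), covering the three edge types.
    edges = set()
    topics = sorted({t for _, t in clicked})
    # News-News edges: fully connect news nodes sharing a topic category.
    for n1, t1 in clicked:
        for n2, t2 in clicked:
            if n1 != n2 and t1 == t2:
                edges.add((('n', n1), ('n', n2)))
    # News-Topic edges: connect each news node to its pertaining topic node.
    for n, t in clicked:
        edges.add((('n', n), ('t', t)))
        edges.add((('t', t), ('n', n)))
    # Topic-Topic edges: fully connect all topic nodes.
    for t1 in topics:
        for t2 in topics:
            if t1 != t2:
                edges.add((('t', t1), ('t', t2)))
    return edges

edges = build_user_graph([(1, 'Sports'), (2, 'Sports'), (3, 'Finance')])
```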

User Graph Context Extraction
Given the user history H_u = [n_1, n_2, ..., n_{|H|}], we employ the semantic news encoder (introduced in Section 3.1) to learn the historical news embeddings {h^u_j}_{j=1}^{|H|}. Motivated by Qi et al. (2021c), we extract the graph context c_u ∈ R^d in a hierarchical way. First, we employ an attention module to learn the topic representation h_{t(i)} ∈ R^d of topic t(i). The topic-attention module regards the news graph context c_n as the query and the news embeddings {h^u_j}_{n_j ∈ t(i)} of topic t(i) as the key-value pairs:

h_{t(i)} = Attn(c_n, {h^u_j}_{n_j ∈ t(i)}, {h^u_j}_{n_j ∈ t(i)})    (5)

Then, the topic representations {h_{t(i)}} are aggregated into the user graph context c_u with another attention module, again using c_n as the query:

c_u = Attn(c_n, {h_{t(i)}}, {h_{t(i)}})    (6)

⁴ For example, in the MIND dataset (Wu et al., 2020), each news has a topic category (e.g., Sports and Entertainment).
Attn(Q, K, V) in Eq. (5) and (6) denotes the standard attention module with Query/Key/Value. We implement Attn(Q, K, V) as scaled dot-product attention (Vaswani et al., 2017) in our experiments.
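The hierarchical extraction can be sketched as below, assuming scaled dot-product attention with c_n as the query at both the topic level and the user level; sharing the same projection matrices across the two levels is a simplification for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(q, KV, Wq, Wk):
    # Scaled dot-product attention with a single query vector q
    # over the rows of KV (keys and values coincide here).
    scores = (KV @ Wk) @ (Wq @ q) / np.sqrt(len(q))
    return softmax(scores) @ KV

def user_graph_context(c_n, topic_news, Wq, Wk):
    # Hierarchical extraction sketch: first a topic-level attention over each
    # topic's clicked-news embeddings (query: news graph context c_n),
    # then a user-level attention over the resulting topic representations.
    topic_reps = np.stack([attend(c_n, H, Wq, Wk) for H in topic_news.values()])
    return attend(c_n, topic_reps, Wq, Wk)

rng = np.random.default_rng(2)
d = 16
topic_news = {'Sports': rng.normal(size=(3, d)),    # 3 clicked Sports news
              'Finance': rng.normal(size=(2, d))}   # 2 clicked Finance news
c_u = user_graph_context(rng.normal(size=d), topic_news,
                         rng.normal(size=(d, d)) * 0.1,
                         rng.normal(size=(d, d)) * 0.1)
```

Because c_n drives both attention levels, the user context c_u is already conditioned on the candidate-news side, which is what makes the dual-graph interaction below possible.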

Dual-Graph Interaction
In the news graph G_n, node embeddings {h_{n,i}}_{i=0}^{|G_n|} contain the information of augmented candidate news semantics. In the user graph G_u, node embeddings {h_{u,i}}_{i=0}^{|G_u|} contain the information of user history. We learn informative news and user graph embeddings by aggregating neighboring node information with stacked graph attention layers (Veličković et al., 2018). Most notably, our dual-graph interaction model aims at facilitating effective feature interaction between the news and user graphs, through which accurate news-user representation matching can be achieved. In the dual-graph interaction, the (l+1)-layer news node embeddings h^{(l+1)}_n are computed based on the l-layer news node embeddings h^{(l)}_n and the user graph context c^{(l)}_u jointly (and vice versa for updating the user node embeddings h^{(l+1)}_u), as illustrated in Figure 2. We take the news node embedding update process as an example. We first perform a linear transformation on the l-layer news node embedding h^{(l)}_{n,i} to derive higher-level graph features ĥ^{(l)}_{n,i}:

ĥ^{(l)}_{n,i} = Ŵ^{(l)}_n h^{(l)}_{n,i} + b̂^{(l)}_n    (7)

, where Ŵ^{(l)}_n ∈ R^{d×d} and b̂^{(l)}_n ∈ R^d are learnable. In order to learn news node embeddings interacting with the user graph, we incorporate the user graph context c^{(l)}_u into the news graph attention computation. For news node i and node j ∈ N^n_i (where N^n_i is the neighborhood of node i), we incorporate c^{(l)}_u into computing the attention key vector K_{i,j}. We use a feed-forward network FFN^{(l)}_n to compute K_{i,j} based on the fused information of c^{(l)}_u, ĥ^{(l)}_{n,i} and ĥ^{(l)}_{n,j}:

K_{i,j} = FFN^{(l)}_n([c^{(l)}_u; ĥ^{(l)}_{n,i}; ĥ^{(l)}_{n,j}])    (8)

The news graph attention coefficient α_{i,j} is thus computed aware of the user graph context:

α_{i,j} = softmax_j(a_n^T K_{i,j})    (9)

, where a_n is a learnable attention weight vector. Finally, we aggregate the neighboring node embeddings with the attention coefficients α_{i,j}, followed by ReLU activation. A residual connection is applied to mitigate gradient vanishing in deep graph layers:

h^{(l+1)}_{n,i} = ReLU(Σ_{j ∈ N^n_i} α_{i,j} ĥ^{(l)}_{n,j}) + h^{(l)}_{n,i}    (10)

The news and user graph contexts c^{(l)}_n and c^{(l)}_u are extracted from the l-layer graph node embeddings as described in Sections 3.2.2 and 3.3.2. We summarize Eq.
(7) to (10) as the news node embedding update function:

h^{(l+1)}_{n,i} = Φ^{(l)}_n(h^{(l)}_{n,i}, {h^{(l)}_{n,j}}_{j ∈ N^n_i}, c^{(l)}_u)    (11)

Similarly, the update function of user node embeddings is formulated as:

h^{(l+1)}_{u,i} = Φ^{(l)}_u(h^{(l)}_{u,i}, {h^{(l)}_{u,j}}_{j ∈ N^u_i}, c^{(l)}_n)    (12)

The dual-graph interaction can be viewed as an iterative process that performs (1) user graph context-aware attention to update news node embeddings and (2) news graph context-aware attention to update user node embeddings. We model the dual interaction with L stacked layers. The final-layer news and user graph contexts c^{(L)}_n and c^{(L)}_u are adopted as the news and user graph representations r_n and r_u, which refine the news and user graph information with deep feature interaction. Algorithm 1 illustrates the dual-graph interaction process.
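One news-side update layer of the dual-graph interaction can be sketched as follows. This mirrors Eq. (7)-(10) in structure only: FFN^(l)_n is assumed to be a single tanh layer over the concatenated [c_u; ĥ_i; ĥ_j], and all shapes are toy values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def news_node_update(Hn, adj, c_u, W, b, Wf, a):
    # One news-side dual-graph layer: linear transform, user-context-aware
    # attention keys, softmax coefficients, ReLU aggregation, residual.
    H_hat = Hn @ W + b                        # higher-level graph features
    out = np.zeros_like(Hn)
    for i, nbrs in enumerate(adj):
        # K_{i,j} fuses the user graph context with node features (assumed FFN).
        keys = np.stack([np.tanh(Wf @ np.concatenate([c_u, H_hat[i], H_hat[j]]))
                         for j in nbrs])
        alpha = softmax(keys @ a)             # context-aware attention coefficients
        agg = alpha @ H_hat[list(nbrs)]       # neighbour aggregation
        out[i] = np.maximum(agg, 0.0) + Hn[i] # ReLU + residual connection
    return out

rng = np.random.default_rng(3)
N, d = 4, 8
adj = [[1, 2], [0, 3], [0], [1]]              # toy neighbourhoods N^n_i
Hn_next = news_node_update(rng.normal(size=(N, d)), adj, rng.normal(size=d),
                           rng.normal(size=(d, d)) * 0.1, np.zeros(d),
                           rng.normal(size=(d, 3 * d)) * 0.1,
                           rng.normal(size=d))
```

The user-side update is symmetric, swapping the roles of the two graphs and feeding in the news graph context c_n instead.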

Click Prediction and Model Training
With the news and user graph representations r_n and r_u, our model aims to predict the matching score ŝ_{n,u}, which signals how likely user u will click news n. The matching score between the news and user representations is simply computed by dot product as ŝ_{n,u} = r_n^T r_u. Following Wu et al. (2019a,b,d), we adopt a negative sampling strategy to train our model. For the user behavior that user u had clicked news n_i, we compute the click matching score ŝ⁺_i for n_i and u. Besides, we randomly sample S non-clicked news [n_1, n_2, ..., n_S] from the user's behavior log and compute the negative matching scores [ŝ⁻_{i,1}, ŝ⁻_{i,2}, ..., ŝ⁻_{i,S}]. We optimize the NCE loss L over the training dataset D in model training:

L = − Σ_{i ∈ D} log( exp(ŝ⁺_i) / (exp(ŝ⁺_i) + Σ_{k=1}^{S} exp(ŝ⁻_{i,k})) )
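The scoring and training objective above can be sketched as a small routine, assuming the standard softmax cross-entropy form of the negative-sampling loss with the clicked news as the target class.

```python
import numpy as np

def nce_loss(r_u, r_pos, r_negs):
    # Matching scores are dot products of the user representation with the
    # clicked news (positive) and S sampled non-clicked news (negatives);
    # the loss is softmax cross-entropy with the clicked news as the target.
    logits = np.concatenate([[r_u @ r_pos], [r_u @ rn for rn in r_negs]])
    logits -= logits.max()                    # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(4)
d, S = 8, 4                                   # toy sizes: dim 8, 4 negatives
r_u = rng.normal(size=d)                      # user graph representation
loss = nce_loss(r_u, rng.normal(size=d), rng.normal(size=(S, d)))
```

When the positive score dominates the negatives, the loss approaches zero, as expected for a softmax cross-entropy objective.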

Experiments

Dataset and Experiment Settings
We conduct experiments on the real-world benchmark dataset MIND (Wu et al., 2020), which is collected from anonymized user behavior logs of Microsoft News and has two versions, MIND-large and MIND-small. MIND-large contains 1 million anonymized users with user-click impression logs of 6 weeks from October 12 to November 22, 2019. The training and dev sets contain the impression logs of the first 5 weeks, and the last week's impression logs are reserved for test. MIND-small consists of 50,000 users, which are randomly sampled from MIND-large with their impression logs. Following previous works (Wang et al., 2020; Qi et al., 2021c), we use news titles with a maximum length of 32 words for news textual encoding. The user history includes the 50 news items each user has most recently clicked. The news word embeddings are 300-dimensional and initialized from the pretrained GloVe embeddings (Pennington et al., 2014). Following An et al. (2019), we set the number of negative news samples S to 4. For our model parameters, the news representation dimension d is set to 400 for fair comparison with baselines. The number of neighboring nodes M and hops K are 5 and 2, respectively. We set the number of dual-graph interaction layers to L = 3. We use the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 1e-4 to train our model. Following Wu et al. (2020), we employ the recommendation ranking metrics AUC, MRR, nDCG@5, and nDCG@10 to evaluate model performance.

Compared Methods
We compare our model with the state-of-the-art news recommendation methods: (1) GRU (Okura et al., 2017), learning user representations from a sequence of clicked news with a GRU network; (2) DKN (Wang et al., 2018), using a knowledge-aware CNN to learn news representations from both news texts and knowledge entities; (3) NPA (Wu et al., 2019b), encoding news and user representations with personalized attention networks; (4) NAML (Wu et al., 2019a), learning news representations from news titles, bodies, categories and subcategories with multi-view attention networks; (5) LSTUR (An et al., 2019), jointly modeling long-term user embeddings and short-term user interests learned by a GRU network; (6) NRMS (Wu et al., 2019d), encoding informative news and user representations with multi-head self-attention networks; (7) FIM (Wang et al., 2020), encoding news content with dilated convolutional networks and modeling user interest matching with 3D convolutional networks; (8) HieRec (Qi et al., 2021c), modeling user interests in a three-level hierarchy and performing multi-grained matching between candidate news and hierarchical user interest representations.
We also compare our model with competitive graph-based methods: (9) GERL (Ge et al., 2020), modeling the news-user relatedness with a bipartite graph, which enhances news and user representations by aggregating neighboring node information; (10) GNewsRec (Hu et al., 2020a), using graph neural networks (GNN) (Hamilton et al., 2017) and attentive LSTMs to jointly model users' long-term and short-term interests; (11) User-as-Graph (Wu et al., 2021), utilizing a heterogeneous graph pooling method to extract user representations from personalized heterogeneous behavior graphs.

Main Experiment Results
Table 1 presents the main experiment results. We can observe that DIGAT significantly outperforms previous SOTA methods (i.e., methods #1 to #8) on both datasets. This is because even though some baselines use topic categories or knowledge entities to enrich news information (e.g., HieRec learns news representations from both news texts

Ablation Study on SAG Modeling
We examine the effectiveness of SAG modeling with three ablation experiments: (1) w/o SA. To examine the effectiveness of the semantic-augmentation (SA) strategy, we remove SAG from DIGAT and learn a single candidate news representation instead.
(2) TF-IDF SA. To inspect the function of the news retrieval PLM ϕ(·) in SAG construction (see Section 3.2.1), we replace ϕ(·) with a TF-IDF syntactic feature extractor to retrieve relevant news.
(3) Seq SA. To examine the effectiveness of graph-based SA, we conduct controlled experiments by arranging the semantic-relevant news in a sequential form and extracting the news sequence context similarly to Eq. (3) and (4). Experiments in this section and the following sections are on MIND-small. Table 2 shows the experiment results. We can see that abandoning the SA strategy (w/o SA) leads to the largest performance drop, as TF-IDF SA and Seq SA also yield better performance than w/o SA. This validates the effectiveness of the SA strategy in enriching candidate news semantics and further enhancing news recommendation. TF-IDF SA underperforms DIGAT by a considerable margin. We infer that TF-IDF features can only measure news similarity at the syntactic level, which may not accurately retrieve semantic-relevant news for SAG construction. In contrast, the PLM can accurately measure news similarity at the semantic level and helps retrieve more relevant news to enhance SAG modeling. This reveals that accurately retrieving semantically relevant news is the key to candidate news semantic augmentation. Besides, Seq SA is suboptimal compared to the original graph-based SA. This is because the graph-based SA method can accurately model the relatedness among the candidate news and semantic-relevant news with a multi-neighbor and multi-hop graph structure, which further improves the effectiveness of the SA strategy.

Ablation Study on Graph Interaction
To examine the effectiveness of dual-graph interaction, we design the following ablation experiments: (1) w/o Interaction. We employ vanilla graph attention networks (GAT) (Veličković et al., 2018) to learn news and user graph embeddings, respectively, without interaction between the dual graphs.
(2) News Graph w/o Inter. The news graph embedding update layers are replaced with vanilla GAT layers. Concretely, Eq. (11) is modified into h^{(l+1)}_{n,i} = Φ^{(l)}_n(h^{(l)}_{n,i}, {h^{(l)}_{n,j}}_{j ∈ N^n_i}), where Φ^{(l)}_n is the standard GAT graph embedding update function without feature interaction with the user graph context.
(3) User Graph w/o Inter. Similar to (2), we replace the user graph embedding update layers with vanilla GAT layers.
Figure 3 shows the performance of the ablation models. We can see that w/o Interaction underperforms the other three models with graph interaction modeling. This indicates that feature interaction between candidate news and users is necessary to enhance news recommendation. We also observe that removing user graph interaction (User Graph w/o Inter) leads to a larger performance drop than News Graph w/o Inter, which implies that user graph interaction may contribute more to our model. Moreover, DIGAT surpasses the two single-graph-interaction ablations by a significant margin, validating the effectiveness of modeling dual-graph feature interaction in an iterative manner.

Analysis on SAG Parameters
We investigate two key parameters of SAG, i.e., the number of node neighbors M and hops K. Figure 4 shows the effect of different M and K settings.
As shown in Figure 4(a), DIGAT performance keeps rising as M increases from 1 to 5. This indicates that with more semantic-relevant news incorporated, SAG can leverage more sufficient semantic information to augment the candidate news representations. On the other hand, the model performance slightly declines when M > 5. The reason could be twofold. First, as the scale of SAG grows larger, it becomes more challenging for the model to distill the global graph context of SAG (see Section 3.2.2). Second, as M becomes too large, it is inevitable to retrieve more noisy news in the SAG construction process, which may adversely affect SAG modeling. From Figure 4(b), we observe that K = 2 is the optimal hop setting. This may be because two hops of SAG can heuristically capture more useful semantic-relevant news information than simple one-hop modeling, while higher-hop extension may introduce too much irrelevant news and interfere with accurate semantic augmentation for the candidate news. In general, we select M = 5 and K = 2 for SAG construction⁵.

The Number of Dual-Graph Layers
We study the effect of the number of dual-graph layers L in DIGAT. The results are presented in Figure 5. We can see that the model performance first keeps rising when L increases from 1 to 3.
This suggests that deep feature interaction between news and user graphs is useful for improving recommendation performance, as it can model the news and user representation matching process in a more fine-grained way. We also observe that further increasing L hurts the model performance. This may be caused by unstable gradients in training the deep dual-graph architecture; we empirically find that gradient clipping (Pascanu et al., 2013) is indispensable to avoid loss divergence in DIGAT training when the dual-graph layers become too deep (i.e., L ≥ 6).

Conclusion
In this work, we present a dual-graph interaction framework for news recommendation. In our approach, a graph-enhanced semantic-augmentation strategy is employed to enrich the semantic information of candidate news. Moreover, we design a dual-graph interaction mechanism to achieve effective feature interaction between news and user graphs, facilitating more accurate news and user representation matching. Our approach advances the state-of-the-art news recommendation methods on the benchmark dataset MIND. Extensive experiments and further analyses validate that SAG modeling and dual-graph interaction can effectively improve news recommendation performance.

Limitations
In this section, we discuss the limitations of our approach. First, since DIGAT models dual interaction between news and user graph features iteratively, inference efficiency is a concern. We compare the model size and inference run-time of the experimental methods in Table 3. The news representations (see Section 3.1) of all methods, except NPA⁶, are pre-computed and cached for fast inference. As DIGAT is scalable with the dual-graph depth L, we also evaluate DIGAT with L = 1 and 2.
In terms of model size, DIGAT is larger than the first eight models in Table 3. Compared to DIGAT (L = 1, 2), we can see that the parameter growth comes from the stacked graph layers. We also find that embedding layers contain considerable parameters⁷, while DIGAT does not need additional news and user ID embedding layers. In terms of inference time, DIGAT runs slower than the other models. We find that the computational overhead mostly comes from the iterative graph embedding update process in Eq. (11) and (12). Nonetheless, this efficiency issue can be alleviated. Since DIGAT is scalable with the dual-graph depth L, a trade-off between recommendation accuracy and efficiency can be made: we can scale down the dual-graph layers L to reduce the model size and inference time at some cost in performance. As shown in Table 3, when the dual-graph layers are reduced to L = 1, the performance of DIGAT is still superior to the baseline methods, while the parameter size and inference time are comparable to several baselines (e.g., FIM and GNewsRec). In industrial deployment, this trade-off can depend on the specific requirements of computational resources.
Second, our approach is evaluated on an offline experimental dataset. For online recommender services, searching and retrieving real-time relevant news with event-driven news clustering models (Saravanakumar et al., 2021) to construct SAG is a more promising option than the static retrieval method. To this end, we will explore applying our approach to online applications in future work.

News Clustering Phenomenon. From the SAG example shown in Figure 6(a), we can observe that there exist many cyclic subgraphs (i.e., news clusters), revealing the news clustering phenomenon in semantic space. These cyclic graph structures depict the similar news clusters in real-world distributions, consistent with previous research (Altuncu et al., 2018; Saravanakumar et al., 2021). This news semantic clustering phenomenon also inspires the motivation of our work.
Broader Impact. On online news platforms, Related News is usually displayed alongside the original news to users. It is worth mentioning that such Related News on news platforms is practically retrieved from the news database by retrieval models in industrial practice (Algorithm 2 can be seen as an analogous retrieval process). As an alternative, we can also use the off-the-shelf real-time Related News on online news platforms to construct SAG. Furthermore, the SAG modeling strategy is also applicable to other text-based recommendation (e.g., Twitter Feed Recommendation). We will explore this direction in future work.

B Supplementary Experiments on Semantic-Augmentation Strategy
We conduct supplementary experiments to investigate whether the semantic-augmentation (SA) strategy can be generalized for the news recommendation task. To exclude the influence of DIGAT itself, we choose to reinforce the baseline NRMS (Wu et al., 2019d) with the SA strategy, named NRMS-SA¹⁰. For NRMS-SA, we use the PLM news retriever to retrieve 10 semantic-relevant news articles for each candidate news. Table 4 shows the experiment results, which indicate that the semantic-augmentation strategy can also be applied to other news recommendation models and achieve substantial performance improvement. Interestingly, we find that the improvement on MIND-large is more significant than on MIND-small, as NRMS-SA is even on par with the previous SOTA baseline (i.e., User-as-Graph). We infer that this may be because the MIND-large news corpus is an order of magnitude larger than MIND-small, and hence it contains more semantic-relevant news for SAG modeling. The experiment results also suggest that augmenting the semantic representation of single candidate news with relevant news information sources is a promising direction to improve news recommendation performance.

Figure 1 :
Figure 1: The typical news-user representation learning framework for news recommendation.

Figure 2 :
Figure 2: The overall architecture of DIGAT framework.

Figure 4 :
Figure 4: DIGAT performance with different M and K settings of SAG.

Figure 5 :
Figure 5: DIGAT performance with different numbers of dual-graph layers L.

Figure 6 :
Figure 6: An example of SAG (M = 5 and K = 2) constructed from news n_0 on MIND-large (news ID: N124534): (a) a subgraph of the example SAG including root node n_0 and semantic-relevant news nodes n_i (i = 1, 2, ..., 8); (b) news in SAG and the corresponding title texts. For brevity, we only present an SAG subgraph of nodes and edges.
Denote the clicked-news history of a user u as H_u = [n_1, n_2, ..., n_{|H|}], containing |H| clicked news items. For a news n, its textual content consists of a sequence of |T| words T_n = [w_1, w_2, ..., w_{|T|}]. Based on H_u and T_n, the goal of news recommendation is to predict the score ŝ_{n,u}, which indicates the probability of user u clicking the candidate news n. The recommendation result is generated by ranking the user-click scores of multiple candidate news items.

Table 1 :
Evaluation performance of all methods. Experiments of baselines #1 to #10 and DIGAT are conducted 10 times on MIND-small and 5 times on MIND-large, respectively. We report the average performance. † Results of User-as-Graph are directly copied from the previous work (Wu et al., 2021). The performance improvements of DIGAT compared to all baselines are significant (validated by Student's t-test with p-value < 0.01).

Table 2 :
Experiment results of SAG modeling variants.
In contrast, our semantic-augmented graph modeling circumvents this cold-news issue. Compared to GNewsRec and User-as-Graph, DIGAT performs more effective feature interaction between the news and user graphs, which enables more accurate news-user representation matching.

Table 3 :
Comparison of experimental methods' parameters and inference run-time. The Run-time column denotes the inference time on the MIND-small test set, averaged over 10 runs. All models are tested with the same batch size on an Nvidia RTX 3090.
An example of SAG for the candidate news n_0 "Should the NFL be able to fine players for criticizing officiating" is shown in Figure 6. Interestingly, from Figure 6(b), we can see that there are many similar news articles in SAG, which refer to the same specific news event or person (i.e., "NFL" and "fine players") from different narrative points of view⁹. These semantic-relevant news articles are finely retrieved with the help of the PLM retriever, forming an explicit multi-neighbor and multi-hop graph structure. With the representation power of SAG, DIGAT can learn more accurate relatedness of the relevant news texts and substantially enrich the semantic information of the original candidate news n_0.