Continuous Temporal Graph Networks for Event-Based Graph Data

There has been increasing interest in modeling the continuous-time dynamics of temporal graph data. Previous methods encode time-evolving relational information into a low-dimensional representation by specifying discrete layers of neural networks, while real-world dynamic graphs often vary continuously over time. Hence, we propose Continuous Temporal Graph Networks (CTGNs) to capture the continuous dynamics of temporal graph data. We use both the link starting timestamps and the link duration as evolving information to model the continuous dynamics of nodes. The key idea is to use neural ordinary differential equations (ODEs) to characterize the continuous dynamics of node representations over dynamic graphs, and we parameterize the ODEs with a novel graph neural network. Existing dynamic graph networks can be considered a specific discretization of CTGNs. Experimental results on both transductive and inductive tasks demonstrate the effectiveness of our proposed approach over competitive baselines.


Introduction
Graph neural networks (GNNs) have attracted growing interest in the past few years due to their broad applicability in various fields, e.g., social networks (Fan et al., 2019) and natural language processing (Liu et al., 2021a). GNNs learn low-dimensional representations of graph-structured data, but real-world relations, such as user-item interactions, often change over time. Learning node representations on dynamic graphs is therefore a very challenging task. Dynamic graph methods can be divided into discrete-time dynamic graph (DTDG) models and continuous-time dynamic graph (CTDG) models. More recently, an increasing interest in CTDG-based graph representation learning algorithms can be observed (Xu et al., 2020; Trivedi et al., 2018; Kumar et al., 2019; Rossi et al., 2020; Wang et al., 2020b; Ding et al., 2021).

Figure 1: The importance of link duration. Consider the behavior of a user watching movies. There are two types of nodes in the graph: user nodes and item nodes. Given the user's historical behavior, the prediction target is (user_1, don't_click, Movie_4). If we ignore the link duration information, user_1 appears interested in cartoon movies because he clicked on one at timestamp t_1. But user_1 watched Movie_1 for only 10s. The link duration indicates that although the user clicked, he was not interested.
Although the above continuous-time dynamic methods have achieved impressive results, they still have limitations. The majority of research (Rossi et al., 2020; Wang et al., 2020b; Xu et al., 2020; Trivedi et al., 2018; Kumar et al., 2019) focuses on contact sequence dynamic graphs, in which links are permanent and no link duration is provided (e.g., email networks and citation networks). However, most real-life networks are event-based dynamic graphs, in which the interactions between source and destination nodes are not permanent (e.g., employment networks and proximity networks). An event-based dynamic graph includes both the time at which a link appears and the duration of the link. Link duration reflects the degree of association between two nodes: if user i browses item j for 2 seconds and item k for 20 seconds, the user's interest in the two items clearly differs. Ignoring link duration can reduce link prediction ability and even lead to questionable inferences. It is therefore crucial to consider the influence of link duration on node relationship prediction (Zhang and Chen, 2018; Li et al., 2020) and knowledge completion (Liu et al., 2021b).
The existing GNN-based methods (Weinan, 2017; Oono and Suzuki, 2019) that learn node representations over dynamic graphs can be considered discrete dynamical systems. Chen et al. (2018) demonstrate that continuous dynamical systems are more efficient for modeling continuous-time data. Discrete networks can roughly approximate continuous ones by stacking enough layers. However, Oono and Suzuki (2019) point out that GNNs exponentially lose expressive power for downstream tasks as more hidden layers are added, which leads to over-smoothing. Designing effective continuous graph neural networks to model the continuous-time dynamics of node representations on dynamic graphs is therefore critical. To this end, several continuous graph neural networks (Chen et al., 2018; Xhonneux et al., 2019) have been proposed recently. Although these continuous networks model data more effectively, few approaches deal with dynamic graphs using continuous-time dynamic neural networks.
This paper proposes a general framework of Continuous Temporal Graph Networks (CTGNs) to model continuous-time representations of dynamic graph-structured data. We combine ordinary differential equations (ODEs) with graph methods: instead of specifying discrete hidden layers, we integrate neural layers over continuous time. Figure 2 illustrates the workflow of the proposed CTGN method when an interaction occurs between two nodes. First, a novel temporal graph network (TGN) is applied as the encoder to learn the latent states from the updated memory. Then, the neural ODE module models the node's continuous-time representation. Because link duration reflects the degree of association between two nodes, we use it as the integration variable to control the weights of different interactions. Next, we use an LSTM (Shi et al., 2015) as the decoder to compute the probability of interaction between the two given nodes. Finally, the memory, a compressed representation of the historical behavior of all nodes defined in Section 3.1, is updated as the input of the encoder. Experimental results on five real-world link prediction datasets demonstrate the effectiveness of the proposed method over state-of-the-art baselines. The main contributions of this paper are:
• We present a novel Continuous Temporal Graph Network (CTGN) inspired by the neural ODE method.
• CTGNs focus on the event-based dynamic graph, updating each node's representation using both the discrete timestamps at which links appear and the link duration between two linked nodes as evolving information.
• We show that our model can outperform existing state-of-the-art methods on both transductive and inductive tasks.

Dynamic Graph Methods
The existing dynamic graph representation learning methods can be divided into two categories: discrete-time dynamic graphs and continuous-time dynamic graphs. Discrete-time dynamic graphs (DTDGs) are a sequence of snapshots at different time intervals:

DT = {G_1, G_2, ..., G_T},

where T is the number of snapshots. Current dynamic graph methods (Wang et al., 2020a; Trivedi et al., 2017; Xiong et al., 2019) have mostly been designed for DTDGs.
Continuous-time dynamic graphs (CTDGs) can be viewed as a set of observations/events (Kazemi et al., 2019) in which the network's evolution information is retained. There are only a few works on CTDGs, but recently more attention has been paid to continuous-time graphs. The three representations of CTDGs are described in more detail below.
1. The contact sequence dynamic graph is the simplest representation form of CTDG:

CS = (u_i, v_i, t_i),

where u is the source node, v is the destination node, and t is the timestamp at which the link appears. In the contact sequence dynamic graph, the link is permanent (e.g., citation networks) or instantaneous (e.g., email networks). Therefore, this representation has no link duration.
There has been a lot of research on contact sequence dynamic graphs. Trivedi et al. (2018) learn the representation of node i by aggregating the destination node's neighborhood information and updating the node's embedding with a recurrent architecture after each interaction involving node i. Kumar et al. (2019) employ two recurrent neural networks to update the embeddings of a user and an item at every interaction. TGAT (Xu et al., 2020) proposes a novel functional time encoding and uses self-attention for inductive representation learning on temporal graphs. Wang et al. (2020b) propose the asynchronous propagation attention network (APAN) for real-time temporal graph embedding.
2. The event-based dynamic graph consists of the node pair (u, v), the timestamp t at which the edge appears, and the link duration ∆t, which indicates how long the edge lasts until it disappears:

EB = (u_i, v_i, t_i, ∆t_i).
Rossi et al. (2020) propose TGN, a generic inductive framework operating on contact sequence dynamic graphs that adds a memory module to TGAT (Xu et al., 2020). TGN can also operate on event-based dynamic graphs by simply replacing the timestamp t with the link duration ∆t in the memory module.
3. The streams graph can be viewed as a particular case of the event-based dynamic graph. It additionally includes an edge label δ, which indicates edge removal or edge addition:

GS = (u_i, v_i, t_i, δ_i).

TGN (Rossi et al., 2020) converts the streams graph into an event-based graph for processing. According to the edge labels, an edge created at time t' and deleted at time t can be reorganized as the event (u_i, v_i, t', t), from which two messages are computed for the source and target nodes.
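The three CTDG representations above, and the streams-to-event-based conversion just described, can be sketched in a few lines of Python. The field names and the pairing logic here are illustrative, not the paper's actual data format:

```python
from collections import namedtuple

# The three CTDG event representations described above.
ContactEvent = namedtuple("ContactEvent", ["u", "v", "t"])          # permanent/instant link
EventBased   = namedtuple("EventBased", ["u", "v", "t", "dur"])     # link with duration
StreamEvent  = namedtuple("StreamEvent", ["u", "v", "t", "delta"])  # delta: +1 add, -1 remove

def streams_to_event_based(stream):
    """Pair each edge addition with its later removal, yielding
    event-based tuples (u, v, t_add, duration)."""
    open_edges = {}  # (u, v) -> addition time
    events = []
    for ev in sorted(stream, key=lambda e: e.t):
        key = (ev.u, ev.v)
        if ev.delta == +1:
            open_edges[key] = ev.t
        else:  # a removal closes the matching open edge
            t_add = open_edges.pop(key)
            events.append(EventBased(ev.u, ev.v, t_add, ev.t - t_add))
    return events

stream = [StreamEvent(1, 2, 10, +1), StreamEvent(1, 2, 30, -1)]
print(streams_to_event_based(stream))  # [EventBased(u=1, v=2, t=10, dur=20)]
```

The duration of each reconstructed event is simply the gap between the addition and removal timestamps, which is exactly the ∆t used by event-based methods.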
The existing CTDG methods model discrete representations of continuous-time graph data with multiple discrete propagation layers. Our proposed method focuses on the event-based temporal graph and updates each node's representation with both the timestamps and the link duration between two nodes. CTGN also supports contact sequence dynamic graphs; the model details differ slightly from the event-based case, which we clarify in Section 3.

Continuous-time Dynamical Systems
A continuous-time dynamical system is one whose behavior changes with time in the continuous-time domain. Related works have viewed data as continuous objects in artificial intelligence, e.g., images (Chen et al., 2018) and static graphs (Xhonneux et al., 2019; Poli et al., 2019). The continuous-time dynamic graph (CTDG) introduced in Section 2.1 is also a continuous-time dynamical system, in which node states change over time, so it is natural to model the continuous dynamics of CTDG data. To the best of our knowledge, CTGN is the first approach that learns continuous-time dynamics on CTDGs.

Neural Ordinary Differential Equations and Continuous Graph Neural Networks
Consider a residual network:

h_{t+1} = h_t + f(h_t, θ_t).

A theoretical way to improve the performance of discrete networks is to stack more neural layers and take smaller steps (Chen et al., 2018). However, this scheme is not feasible because of limited computing resources and over-fitting problems. Oono and Suzuki (2019) point out that graph neural networks (GNNs) exponentially lose expressive power for downstream tasks when more hidden layers are added, because of over-smoothing.
Inspired by residual networks and ordinary differential equations, neural ODEs were proposed to solve this problem. A neural ODE models a continuous-time dynamical system by parameterizing the hidden state's derivative with a neural network.

The Proposed Method: CTGN

In this section, we introduce our proposed approach.
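The residual/ODE connection above can be made concrete with a small sketch: a residual layer h_{t+1} = h_t + f(h_t) is exactly one explicit-Euler step (step size 1) of dh/dt = f(h), and stacking more layers with smaller steps approaches the continuous solution. The toy dynamics f(h) = -h below (with analytic solution h(t) = h(0)·exp(-t)) is purely illustrative:

```python
import math

def f(h):
    # toy dynamics dh/dt = -h; analytic solution is h(t) = h(0) * exp(-t)
    return -h

def euler_integrate(h0, t_end, n_steps):
    """Integrate dh/dt = f(h) from 0 to t_end with n_steps explicit-Euler steps.
    Each step is one 'residual layer': h <- h + dt * f(h)."""
    h, dt = h0, t_end / n_steps
    for _ in range(n_steps):
        h = h + dt * f(h)
    return h

exact = 1.0 * math.exp(-1.0)
print(abs(euler_integrate(1.0, 1.0, 4) - exact))     # few layers: larger error
print(abs(euler_integrate(1.0, 1.0, 1000) - exact))  # many small steps: much smaller error
```

This is the sense in which discrete GNN layers "roughly approximate" a continuous network, and why an ODE solver with adaptive steps can be preferable to stacking layers.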
The key idea of CTGN is to build continuous-time hidden layers that learn informative, continuous node representations over event-based dynamic graphs. To characterize the continuous dynamics of node representations, we use ordinary differential equations (ODEs) parameterized by a neural network, a continuous function of time. We study both transductive and inductive settings: in the transductive task, we predict future links of nodes observed during training; in the inductive task, we predict future links of nodes never seen before. We first employ a temporal graph attention layer (Xu et al., 2020) to project each node into a latent space based on its features and neighbors. Then, an ODE module defines the continuous dynamics of the node's latent representation h_i(t).

Temporal Graph Network
Memory Passing. The memory s_i(t) records the historical information of each node i the model has seen so far; it is a compressed representation of the node's historical behavior. The memory s_i(t) is updated whenever there is an interaction involving node i. At the end of each batch, we first compute the memory s_i(t) from the latest message m_i(t−) and the previous memory s_i(t−):

s_i(t) = mem(m_i(t−), s_i(t−)),

where mem(·) is a learnable memory update function; in all experiments we choose a GRU, and s_i(0) is initialized as a zero vector. At the end of each batch, the message m_i(t) for node i is updated to compute its memory:

m_i(t) = msg(s_i(t−) || s_j(t−) || ∆t || e_ij(t)),

where || is the concatenation operator and ∆t is the link duration between nodes i and j. In the contact sequence dynamic graph, the link duration property is not available, so we use (t − t−) as ∆t. There may be multiple events e_i1(t_1), ..., e_iN(t_N) involving the same node i in the same batch; in the experiments, we only use the latest interaction e_iN(t_N) to compute i's message. msg(·) is a learnable function, for which we use an RNN.

Multi-head Attention. Given an observed event p = (i, j, t, ∆t), we compute the latent representations of i and j using multi-head attention:

h_i^(l)(t) = MultiHeadAttention(Q, K, V),

where Q, K, V denote the queries, keys, and values, respectively, and h_i^(l) is the embedding of node i at the l-th layer. The multi-head attention layer computes node i's representation by aggregating its N-hop neighbors, using learnable projection matrices to generate the attention embeddings. The keys and values are defined from the neighbor information: the query is built from node i's memory, which stores the node's history, and E_n(t) = [e_1n(t), ..., e_in(t)], where e_in(t) is the edge feature between node i and its n-hop neighbor at time t. The temporal graph network is a discrete method that can be thought of as a discretization of a continuous dynamical system.
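The memory-passing step above can be sketched schematically. The learnable mem(·) (a GRU) and msg(·) (an RNN) are replaced here by simple numeric stand-ins so that the data flow is visible; all function bodies are illustrative, not the paper's actual model:

```python
def msg(s_i, s_j, dt, edge_feat):
    # stand-in for the learnable message function over the concatenation
    # s_i || s_j || dt || e_ij: here a simple elementwise combination
    return [(a + b) / 2 + dt + e for a, b, e in zip(s_i, s_j, edge_feat)]

def mem(message, s_prev, gate=0.5):
    # stand-in for the GRU update: a gated interpolation of old memory and message
    return [gate * m + (1 - gate) * s for m, s in zip(message, s_prev)]

# memories start as zero vectors, as in the paper
memory = {i: [0.0, 0.0] for i in (1, 2)}

# an interaction (i=1, j=2) with link duration dt and edge features e_ij
m1 = msg(memory[1], memory[2], dt=0.1, edge_feat=[1.0, -1.0])
memory[1] = mem(m1, memory[1])
print(memory[1])
```

The real model replaces both stand-ins with trained networks, but the update order is the same: compute the message from both endpoint memories plus ∆t and the edge feature, then fold it into the node's memory.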

Model Continuous Dynamics of Node Representation
In order to characterize the continuous dynamics of node representations, instead of only specifying a discrete sequence of hidden layers, we parameterize the hidden layers using ordinary differential equations (ODEs), a continuous function of time.
z(t) = x + ∫_0^t f(z(τ), τ) dτ.

Here, x is the initial state vector, f is a learnable function, t is a time interval, and z(t) is the state at time t. We can compute a node's continuous-time representation from this equation at an arbitrary time t > 0.
Previous works (Zang and Wang, 2019; Poli et al., 2019) model continuous-time dynamics by setting the integration interval [0, t] as a hyperparameter. Considering the influence of link duration on the interaction between two nodes, we instead choose the link duration as the integration endpoint; in our experiments, t = ∆t.
Link duration shows how long (in seconds) the user browsed before terminating, and can reflect the user's interest in different items. Taking the link duration as the integration variable thus controls the weights of different interactions.
We parameterize the derivative of the hidden state using a neural network that takes as input the latent state computed by the temporal graph network described in Section 3.1:

z_i(t) = ODESolve(f, h_i(t), [0, ∆t_i]),

where h_i(t) is the discrete latent state computed by the temporal graph network and ∆t_i is the link duration between source node i and destination node j. f(t, z) is the ODE function, which we choose to be an MLP. A black-box ODE solver computes the final continuous-dynamics embedding z_i(t). We use odeint_adjoint from the torchdiffeq PyTorch package to solve the reverse-time ODE and backpropagate through it.
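A minimal sketch of this ODE step: the latent state h_i(t) from the encoder is integrated over [0, ∆t_i], the link duration. The paper uses a learned MLP for f and torchdiffeq's adjoint solver; here a fixed linear f and a hand-rolled RK4 integrator stand in, so that longer durations visibly move the state further:

```python
import math

def f(t, z):
    # stand-in for the learned MLP dynamics
    return [-0.5 * zi for zi in z]

def odesolve_rk4(f, z0, t_end, n_steps=100):
    """Integrate dz/dt = f(t, z) over [0, t_end] with classic 4th-order Runge-Kutta."""
    z, h = list(z0), t_end / n_steps
    t = 0.0
    for _ in range(n_steps):
        k1 = f(t, z)
        k2 = f(t + h / 2, [zi + h / 2 * k for zi, k in zip(z, k1)])
        k3 = f(t + h / 2, [zi + h / 2 * k for zi, k in zip(z, k2)])
        k4 = f(t + h,     [zi + h * k for zi, k in zip(z, k3)])
        z = [zi + h / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += h
    return z

h_i = [1.0]                          # latent state from the temporal graph encoder
z_short = odesolve_rk4(f, h_i, 2.0)  # short link duration
z_long = odesolve_rk4(f, h_i, 20.0)  # long link duration: state evolves further
print(z_short, z_long)
```

In the actual model the solver is differentiable (adjoint method), so gradients flow through the integration back into both the MLP f and the encoder.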

Time Smoothness
The time-encoding method (Xu et al., 2020) used in this paper effectively maps a timestamp t from the time domain to a d-dimensional vector space. However, each timestamp is learned independently of the others. Independent learning of the hyperplanes of adjacent time intervals may place adjacent times far apart in the embedding space, whereas adjacent states in the graph should be more similar. To avoid this problem, we constrain the variation between the hyperplanes of adjacent timestamps by minimizing the Euclidean distance between them.

Model Learning
We train CTGN with the link prediction loss:

L = l_task + α · l_smooth,

where α is a tradeoff parameter, l_task is the cross-entropy between the prediction and the ground truth, and l_smooth is the time-smoothness term defined above. In our experiments, we found α = 0.002 works well for contact sequence dynamic graphs and α = 0.7 for event-based dynamic graphs.
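The objective can be sketched numerically: a cross-entropy link-prediction term plus α times the smoothness penalty (Euclidean distances between encodings of adjacent timestamps). The probabilities and toy time encodings below are illustrative values, not model outputs:

```python
import math

def cross_entropy(p_pred, y_true):
    # mean binary cross-entropy between predicted link probabilities and labels
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(p_pred, y_true)) / len(y_true)

def smoothness(encodings):
    # sum of Euclidean distances between encodings of adjacent timestamps
    return sum(math.dist(a, b) for a, b in zip(encodings, encodings[1:]))

p_pred = [0.9, 0.2]                         # predicted link probabilities
y_true = [1.0, 0.0]                         # ground-truth links
enc = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]  # toy time encodings for adjacent timestamps

alpha = 0.7                                 # event-based setting, as in the paper
loss = cross_entropy(p_pred, y_true) + alpha * smoothness(enc)
print(round(loss, 4))
```

Raising α pulls adjacent time encodings closer together at the cost of some task-loss flexibility, which is the speed/precision tradeoff mentioned in the conclusion.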

Experiment and Analysis
In this section, we first introduce the datasets, baselines, and parameter settings. Then we compare our proposed method with strong baselines and competing approaches on both inductive and transductive tasks, using two benchmark contact sequence dynamic graph datasets and three event-based dynamic graph datasets.
We study both transductive and inductive tasks.For event-based dynamic graphs, we learn link prediction tasks.For contact-sequence dynamic graphs, we learn dynamic node classification and link prediction tasks.
The statistics of the datasets used in our experiments are described in detail in Table 1.

Parameter Setup
We set the batch size to 200 for training and patience to 5 for early stopping in all experiments. The node embedding dimension is 172.
During training, we used a learning rate of 0.0001 for the contact sequence dynamic graph datasets (Wikipedia and Reddit) and 0.00009 for the event-based dynamic graph datasets (Netflix, Mooc, Lastfm). The weight of the time smoothness loss α is set to 0.002 on Wikipedia and Reddit and to 0.7 on Netflix, Mooc, and Lastfm. We choose an LSTM layer as the decoder for the link prediction task and an MLP for the node classification task. We report the mean and standard deviation across 10 runs.

Result
To demonstrate the effectiveness of our proposed method, we compare CTGN with competitive baselines on five real-world dynamic graph datasets. Table 2 shows the results on link prediction tasks in both transductive and inductive settings for the three event-based datasets. Our approach achieves better results than the discrete dynamic graph neural networks on almost all datasets, especially in the inductive setting.
Table 3 shows the dynamic node classification and link prediction results on the two contact sequence datasets. As Tables 2 and 3 show, CTGN has a solid ability to embed dynamic graphs. Figure 3 shows ablation studies on the Netflix dataset for both the transductive and inductive settings of the link prediction task. As Figures 3(a) and 3(b) show, our model is not sensitive to batch size: when the training batch size is 100, CTGN matches TGN's average precision, and as the batch size increases further, CTGN's performance remains more stable.

Conclusion
This paper introduces CTGN, a continuous temporal graph neural network for learning representations of event-based dynamic graphs. Inspired by neural ODEs, we build a connection between temporal graph networks and continuous dynamical systems. Our framework allows the user to trade off speed for precision by selecting different learning rates and time-smoothness loss weights during training. On the link prediction task, we demonstrate against competitive baselines that our model outperforms many existing state-of-the-art methods.

Figure 2: Overview of our Continuous Temporal Graph Network.

Figure 3: Ablation studies on the Netflix dataset for both the transductive and inductive settings of the link prediction task. 3(a) Sensitivity to batch size in the inductive setting. 3(b) Sensitivity to batch size in the transductive setting. 3(c) Relationship between the number of sampled neighbors and model performance in the inductive setting. 3(d) The same relationship in the transductive setting.

Table 1: Statistics of the datasets used in our experiments.

Table 3: Experiments on contact sequence datasets. ROC AUC (%) for the dynamic node classification task, Average Precision (%) for the link prediction task. *Static method. †Does not support inductive.