Search from History and Reason for Future: Two-stage Reasoning on Temporal Knowledge Graphs

Temporal Knowledge Graphs (TKGs) have been developed and used in many different areas. Reasoning on TKGs to predict potential future facts (events) poses great challenges to existing models. When facing a prediction task, human beings usually search for useful historical information (i.e., clues) in their memories and then reason about the future meticulously. Inspired by this mechanism, we propose CluSTeR to predict future facts in a two-stage manner: Clue Searching and Temporal Reasoning. Specifically, at the clue searching stage, CluSTeR learns a beam search policy via reinforcement learning (RL) to induce multiple clues from historical facts. At the temporal reasoning stage, it adopts a graph convolution network based sequence method to deduce answers from clues. Experiments on four datasets demonstrate the substantial advantages of CluSTeR compared with the state-of-the-art methods. Moreover, the clues found by CluSTeR further provide interpretability for the results.


Introduction
Temporal Knowledge Graphs (TKGs) (Boschee et al., 2015; Demidova, 2018, 2019; Zhao, 2020) have emerged as a very active research area over the last few years. Each fact in a TKG has a timestamp indicating its time of occurrence. For example, the fact (COVID-19, New medical case occur, Shop, 2020-10-2) indicates that a new medical case of COVID-19 occurred in a shop on 2020-10-2. In this paper, reasoning on TKGs aims to predict future facts (events) for timestamp $t > t_T$, where $t_T$ is assumed to be the current timestamp (Jin et al., 2020). An example of the task is shown in Figure 1, which attempts to answer the query (COVID-19, New medical case occur, ?, 2020-12-23) with the given historical facts. Such a task may benefit many practical applications, such as emerging event response (Muthiah et al., 2015; Phillips et al., 2017; Korkmaz et al., 2015), disaster relief (Signorini et al., 2011), and financial analysis (Bollen et al., 2011).
How do human beings predict future events? According to the dual process theory (Evans, 1984, 2003, 2008; Sloman, 1996), the first step is to search the massive-capacity memory and intuitively find some related historical information (i.e., clues). As shown in the left part of Figure 1, there are mainly three categories of clues vital to the query: 1) the 1-hop paths with the same relation as the query (thus called repetitive 1-hop paths), such as (COVID-19, New medical case occur, Shop); 2) the 1-hop paths with relations different from the query (called non-repetitive 1-hop paths), such as (COVID-19, New suspected case occur, Bank); and 3) the 2-hop paths, such as (COVID-19, Diagnose$^{-1}$, The man, Go to, Police station). Human beings recall these clues from their memories and form some intuitive candidate answers for the query. Secondly, human beings get the accurate answer by diving deeper into the clues' temporal information and performing a meticulous reasoning process. As shown in the right part of Figure 1, the man went to the police station more than two months before he was diagnosed with COVID-19, indicating that Police station is probably not the answer. Finally, human beings derive the answer, Shop.
Existing models mainly focus on the second process above but underestimate the first. Some recent studies (Trivedi et al., 2017, 2018) learn evolving embeddings of entities with all historical facts considered. However, only a few historical facts are useful for a specific prediction. Thus, some other studies (Jin et al., 2019, 2020; Zhu et al., 2020) mainly focus on encoding the 1-hop repetitive paths (repetitive facts) in the history. However, besides the 1-hop repetitive paths, there is a large amount of other related information in the datasets. Taking the widely used dataset ICEWS18 (Jin et al., 2020) as an example, 41.2% of the training queries can get the answers through the 1-hop repetitive paths in the history. But almost 64.6% of them can get the answers through 1-hop repetitive and non-repetitive paths, and 86.2% through the 1-hop and 2-hop paths.
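A coverage statistic of this kind can be illustrated with a small sketch. The function name `one_hop_coverage` and the toy facts below are hypothetical and are not taken from ICEWS18; the sketch only shows one plausible way to count how many queries are answerable by repetitive 1-hop paths versus any 1-hop path in the earlier history.

```python
def one_hop_coverage(facts, queries):
    """Return (repetitive, any 1-hop) coverage as fractions of the queries.

    A query (s, r, o, t) is covered by a repetitive 1-hop path if some earlier
    fact (s, r, o, t') with t' < t exists, and by any 1-hop path if some
    earlier fact links s to o through an arbitrary relation.
    """
    repetitive = any_relation = 0
    for s, r, o, t in queries:
        past_relations = [r2 for s2, r2, o2, t2 in facts
                          if t2 < t and s2 == s and o2 == o]
        if r in past_relations:
            repetitive += 1
        if past_relations:
            any_relation += 1
    n = len(queries)
    return repetitive / n, any_relation / n


# Toy history and queries (illustrative only).
facts = [
    ("COVID-19", "New medical case occur", "Shop", 1),
    ("COVID-19", "New suspected case occur", "Bank", 2),
]
queries = [
    ("COVID-19", "New medical case occur", "Shop", 5),    # repetitive
    ("COVID-19", "New medical case occur", "Bank", 5),    # non-repetitive
    ("COVID-19", "New medical case occur", "School", 5),  # not covered
]
rep_cov, any_cov = one_hop_coverage(facts, queries)
```

On this toy input one of three queries is covered by a repetitive path and two of three by some 1-hop path, mirroring the gap between the two percentages discussed above.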
Thus, we propose a new model called CluSTeR, consisting of two stages, Clue Searching (Stage 1) and Temporal Reasoning (Stage 2). At Stage 1, CluSTeR formalizes clue-searching as a Markov Decision Process (MDP) (Sutton and Barto, 2018) and learns a beam search policy to solve it. At Stage 2, CluSTeR reorganizes the clues found in Stage 1 into a series of graphs and then a Graph Convolution Network (GCN) and a Gated Recurrent Unit (GRU) are employed to deduce accurate answers from the graphs.
In general, this paper makes the following contributions:
• We formulate the TKG reasoning task from the view of human cognition and propose a two-stage model, CluSTeR, which is mainly composed of an RL-based clue searching stage and a GCN-based temporal reasoning stage.
• We advocate the importance of clue searching for the first time, and propose to learn a beam search policy via RL, which can find explicit and reliable clues for the fact to be predicted.
• Experiments demonstrate that CluSTeR achieves consistently and significantly better performance on popular TKGs and the clues found by CluSTeR can provide interpretability for the reasoning results.

Related Work
Static KG Reasoning. Embedding based KG reasoning models (Bordes et al., 2013; Yang et al., 2014; Trouillon et al., 2016; Dettmers et al., 2018; Shang et al., 2019; Sun et al., 2018) have drawn increasing attention. All of them attempt to learn distributed embeddings for entities and relations in KGs. Among them, some works (Schlichtkrull et al., 2018; Shang et al., 2019; Ye et al., 2019; Vashishth et al., 2019) extend GCN to relation-aware GCN for KGs. However, embedding based models underestimate the symbolic compositionality of relations in KGs, which limits their usage in more complex reasoning tasks. Thus, some recent works (Xiong et al., 2017; Das et al., 2018; Lin et al., 2018; Chen et al., 2018) focus on multi-hop reasoning, which learns symbolic inference rules from relation paths. However, none of the above methods can deal with the temporal dependencies among facts in TKGs.
Temporal KG Reasoning. Reasoning on temporal KGs can broadly be categorized into two settings, interpolation (Sadeghian et al., 2016; García-Durán et al., 2018; Leblay and Chekol, 2018; Dasgupta et al., 2018; Wu et al., 2019; Xu et al., 2020; Goel et al., 2020; Wu et al., 2020; Han et al., 2020a; Jung et al., 2020) and extrapolation (Trivedi et al., 2017, 2018; Han et al., 2020b; Deng et al., 2020; Jin et al., 2019, 2020; Zhu et al., 2020; Li et al., 2021), as mentioned in Jin et al. (2020). Under the former setting, models attempt to infer missing facts at historical timestamps, while the latter setting, which this paper focuses on, attempts to predict facts in the future. Orthogonal to our work, Trivedi et al. (2017, 2018) estimate the conditional probability of observing a future fact via a temporal point process that takes all historical facts into consideration. Although Han et al. (2020b) extend the temporal point process to model concurrent facts, these models are more suited to TKGs with continuous time, where no two events occur at the same timestamp.
Glean (Deng et al., 2020) incorporates a word graph constructed from the summary texts of events into TKG reasoning. The most related works are RE-NET (Jin et al., 2020) and CyGNet (Zhu et al., 2020). RE-NET uses a subgraph aggregator and a GRU to model the subgraph sequence consisting of 1-hop facts. CyGNet uses a sequential copy network to model repetitive facts. Both of them use heuristic strategies in the clue searching stage, which may lose many other informative historical facts or introduce noise. Although the above two models attempt to consider other information through pre-trained global embeddings or an extra generation model, they still mainly focus on modeling repetitive facts. Besides, almost none of these models can provide interpretability for the results.
Figure 2: An illustrative diagram of the proposed CluSTeR model.

The Proposed CluSTeR Model
We start with the notations, then introduce the model as well as its training procedure in detail.

Notations
A TKG $G$ is a multi-relational directed graph with time-stamped edges between entities. A fact in $G$ can be formalized as a quadruple $(e_s, r, e_o, t)$. It describes that a fact of relation type $r \in \mathcal{R}$ occurs between subject entity $e_s \in \mathcal{E}$ and object entity $e_o \in \mathcal{E}$ at timestamp $t \in \mathcal{T}$, where $\mathcal{R}$, $\mathcal{E}$ and $\mathcal{T}$ denote the sets of relations, entities and timestamps, respectively. TKG reasoning aims to predict the missing object entity of $(e_s, r_q, ?, t_s)$ or the missing subject entity of $(?, r_q, e_o, t_s)$ given the set of historical facts before $t_s$, denoted as $G_{0:t_s-1}$. Without loss of generality, in this paper, we predict the missing object entity in a fact, and the model can be easily extended to predicting the subject entity.
In this paper, a clue path is in the form of $(e_s, r_1, e_1, ..., r_k, e_k, ..., r_I, e_I)$, where $e_k \in \mathcal{E}$, $r_k \in \mathcal{R}$, $k = 1, ..., I$, $I$ is the maximum step number, and each hop in the path can be viewed as a triple $(e_{k-1}, r_k, e_k)$. Note that $e_0 = e_s$. The clue facts are derived from the clue paths by mapping each hop $(e_{k-1}, r_k, e_k)$ in the paths to the corresponding facts $(e_{k-1}, r_k, e_k, t_1), (e_{k-1}, r_k, e_k, t_2), ... \in G_{0:t_s-1}$.
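The mapping from clue paths to clue facts can be sketched as follows. This is a minimal illustration under assumed names: the `Fact` tuple, the `clue_facts` helper, the integer timestamps and the toy history are all hypothetical and only mirror the notation above.

```python
from typing import List, NamedTuple, Sequence


class Fact(NamedTuple):
    subj: str
    rel: str
    obj: str
    t: int


def clue_facts(path: Sequence[str], history: List[Fact]) -> List[Fact]:
    """Map a clue path (e_s, r_1, e_1, ..., r_I, e_I) to its clue facts.

    Each hop (e_{k-1}, r_k, e_k) is matched against every historical fact
    with the same subject, relation and object, whatever its timestamp.
    """
    hops = [(path[i], path[i + 1], path[i + 2])
            for i in range(0, len(path) - 2, 2)]
    return [f for hop in hops for f in history
            if (f.subj, f.rel, f.obj) == hop]


# Toy history G_{0:t_s-1} (illustrative only).
history = [
    Fact("COVID-19", "Diagnose^-1", "The man", 80),
    Fact("The man", "Go to", "Police station", 10),
    Fact("The man", "Go to", "Police station", 12),
    Fact("COVID-19", "New medical case occur", "Shop", 75),
]
two_hop_path = ("COVID-19", "Diagnose^-1", "The man", "Go to", "Police station")
found = clue_facts(two_hop_path, history)
```

Here the second hop maps to two facts because the same triple occurs at two timestamps, which is exactly the temporal information that a path alone does not carry.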

Model Overview
As illustrated in Figure 2, the model consists of two stages, clue searching and temporal reasoning. Inspired by human cognition, the two stages are coordinated to perform fast and slow thinking (Daniel, 2017), respectively, to solve the TKG reasoning task. Specifically, Stage 1 mainly focuses on searching for clue paths whose compositional semantic information relates to the given query, subject to the time constraints. Then, the clue paths and the resulting candidate entities are provided to Stage 2, which mainly focuses on meticulously modeling the temporal information among clue facts and produces the final results. In the CluSTeR model, these two stages interact with each other in the training phase and decide the final answer jointly in the inference phase.

Stage 1: Clue Searching
The purpose of Stage 1 is to search and induce the clue paths related to the given query $(e_s, r_q, ?, t_s)$ from history. Previous studies (Jin et al., 2019, 2020; Zhu et al., 2020) use heuristic strategies to extract 1-hop repetitive paths, losing many other informative clue paths. Besides, there is an enormous number of facts in the history. Thus, a learnable and efficient clue searching strategy is necessary. Motivated by these observations, we view Stage 1 as a sequential decision problem and solve it with an RL system.

The RL System
The RL system consists of two parts, the agent and the environment. We formulate the RL system as an MDP, a framework for learning from interactions between the agent and the environment, to find $B$ promising clue paths. Starting from $e_s$, the agent sequentially selects outgoing edges via a randomized beam search strategy and traverses to new entities until it reaches the maximum step $I$. The MDP consists of the following parts:
States. Each state $s_i = (e_i, t_i, e_s, r_q, t_s) \in \mathcal{S}$ is a tuple, where $\mathcal{S}$ is the set of all available states; $e_i$ ($e_0 = e_s$) is the entity the agent visits at step $i$; and $t_i$ ($t_0 = t_s$) is the timestamp of the action taken at the previous step. Note that $e_s$, $r_q$, and $t_s$ are shared by all the states for the given query.
Time-constrained Actions. Compared to static KGs, the time dimension of TKGs leads to an explosively large action space. Besides, human memory focuses on the most recent events. Thus, we constrain the time interval between the timestamp of each fact and $t_s$ to be no more than $m$, and the time interval between the timestamp of the previous action and each available action to be no more than $\Delta$. Therefore, the set of possible actions $A_i \in \mathcal{A}$ ($\mathcal{A}$ is the set of all available actions) at step $i$ consists of the time-constrained outgoing edges of $e_i$,
$$A_i = \{(r', e', t') \mid (e_i, r', e', t') \in G_{0:t_s-1},\ |t' - t_i| \le \Delta,\ t_s - t' \le m\}. \tag{1}$$
To give the agent an adaptive option to terminate, a self-loop edge is added to $A_i$.
Transition. The transition function $\delta: \mathcal{S} \times \mathcal{A} \to \mathcal{S}$ is deterministic in the TKG setting and simply updates the state to the new entity incident to the action selected by the agent.
Rewards. The agent only receives a terminal reward $R$ at the end of the search, which is the sum of two parts, a binary reward and a real-valued reward. The binary reward is set to 1 if the destination entity $e_I$ is the correct target entity $e_o$, and 0 otherwise. Besides, the agent gets a real-valued reward $\hat{r}$ from Stage 2 if $e_I$ is the target entity, which will be introduced in Section 3.4.
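The time-constrained action space can be sketched as a simple filter over the history. The function below is a hypothetical helper, not the authors' implementation; it applies both constraints literally as stated (including at step 0, where $t_0 = t_s$), and the name and timestamp of the self-loop action are assumptions made for illustration.

```python
def action_space(history, e_i, t_i, t_s, m, delta):
    """Return the time-constrained actions (r, e, t) available at entity e_i.

    An outgoing edge of e_i is kept only if it lies in the history (t < t_s),
    is at most m away from the query time t_s, and is at most delta away
    from the timestamp t_i of the previous action.
    """
    actions = [(r, o, t) for (s, r, o, t) in history
               if s == e_i and t < t_s
               and t_s - t <= m and abs(t - t_i) <= delta]
    # Self-loop so the agent can stop early (label and timestamp assumed).
    actions.append(("SELF_LOOP", e_i, t_i))
    return actions


# Toy history with integer timestamps (illustrative only).
history = [
    ("A", "r1", "B", 8),
    ("A", "r2", "C", 2),   # too old: t_s - t = 8 > m
    ("A", "r3", "D", 9),
    ("B", "r1", "A", 9),   # not an outgoing edge of A
]
first_step = action_space(history, e_i="A", t_i=10, t_s=10, m=5, delta=3)
```

Smaller values of `m` and `delta` shrink the branching factor of the search, which is the purpose of the constraints described above.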
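The terminal reward can be written down in a few lines. This is a sketch under assumptions: the function name `terminal_reward` is hypothetical, and passing `None` for the Stage 2 score is our way of representing the case where only the binary reward is available.

```python
def terminal_reward(e_I, e_o, stage2_score=None):
    """Terminal reward R: binary part plus an optional real-valued part.

    The binary part is 1 when the destination e_I equals the target e_o.
    The real-valued part (the Stage 2 score of e_I) is added only when the
    target is reached; stage2_score=None means Stage 2 gives no feedback.
    """
    if e_I != e_o:
        return 0.0
    bonus = 0.0 if stage2_score is None else stage2_score
    return 1.0 + bonus


binary_only = terminal_reward("Shop", "Shop")
with_feedback = terminal_reward("Shop", "Shop", stage2_score=0.75)
missed = terminal_reward("Bank", "Shop", stage2_score=0.9)
```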

Semantic Policy Network
Given the time-constrained action space, both the compositional semantic information implied in the clue paths and the time information of the clue facts are vital for reasoning. However, modeling the time information requires diving deeply into the complex temporal patterns of facts, which is not the emphasis of Stage 1. Thus, we design a semantic policy network which calculates the probability distribution over all the actions according to the current state $s_i$ and the search history $h_i = (e_s, a_0, ..., a_{i-1})$, without considering timestamps in Stage 1. Here, $a_i = (r_{i+1}, e_{i+1}, t_{i+1})$ is the action taken at step $i = 0, ..., I-1$. Note that $h_0$ is $e_s$. In fact, the search history without timestamps is a candidate clue path (a clue path at step $i$) as defined in Section 3.1.
The embedding of the action $a_i$ is $\mathbf{a}_i = \mathbf{r}_{i+1} \oplus \mathbf{e}_{i+1}$, where $\oplus$ is the concatenation operation and $\mathbf{r}_{i+1}$, $\mathbf{e}_{i+1}$ are the embeddings of $r_{i+1}$ and $e_{i+1}$, respectively. Then, a Long Short-Term Memory network (LSTM) is applied to encode the candidate clue path $h_i$ as a continuous vector $\mathbf{h}_i$,
$$\mathbf{h}_i = \mathrm{LSTM}(\mathbf{h}_{i-1}, \mathbf{a}_{i-1}), \tag{2}$$
where the initial hidden embedding $\mathbf{h}_0$ equals $\mathrm{LSTM}(\mathbf{0}, \mathbf{r}_{dummy} \oplus \mathbf{e}_s)$ and $\mathbf{r}_{dummy}$ is the embedding of a special relation introduced to form a start action with $e_s$. For step $i$, the action space is encoded by stacking the embeddings of all the actions in $A_i$, which are denoted as $\mathbf{A}_i \in \mathbb{R}^{|A_i| \times 2d}$. Here, $d$ is the dimension of entity embeddings and relation embeddings. Then, the policy network calculates the distribution $\pi$ over all the actions by a Multi-Layer Perceptron (MLP) parameterized with $\mathbf{W}_1$ and $\mathbf{W}_2$ as follows:
$$\pi(a_i \mid s_i; \Theta) = \eta(\mathbf{A}_i \mathbf{W}_2 f(\mathbf{W}_1 [\mathbf{e}_i \oplus \mathbf{h}_i \oplus \mathbf{r}_q])), \tag{3}$$
where $\eta(\cdot)$ is the softmax function, $f(\cdot)$ is the ReLU function (Glorot et al., 2011) and $\Theta$ is the set of all the learnable parameters in Stage 1.
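The forward pass of such a policy can be sketched in plain Python. The sketch assumes the form softmax($\mathbf{A}_i \mathbf{W}_2\,$ReLU($\mathbf{W}_1 [\mathbf{e}_i \oplus \mathbf{h}_i \oplus \mathbf{r}_q]$)), a common choice in path-based RL reasoners; this exact input concatenation, the random weights, and taking the history vector $\mathbf{h}_i$ as given (no LSTM) are assumptions of the example.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)                          # subtract the max for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def policy(A_i, W1, W2, e_i, h_i, r_q):
    """Distribution over the actions whose embeddings are the rows of A_i."""
    x = e_i + h_i + r_q                             # concatenation
    hidden = [max(0.0, v) for v in matvec(W1, x)]   # ReLU(W1 x)
    projected = matvec(W2, hidden)                  # vector of size 2d
    logits = matvec(A_i, projected)                 # one logit per action
    return softmax(logits)


# Random toy parameters with d = 4 and three candidate actions.
rng = random.Random(0)
d = 4


def rand_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]


def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]


A_i = rand_mat(3, 2 * d)      # |A_i| x 2d action matrix
W1 = rand_mat(d, 3 * d)       # hidden size chosen as d for the toy case
W2 = rand_mat(2 * d, d)
probs = policy(A_i, W1, W2, rand_vec(d), rand_vec(d), rand_vec(d))
```

Scoring actions by a dot product with their embeddings lets the same parameters handle action spaces of any size, which matters here because $|A_i|$ changes with the entity and the time window.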

Randomized Beam Search
In the scenario of TKGs, the occurrence of a fact may result from multiple factors. Thus, multiple clue paths are necessary for the prediction. Besides, the intuitive candidates from Stage 1 should recall as many right answers as possible. Therefore, we adopt randomized beam search (Sutskever et al., 2014; Guu et al., 2017; Wu et al., 2018) as the action sampling strategy of the agent, which injects random noise into the beam search in order to increase the exploration ability of the agent. Specifically, a beam contains $B$ candidate clue paths at step $i$. For each candidate path, we append the $B$ most likely actions (according to Equation 3) to the end of the path, resulting in a new path pool of size $B \times B$. Then we either pick the highest-scoring path with probability $\mu$ or uniformly sample a random path with probability $1 - \mu$, repeating this $B$ times. The score of each candidate clue path at step $i$ equals $\sum_{k=0}^{i} \log \pi(a_k \mid s_k; \Theta)$. Note that at the first step, $B$ 1-hop candidate paths starting from $e_s$ are generated by choosing $B$ paths via the above picking strategy.
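One step of the randomized beam search described above can be sketched as follows. The helper `randomized_beam_step` and the toy `expand` function are hypothetical, and sampling without replacement from the pool is an assumption of the sketch, since the text does not say whether a path may be picked twice.

```python
import math
import random


def randomized_beam_step(beam, expand, B, mu, rng):
    """Advance a beam of (path, score) pairs by one step.

    expand(path) returns (action, log_prob) pairs for the last state of path.
    Each path keeps its B most likely actions, giving a pool of at most
    B * B paths. Then, B times, the best remaining path is taken with
    probability mu, otherwise a uniformly random remaining path.
    """
    pool = []
    for path, score in beam:
        top = sorted(expand(path), key=lambda a: a[1], reverse=True)[:B]
        pool.extend((path + (action,), score + lp) for action, lp in top)
    pool.sort(key=lambda item: item[1], reverse=True)
    new_beam = []
    while pool and len(new_beam) < B:
        idx = 0 if rng.random() < mu else rng.randrange(len(pool))
        new_beam.append(pool.pop(idx))
    return new_beam


def expand(path):
    # Toy policy: the same three actions at every state (illustrative only).
    return [("a", math.log(0.5)), ("b", math.log(0.3)), ("c", math.log(0.2))]


start = [(("e_s",), 0.0)]
greedy = randomized_beam_step(start, expand, B=2, mu=1.0, rng=random.Random(0))
noisy = randomized_beam_step(start, expand, B=2, mu=0.0, rng=random.Random(0))
```

With `mu=1.0` the step reduces to ordinary beam search, while `mu=0.0` samples the new beam uniformly from the pool; intermediate values trade exploitation against exploration.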

Stage 2: Temporal Reasoning
To dive deeper into the temporal information among clue facts at different timestamps and the structural information among concurrent clue facts, Stage 2 reorganizes all clue facts into a sequence of graphs $\hat{G} = \{\hat{G}_0, ..., \hat{G}_j, ..., \hat{G}_{t_s-1}\}$, where each $\hat{G}_j$ is a multi-relational graph consisting of the clue facts at timestamp $j = 0, ..., t_s-1$. We use an $\omega$-layer RGCN (Schlichtkrull et al., 2018) to model $\hat{G}_j$,
$$\hat{\mathbf{h}}^{l+1}_{o,j} = f\Big(\frac{1}{d_o} \sum_{(s,r) \mid (s,r,o,j) \in \hat{G}_j} \mathbf{W}^l_r \hat{\mathbf{h}}^l_{s,j} + \mathbf{W}^l_{loop} \hat{\mathbf{h}}^l_{o,j}\Big), \tag{4}$$
where $\hat{\mathbf{h}}^l_{o,j}$ and $\hat{\mathbf{h}}^l_{s,j}$ denote the $l$-th layer embeddings of entities $o$ and $s$ in $\hat{G}_j$ at timestamp $j$, respectively; $\mathbf{W}^l_r$ and $\mathbf{W}^l_{loop}$ are the weight matrices for aggregating features from different relations and the self-loop in the $l$-th layer; $d_o$ is the in-degree of entity $o$; and the input embedding for each entity $k$, $\hat{\mathbf{h}}^{l=0}_{k,j}$, is set to $\hat{\mathbf{e}}_k$, which is different from the embedding used in Stage 1.
Then, $\hat{\mathbf{g}}_j$, the embedding of $\hat{G}_j$, is calculated by mean pooling over all entity embeddings in $\hat{G}_j$ calculated by Equation 4. The concatenation of $\hat{\mathbf{e}}_s$, $\hat{\mathbf{g}}_j$ and $\hat{\mathbf{r}}_q$ (the embedding of $r_q$ in Stage 2) is fed into a GRU,
$$\mathbf{H}_j = \mathrm{GRU}(\mathbf{H}_{j-1}, [\hat{\mathbf{e}}_s \oplus \hat{\mathbf{g}}_j \oplus \hat{\mathbf{r}}_q]). \tag{5}$$
The final output of the GRU, denoted as $\mathbf{H}_{t_s-1}$, is fed into an MLP decoder parameterized with $\mathbf{W}_{mlp}$ to get the final scores for all the entities, i.e.,
$$p(e \mid e_s, r_q, t_s) = \sigma(\mathbf{H}_{t_s-1}^{\top} \cdot \mathbf{W}_{mlp}), \tag{6}$$
where $\sigma$ is the sigmoid activation function. Finally, we re-rank the candidate entities according to Equation 6. To give positive feedback to the clue paths arriving at the answer, Stage 2 gives a beam-level reward, which equals the final score of $e_I$ from Equation 6, i.e., $\hat{r} = p(e_I)$, to Stage 1.
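The first half of Stage 2 can be sketched in plain Python: grouping clue facts into a timestamp-ordered graph sequence, applying one relation-aware graph convolution layer with in-degree normalization and a self-loop, and mean pooling the entity embeddings into a graph embedding. All function names, the identity weight matrices and the toy embeddings are assumptions made for illustration; the GRU and the decoder are omitted.

```python
from collections import defaultdict


def graph_sequence(clue_facts):
    """Group (s, r, o, t) clue facts by timestamp, ordered by time."""
    by_time = defaultdict(list)
    for s, r, o, t in clue_facts:
        by_time[t].append((s, r, o))
    return [(t, by_time[t]) for t in sorted(by_time)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(edges, h, W_rel, W_loop):
    """h'_o = ReLU((1/d_o) * sum over (s, r, o) of W_r h_s + W_loop h_o)."""
    dim = len(next(iter(h.values())))
    agg = {e: [0.0] * dim for e in h}
    indeg = defaultdict(int)
    for s, r, o in edges:
        msg = matvec(W_rel[r], h[s])
        agg[o] = [a + m for a, m in zip(agg[o], msg)]
        indeg[o] += 1
    out = {}
    for e in h:
        loop = matvec(W_loop, h[e])
        norm = max(indeg[e], 1)          # avoid dividing by zero in-degree
        out[e] = [max(0.0, a / norm + l) for a, l in zip(agg[e], loop)]
    return out


def mean_pool(h):
    vecs = list(h.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Toy example: two timestamps, 2-dimensional embeddings, identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
sequence = graph_sequence([("A", "r", "B", 3), ("B", "r", "A", 1)])
h0 = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
t, edges = sequence[1]                   # the graph at timestamp 3
h1 = rgcn_layer(edges, h0, {"r": identity}, identity)
g = mean_pool(h1)
```

In the full model the pooled vector of each graph would be concatenated with the subject and query relation embeddings and fed to the GRU in temporal order.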

Training Strategy
For Stage 1, the beam search policy network is trained by maximizing the expected reward over all queries in the training set,
$$J(\Theta) = \mathbb{E}_{(e_s, r_q, e_o, t_s) \in G}\big[\mathbb{E}_{a_0, ..., a_{I-1}}[R(e_I \mid e_s, r_q, t_s)]\big]. \tag{7}$$
For Stage 2, we define the objective function using cross-entropy:
$$J(\Phi) = -\frac{1}{|G|} \sum_{(e_s, r_q, e_o, t_s) \in G} \log p(e_o \mid e_s, r_q, t_s), \tag{8}$$
where $\Phi$ is the set of all the learnable parameters in Stage 2. The Adam (Kingma and Ba, 2014) optimizer is used to minimize Equation 8. As Stage 1 and Stage 2 are mutually correlated, they are trained jointly. Stage 1 is first pre-trained with only the binary reward. Then Stage 2 is trained with the parameters of Stage 1 frozen. Finally, we jointly train the two stages. Such a training strategy is widely used in other RL studies (Bahdanau et al., 2016; Feng et al., 2018).

Experiment
We design experiments to answer the following questions: Q1. How does CluSTeR perform on the TKG reasoning task? Q2. How do the two stages contribute to the final results respectively? Q3. Which clues are found and used for reasoning? Q4. Can CluSTeR provide some interpretability for the results?
Datasets. We use four TKG datasets, ICEWS14, ICEWS05-15, ICEWS18 and GDELT (Jin et al., 2020). The first three datasets are from the Integrated Crisis Early Warning System (ICEWS) (Boschee et al., 2015) and the last one is from the Global Database of Events, Language, and Tone (GDELT) (Leetaru and Schrodt, 2013). We evaluate CluSTeR on all these datasets. ICEWS14 and ICEWS05-15 are divided into training, validation, and test sets following the preprocessing of ICEWS18 in RE-NET (Jin et al., 2020). The details of the datasets are presented in Table 1.
In the experiments, the widely used Mean Reciprocal Rank (MRR) and Hits@{1,10} are employed as the metrics. Without loss of generality, only the experimental results under the raw setting are reported. The filtered setting is not suitable for the reasoning task under the extrapolation setting, as mentioned in (Han et al., 2020b; Ding et al., 2021; Jain et al., 2020). The reason can be explained with an example. Consider a test quadruple (Barack Obama, visit, ?, 2015-1-25) with the correct answer India, and assume there is a quadruple (Barack Obama, visit, Germany, 2013-1-18) in the training set. The filtered setting used in previous studies ignores time information and considers (Barack Obama, visit, Germany, 2015-1-25) to be valid because (Barack Obama, visit, Germany, 2013-1-18) appears in the training set. It thus removes the quadruple from the corrupted ones. However, the fact (Barack Obama, visit, Germany) is temporally valid on 2013-1-18, not on 2015-1-25. Therefore, when testing the quadruple (Barack Obama, visit, ?, 2015-1-25), (Barack Obama, visit, Germany, 2015-1-25) should not be removed. In this way, the filtered setting wrongly removes quite a lot of quadruples and thus leads to over-optimistic experimental performance.
Baselines. The CluSTeR model is compared with two categories of models, i.e., models for static KG reasoning and models for TKG reasoning under the extrapolation setting. The typical static models DistMult (Yang et al., 2014), ComplEx (Trouillon et al., 2016), RGCN (Schlichtkrull et al., 2018), ConvE (Dettmers et al., 2018) and RotatE (Sun et al., 2018) are selected, with the temporal information of facts ignored. We also choose MINERVA (Das et al., 2018), an RL-based multi-hop reasoning model, as a baseline. For TKG models, the representative Know-evolve (Trivedi et al., 2017), DyRep (Trivedi et al., 2018), CyGNet (Zhu et al., 2020) and RE-NET (Jin et al., 2020) are selected. Besides, following RE-NET (Jin et al., 2020), we extend two models for temporal homogeneous graphs, GCRN (Seo et al., 2018) and EvolveGCN-O (Pareja et al., 2019), to RGCRN and EvolveRGCN by replacing GCN with RGCN. We use ConvE (Dettmers et al., 2018), a stronger decoder, to replace the MLP (Jin et al., 2020) for the two models. For Know-evolve and DyRep, RE-NET extends them to the TKG reasoning task but does not release the code. Thus, we only report the results from their papers. For the other baselines, we reproduce all the results with the optimal parameters tuned on the validation set.
Implementation Details. In the experiments, the embedding dimension $d$ for the two stages is set to 200. For Stage 1, we adopt an adaptive approach for selecting the time interval $m$. Specifically, for ICEWS14, ICEWS05-15, and GDELT, $m$ is set to the most recent timestamp at which the query pattern $(e_s, r_q, ?)$ appears in the dataset before $t_s$. For ICEWS18, $m$ is set to the third most recent such timestamp. $\Delta$ is set to 3 for all the datasets. We try maximum step numbers $I = 1$ and $I = 2$ and find that $I = 1$ is better for all the datasets. The number of LSTM layers is set to 2 and the dimension of the hidden layer of the LSTM is set to 200 for all the datasets.
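The two metrics under the raw setting can be sketched as follows. The function names are hypothetical, and counting only strictly higher-scored entities when ranking (i.e., resolving ties optimistically) is an assumption of the sketch.

```python
def rank_of(scores, answer):
    """Raw-setting rank: 1 + number of entities scored above the answer."""
    return 1 + sum(1 for s in scores.values() if s > scores[answer])


def mrr_and_hits(ranks, ks=(1, 10)):
    """Mean Reciprocal Rank and Hits@k over a list of ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mrr, hits


# Toy scores for one query and toy ranks for three queries.
example_rank = rank_of({"Germany": 0.9, "India": 0.8, "France": 0.1}, "India")
mrr, hits = mrr_and_hits([1, 2, 4])
```

Under the raw setting no candidate is removed before ranking, so a valid but different answer such as Germany above still counts against the target.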
The beam size is set to 32 for the three ICEWS datasets and 64 for GDELT. $\mu$ is set to 0.3 for all the datasets. For Stage 2, the maximum sequence length of the GRU is set to 10, the number of GRU layers is set to 1 and the number of RGCN layers is set to 2 for all the datasets. For each fact in $G_{0:t_s-1}$, we add the corresponding inverse fact into $G_{0:t_s-1}$. All the experiments are carried out on Tesla V100.

Results on TKG Reasoning
The results on TKG reasoning are presented in Table 2. CluSTeR consistently outperforms the baselines on all the ICEWS datasets, which verifies its effectiveness and answers Q1. In particular, on ICEWS14, CluSTeR achieves improvements of 7.1% in MRR, 4.5% in Hits@1, and 13.7% in Hits@10 over the best baselines. Specifically, CluSTeR significantly outperforms the static models (i.e., those in the first block of Table 2) because it captures the temporal information of some important history. Moreover, CluSTeR performs drastically better than the temporal models. Compared with DyRep and Know-evolve, which consider all the history, CluSTeR can focus on more vital clues. Different from RGCRN and EvolveRGCN, which model all history from the several latest timestamps, CluSTeR models a longer history after reducing all history to a few clues. CyGNet and RE-NET mainly focus on modeling the repetitive clues or all the 1-hop clues and show strong performance. CluSTeR also outperforms them on the three ICEWS datasets, because the RL-based Stage 1 can find more explicit and reliable clues.
The experimental results on GDELT demonstrate that the performance of static models and temporal ones is similarly poor compared with that on the other three datasets. We further analyze the GDELT dataset and find that a large number of its entities are abstract concepts which do not indicate a specific entity (e.g., PRESIDENT, POLICE and GOVERNMENT). Among the top 50 frequent entities, 28 are abstract concepts and 43.72% of the corresponding events involve abstract concepts. Those abstract concepts make future prediction under the raw setting almost impossible, since we cannot predict a president's activities without knowing which country he belongs to.

Ablation Study
To answer Q2, i.e., how the two stages contribute to the final results, we report the MRR results of the variants of CluSTeR on the validation set of all the datasets in Table 3. The first two lines of Table 3 show the results using only Stage 1, where the maximum step $I$ is set to 1 and 2, respectively. Following Lin et al. (2018), the score of the target entity is set to the highest score among the paths when more than one path leads to it. It can be observed that the results decrease when only Stage 1 is used, because the temporal information among facts is ignored. The third line shows the results using only Stage 2 with extracted 1-hop repetitive clues as the inputs. The results decrease on all the ICEWS datasets when only Stage 2 is used, demonstrating that repetitive clues alone are not enough for the prediction. For GDELT, using only Stage 2 achieves the best results, which demonstrates that repetitive clues alone are effective enough for it. This is because using only the most straightforward repetitive clues in Stage 2 can alleviate the influence of the noise produced by abstract concepts. It also matches our observations mentioned in Section 4.2.
From the first two lines of Table 3, it can be seen that the performance of Stage 1 decreases when $I$ is set to 2. To further analyze the reason, we extract paths from ICEWS18 without considering timestamps via AMIE+ (Galárraga et al., 2015), a widely used and accurate approach for extracting logic rules (paths) in static KGs. We check the top fifty paths manually and present the top five convincing paths in Table 4. It can be seen that there are no strong dependencies between the query relations and the 2-hop paths. Thus, in this situation, longer paths bring exponentially more noisy clues, which pull down the precision. We conduct experiments on all the datasets from ICEWS and GDELT and reach the same conclusion. We leave it for future work to construct a more complex dataset for verifying the effectiveness of multi-hop clue paths.
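The path-to-entity aggregation used for the Stage 1 only variants (keeping the highest score among the paths that reach the same entity) can be sketched as below. The helper name and the toy paths with log-probability scores are hypothetical.

```python
def entity_scores(paths):
    """Score each destination entity by its best-scoring clue path.

    paths is a list of (path, score) pairs, where the last element of each
    path is the destination entity.
    """
    best = {}
    for path, score in paths:
        end = path[-1]
        best[end] = max(score, best.get(end, float("-inf")))
    return best


# Toy beam with two paths reaching Iran (illustrative only).
beam = [
    (("China", "r1", "Iran"), -0.5),
    (("China", "r2", "Iran"), -0.2),
    (("China", "r3", "Japan"), -0.9),
]
scores = entity_scores(beam)
ranking = sorted(scores, key=scores.get, reverse=True)
```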

Detail Analysis
To answer Q3, we show some non-repetitive clues found in Stage 1 in Figure 3. We use (relation in 1-hop non-repetitive clue path, query relation) pairs on ICEWS18 to construct a clue graph. Arrows begin with the relations in the clue paths and end with the query relations. It is interesting to find that CluSTeR can actually find some causal relations. Moreover, compared to the 2-hop clue paths shown in Table 4, the 1-hop clue paths are more informative, which also explains why the 1-hop clue paths perform better. Besides, we illustrate the statistics of the clue facts used during Stage 2 in Figure 4. The proportion of repetitive clue facts is less than 7% and the proportion of non-repetitive clue facts is more than 93% on the datasets. The abundance of non-repetitive clue facts used in Stage 2 also explains the superior performance of CluSTeR to a certain degree.

Case Study
To answer Q4, we show how CluSTeR conducts reasoning and explains the results for two given queries from the test set of ICEWS14 in Figure 5. [...] wrong answer, China, has a higher score than the right one, Iran. This is because Stage 1 does not take the temporal information into consideration. However, the score gap between Iran and France is obvious, which shows that Stage 1 can measure the qualities of different clue paths and distinguish the semantically related entities from the others. In Stage 2, CluSTeR reorganizes the clue facts by their timestamps, as shown in the right top part of Figure 5. [...] Figure 5 are all associated with the query. Stage 1 narrows all entities down to only two entities through these clue paths but is misled to the wrong answer, Iran. Actually, even a human may give the wrong answer with only fast thinking. After diving into the temporal information of the clue facts and conducting slow thinking, some causal information and period information can be captured by Stage 2. Although Sign formal agreement is associated with Express intent to settle dispute, it cannot be the reason for the latter. Moreover, from the subgraph sequence in the right bottom part of Figure 5, it can be seen that the cooperation period between China and Japan just begins at 363, but the cooperation period between China and Iran has been going on for several days. (China, Express intent to settle dispute, ?, 364) is more likely to be an antecedent event to the cooperation period, and the answer is Japan.
Above all, for each fact to be predicted, CluSTeR can provide the clues for each candidate entity, which offers insight into and interpretability for the reasoning results. This is similar to the natural thinking pattern of humans, in which only explicit and reliable clues are needed.

Performance under the Time-aware Filtered Setting
As mentioned in Section 4.1, the widely adopted filtered setting in existing studies is not suitable for the temporal reasoning task addressed in this paper. The essential problem of this filtered setting is that it ignores the time information of a fact. Therefore, we also adopt an improved filtered setting in which the time information is considered, called the time-aware filtered setting (Han et al., 2020b). Specifically, only the facts that occur at the predicted time are filtered. The results are in Table 5. It can be seen that the experimental results under the time-aware filtered setting are close to those under the raw setting. This is because only a very small number of facts are removed under this filtered setting. The results also support the validity of the raw setting.
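The difference between the two filtered settings can be made concrete with a small sketch. The function `filtered_rank`, the toy scores and the assumption that India is also a known fact at the query time are all hypothetical; the Barack Obama quadruples follow the example in Section 4.1.

```python
def filtered_rank(scores, query, answer, known_facts, time_aware=True):
    """Rank the answer after removing other known-valid objects.

    With time_aware=True only objects of facts that share the query's
    subject, relation and timestamp are removed. With time_aware=False the
    timestamp is ignored, as in the static filtered setting.
    """
    s, r, t = query
    if time_aware:
        valid = {o for (s2, r2, o, t2) in known_facts
                 if (s2, r2, t2) == (s, r, t)}
    else:
        valid = {o for (s2, r2, o, t2) in known_facts if (s2, r2) == (s, r)}
    valid.discard(answer)
    return 1 + sum(1 for e, sc in scores.items()
                   if e not in valid and sc > scores[answer])


known_facts = [
    ("Barack Obama", "visit", "Germany", "2013-1-18"),
    ("Barack Obama", "visit", "India", "2015-1-25"),
]
scores = {"Germany": 0.9, "India": 0.8, "France": 0.1}   # toy model scores
query = ("Barack Obama", "visit", "2015-1-25")
rank_time_aware = filtered_rank(scores, query, "India", known_facts)
rank_static = filtered_rank(scores, query, "India", known_facts,
                            time_aware=False)
```

The static filter removes Germany because of the 2013 visit and reports a better rank for India than the model actually earned, which is the over-optimism discussed in Section 4.1; the time-aware filter keeps Germany as a competitor.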

Conclusions
In this paper, we proposed a two-stage model from the view of human cognition, named CluSTeR, for TKG reasoning. CluSTeR consists of an RL-based clue searching stage (Stage 1) and a GCN-based temporal reasoning stage (Stage 2). In Stage 1, CluSTeR finds reliable clue paths from history and generates intuitive candidate entities via RL. With the found clue paths as input, Stage 2 reorganizes the clue facts derived from the clue paths into a sequence of graphs and performs deduction on them to get the answers. With the two stages, the model demonstrates substantial advantages on TKG reasoning. Finally, it should be mentioned that, although the four TKGs adopted in the experiments were created based on events in the real world, the motivation of this paper is to propose this TKG reasoning model only for scientific research.