ECOLA: Enhancing Temporal Knowledge Embeddings with Contextualized Language Representations

Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot apply to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data in this paper. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. For training and evaluating ECOLA, we introduce three new datasets. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models, with up to 287% relative improvement regarding Hits@1 on the link prediction task. The code and models are publicly available.*


Introduction
Knowledge graphs (KGs) have long been considered an effective and efficient way to store structural knowledge about the world. A knowledge graph consists of a collection of triples (s, p, o), where s (subject entity) and o (object entity) correspond to nodes, and p (predicate) indicates the edge type (relation) between the two entities. Common knowledge graphs (Toutanova et al., 2015; Dettmers et al., 2018) assume that the relations between entities are static connections. However, in the real world, there are not only static facts but also temporal relations associated with the entities. To this end, temporal knowledge graphs (tKGs) (Tresp et al., 2015) were introduced, which capture temporal aspects of relations by extending a triple to a quadruple that adds a timestamp describing when the relation is valid, e.g., (R.T. Erdogan, visit, US, 2019-11-12). If a temporal relationship lasts for several timestamps, most tKGs represent it by a sequence of quadruples, e.g., {(R.T. Erdogan, visit, US, 2019-11-12), (R.T. Erdogan, visit, US, 2019-11-13)}.
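As a minimal sketch (with toy data, not taken from the paper's datasets), a relation that holds over several timestamps can be unrolled into the quadruple sequence described above:

```python
# A static triple (s, p, o) is extended with a timestamp t; a temporal
# relation valid over several timestamps becomes one quadruple per timestamp.

def expand_to_quadruples(s, p, o, timestamps):
    """Turn a temporal relation valid over several timestamps into quadruples."""
    return [(s, p, o, t) for t in timestamps]

facts = expand_to_quadruples(
    "R.T. Erdogan", "visit", "US", ["2019-11-12", "2019-11-13"]
)
# facts is the two-quadruple sequence from the example above.
```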
Conventional knowledge embedding approaches learn KGs by capturing structural information and thus suffer from the sparseness of KGs. To address this problem, some recent studies incorporate textual information to enrich knowledge embedding. KG-BERT (Yao et al., 2019) takes the entity and relation descriptions of a triple as the input of a pre-trained language model (PLM) and turns KG link prediction into a sequence classification problem. Similarly, KEPLER (Wang et al., 2021) computes entity representations by encoding entity descriptions with a PLM and then applies KG score functions for link prediction. However, these approaches cannot be applied to tKGs. Specifically, existing approaches (e.g., KEPLER) encode an entity, no matter at which timestamp, with the same static embedding based on a shared entity description. In comparison, entity embeddings in tKG models usually evolve over time, as entities are often involved in different events at different timestamps. Therefore, an entity might be aligned with different textual knowledge at different times, and it should be taken into account which textual knowledge is relevant to which entity at which timestamp. We call this the temporal alignment challenge between texts and tKGs, i.e., establishing a correspondence between textual knowledge and its tKG depiction. Another challenge is that many temporal knowledge embedding models (Goel et al., 2020; Han et al., 2020a) learn entity representations as a function of time, to which existing enhancement approaches are not naturally applicable. We refer to this as the dynamic embedding challenge. In this work, we propose to study enhancing temporal knowledge embedding with textual data. As an approach to this task, we develop Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which uses temporally relevant textual knowledge to enhance time-dependent knowledge graph embedding. Specifically, we solve the temporal alignment
challenge using tKG quadruples as an implicit measure. We pair a quadruple with its relevant textual data, e.g., event descriptions, which describe the temporal relations between entities at a specific time. Then we use the event description to enhance the representations of the entities and the predicate involved in the given quadruple. Besides, ECOLA solves the dynamic embedding challenge using a novel knowledge-text prediction (KTP) task, which injects textual knowledge into temporal knowledge embeddings. Specifically, given a quadruple-text pair, we feed both the temporal knowledge embeddings of the quadruple and the token embeddings of the text into a PLM. The KTP task is an extended masked language modeling task that randomly masks words in texts and entities/predicates/timestamps in quadruples. With the help of the KTP task, ECOLA is able to recognize mentions of the subject and object entities and align semantic relationships in the text with the predicate in the quadruple.
For training ECOLA, we need datasets with tKG quadruples and aligned textual event descriptions, which are unavailable in existing temporal KG benchmarks. Thus, we construct three new temporal knowledge graph datasets by adapting two existing datasets, i.e., GDELT (Leetaru and Schrodt, 2013) and Wiki (Dasgupta et al., 2018), and an event extraction dataset (Li et al., 2020).
To summarize, our contributions are as follows: (i) We are the first to address the challenge of enhancing temporal knowledge embedding with temporally relevant textual information while preserving the time-evolving properties of entity embedding. (ii) We construct three datasets to train text-enhanced tKG models. Specifically, we adapt three existing temporal KG completion datasets by augmenting each quadruple with a relevant textual description. (iii) Extensive experiments show that ECOLA is model-agnostic and can potentially be combined with any temporal KG embedding model. ECOLA also achieves superior performance on the temporal KG completion task and enhances temporal KG models with up to 287% relative improvement in the Hits@1 metric. (iv) As a joint model, ECOLA also empowers PLMs by integrating temporal structured knowledge into them. We select temporal question answering as a downstream NLP task, demonstrating that ECOLA can considerably enhance PLMs.

Preliminaries and Related Work
Temporal Knowledge Graphs Temporal knowledge graphs are multi-relational, directed graphs with labeled, timestamped edges between entities (nodes). Let E and P represent a finite set of entities and predicates, respectively. A quadruple q = (e_s, p, e_o, t) represents a timestamped and labeled edge between a subject entity e_s ∈ E and an object entity e_o ∈ E at a timestamp t ∈ T. Let F represent the set of all true quadruples; temporal knowledge graph completion (tKGC) is the task of inferring F based on a set of observed facts O. Specifically, tKGC predicts either a missing subject entity (?, p, e_o, t) given the other three components, or a missing object entity (e_s, p, ?, t). We provide related work on temporal knowledge representations in Appendix A.
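The tKGC task above is evaluated with the ranking protocol used throughout this paper (MRR and Hits@k). A minimal sketch, with a toy score table standing in for any tKG scoring function:

```python
# Standard ranking-based evaluation for tKG completion: score every candidate
# entity for a query (e_s, p, ?, t), rank the ground truth, then aggregate
# ranks into MRR and Hits@k.

def rank_of_truth(scores, true_entity):
    """Rank of the ground-truth entity among all candidates (1 = best)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return order.index(true_entity) + 1

def mrr_and_hits(ranks, k=1):
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits

# Toy query (e_s, p, ?, t): higher score = more plausible object entity.
scores = {"US": 2.1, "Turkey": 3.5, "France": 0.7}
rank = rank_of_truth(scores, "US")       # ground truth ranked 2nd here
mrr, hits1 = mrr_and_hits([rank], k=1)
```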
Joint Language and Knowledge Models Recent studies have achieved great success in jointly learning language and knowledge representations. Zhang et al. (2019) and Peters et al. (2019) focus on enhancing language models using external knowledge. They separately pre-train the entity embedding with knowledge embedding models, e.g., TransE (Bordes et al., 2013), and inject the pre-trained entity embedding into PLMs, while fixing the entity embedding during PLM training. Thus, they are not truly joint models that learn knowledge and language embeddings simultaneously. Yao et al. (2019), Kim et al. (2020), and Wang et al. (2021) learn to generate entity embeddings with PLMs from entity descriptions. Moreover, He et al. (2019), Sun et al. (2020), and Liu et al. (2020) exploit the potential of contextualized knowledge representation by constructing subgraphs of structured knowledge and textual data instead of treating single triples as training units. Nevertheless, none of these works consider the temporal aspect of knowledge graphs, which distinguishes them from our proposed ECOLA.

ECOLA
In this section, we present the overall framework of ECOLA, including the model architecture in Sections 3.1–3.3, a novel task designed for aligning knowledge embedding and language representation in Section 3.4, and the training procedure in Section 3.5. As shown in Figure 2, ECOLA implicitly incorporates textual knowledge into temporal knowledge embeddings by jointly optimizing the knowledge-text prediction loss and the temporal knowledge embedding loss. Note that, at inference time, we only take the enhanced temporal knowledge embeddings to perform the temporal KG completion task, using neither the PLM nor any textual data, both to prevent information leakage and to keep a fast inference speed.

Embedding Layer
In tKG embedding models, entity representations evolve over time. Thus, the key point of enhancing a time-dependent entity representation e_i(t) is to find texts that are relevant to the entity at the time of interest t. To this end, we use tKG quadruples (e.g., (e_i, p, e_j, t)) as an implicit measure for the alignment. We pair a quadruple with its relevant textual data and use this textual data to enhance the entity representation e_i(t). Therefore, a training sample is a pair of a quadruple from the temporal KG and the corresponding textual description, which are packed together into a sequence. As shown in Figure 2, the input embedding is the sum of token embedding, type embedding, and position embedding. For token embedding, we maintain three lookup tables for subwords, entities, and predicates, respectively. For subword embedding, we first tokenize the textual description into a sequence of subwords following Devlin et al. (2018), using the WordPiece algorithm (Wu et al., 2016). As the light blue tokens in Figure 2 show, we denote an embedding sequence of subword tokens as {w_1, ..., w_n}. In contrast to subword embeddings, the embeddings for entities and predicates are directly learned from scratch, similar to common knowledge embedding methods. We denote the entity embedding and predicate embedding as e and p, respectively, shown as the dark blue tokens in Figure 2. We separate the knowledge tokens, i.e., entities and predicates, from subword tokens with a special token [SEP]. To handle different token types, we add a type embedding indicating the type of each token, i.e., subword, entity, or predicate. For position embedding, we assign each token an index according to its position in the input sequence and follow Devlin et al. (2018) in applying fully-learnable absolute position embeddings.
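The three-way sum above can be sketched as follows (all table sizes and the dimensionality are toy assumptions, and the lookup tables are random stand-ins for learned parameters):

```python
import numpy as np

# Each position's input embedding is the sum of a token embedding (from the
# subword, entity, or predicate lookup table), a type embedding, and a
# fully-learnable absolute position embedding.

rng = np.random.default_rng(0)
d = 8
subword_emb = rng.normal(size=(100, d))   # subword lookup table
entity_emb = rng.normal(size=(50, d))     # entity lookup table (learned from scratch)
predicate_emb = rng.normal(size=(20, d))  # predicate lookup table
type_emb = rng.normal(size=(3, d))        # type ids: 0 subword, 1 entity, 2 predicate
position_emb = rng.normal(size=(512, d))  # absolute position embeddings

def input_embedding(tokens):
    """tokens: list of (table_name, index, type_id); returns the summed sequence."""
    tables = {"subword": subword_emb, "entity": entity_emb, "predicate": predicate_emb}
    seq = []
    for pos, (table, idx, type_id) in enumerate(tokens):
        seq.append(tables[table][idx] + type_emb[type_id] + position_emb[pos])
    return np.stack(seq)

# Knowledge tokens [e_s, p, e_o] followed by two subwords of the description.
seq = input_embedding([("entity", 3, 1), ("predicate", 5, 2),
                       ("entity", 7, 1), ("subword", 11, 0), ("subword", 12, 0)])
```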

Temporal Knowledge Encoder
As shown in Figure 2, the input embedding for entities and predicates consists of knowledge token embedding, type embedding, and position embedding. In this section, we provide details of the temporal knowledge embedding (tKE) objective.
A temporal embedding function defines entity embedding as a function that takes an entity and a timestamp t as input and generates a time-dependent representation in a vector space. A line of work explores such temporal embedding functions. Since we aim to propose a model-agnostic approach, we combine ECOLA with three of them, i.e., DyERNIE-Euclid (Han et al., 2020a), UTEE (Han et al., 2021c), and DE-SimplE (Goel et al., 2020). In the following, we refer to DyERNIE-Euclid as DyERNIE and take it as an example to introduce our framework. Specifically, the entity representation is derived from an initial embedding and a velocity vector,

e_i^DyER(t) = ē_i^DyER + v_{e_i} t,

where ē_i^DyER represents the initial embedding that does not change over time, and v_{e_i} is an entity-specific velocity vector. The combination with other temporal embedding functions is discussed in Section 4. The score function measuring the plausibility of a quadruple is defined as

φ^DyER(e_i, p, e_j, t) = −d(P e_i^DyER(t), e_j^DyER(t) + p) + b_i + b_j,   (1)

where P and p represent the predicate matrix and the translation vector of predicate p, respectively; d denotes the Euclidean distance, and b_i, b_j are scalar biases. For learning the tKE, we generate M negative samples for each positive quadruple in a batch. We choose the binary cross entropy as the temporal knowledge embedding objective,

L_tKE = −(1/N) Σ_{k=1}^{N} [ y_k log(p_k) + (1 − y_k) log(1 − p_k) ],   (2)

where N is the sum of positive and negative training samples, y_k is the binary label indicating whether a training sample is positive, p_k denotes the predicted probability σ(φ_k^DyER), and σ(·) is the sigmoid function.
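The objective above can be sketched numerically as follows (toy dimensions; the random arrays stand in for learned parameters, and the application of the predicate matrix P is written as a plain matrix-vector product, which is an assumption about the Euclidean variant):

```python
import numpy as np

# Sketch of the DyERNIE-Euclid tKE objective: a time-dependent embedding
# e_i(t) = e_bar_i + v_i * t, a distance-based score (Equation 1), and a
# binary cross-entropy loss over positive and negative samples (Equation 2).

rng = np.random.default_rng(1)
n_ent, d = 5, 4
e_bar = rng.normal(size=(n_ent, d))  # initial (time-invariant) embeddings
v = rng.normal(size=(n_ent, d))      # entity-specific velocity vectors
P = rng.normal(size=(d, d))          # predicate matrix
p_vec = rng.normal(size=d)           # predicate translation vector
b = rng.normal(size=n_ent)           # scalar entity biases

def embed(i, t):
    return e_bar[i] + v[i] * t       # e_i(t) = e_bar_i + v_i * t

def score(i, j, t):
    dist = np.linalg.norm(P @ embed(i, t) - (embed(j, t) + p_vec))
    return -dist + b[i] + b[j]       # Equation (1)

def bce_loss(scores, labels):
    probs = 1.0 / (1.0 + np.exp(-np.asarray(scores)))   # sigmoid of scores
    labels = np.asarray(labels, dtype=float)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

# One positive quadruple and one negative sample at the same timestamp.
loss = bce_loss([score(0, 1, 3.0), score(0, 2, 3.0)], [1, 0])
```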

Masked Transformer Encoder
To encode the input sequence, we use the pre-trained language representation model BERT (Devlin et al., 2018). Specifically, the encoder feeds a sequence of N tokens, including entities, predicates, and subwords, into the embedding layer introduced in Section 3.1 to get the input embeddings and then computes L layers of d-dimensional contextualized representations. Eventually, we get a contextualized representation for each token, which can be further used to predict masked tokens.

Knowledge-Text Prediction Task
To incorporate textual knowledge into temporal knowledge embedding, we use the pre-trained language model BERT to encode the textual description and propose a knowledge-text prediction task to align the language representations and the knowledge embedding. The knowledge-text prediction task is an extension of the masked language modeling (MLM) task. As illustrated in Figure 2, given a pair of a quadruple and the corresponding event description, the knowledge-text prediction task randomly masks some of the input tokens and trains the model to predict the original index of each masked token based on its context. As different types of tokens are masked, we encourage ECOLA to learn different capabilities:
• Masking entities. To predict an entity token in the quadruple, ECOLA can gather information in the following ways. First, the model can detect the textual mention of this entity token and determine the entity; second, if the other entity token and the predicate token are not masked, the model can utilize the available knowledge tokens to make a prediction, similar to traditional semantic matching-based temporal KG models. Masking entity nodes helps ECOLA align the representation spaces of language and structured knowledge, and inject contextualized representations into entity embeddings.
• Masking predicates. To predict the predicate token in the quadruple, the model needs to detect mentions of the subject entity and the object entity and classify the semantic relationship between the two entity mentions. Thus, masking predicate tokens helps the model integrate language representations into the predicate embedding and map words and entities into a common representation space.
• Masking subwords. When subwords are masked, the objective is similar to traditional MLM. The difference is that ECOLA considers not only the dependency information in the text but also the entities and the logical relationship in the quadruple. Additionally, we initialize the encoder with the pre-trained BERT_base. Thus, masking subwords helps ECOLA retain linguistic knowledge and avoid catastrophic forgetting while integrating contextualized representations into temporal knowledge embeddings.
In each quadruple, the predicate and each entity have a probability of 15% of being masked. Similarly, we mask 15% of the subwords of the textual description at random. We ensure that entities and the predicate cannot be masked at the same time in a single training sample; we conduct an ablation study in Section 6 to show the benefit of this constraint. When a token is masked, we replace it with (1) the [MASK] token 80% of the time, (2) a randomly sampled token of the same type as the original token 10% of the time, and (3) the unchanged token 10% of the time. For each masked token, the contextualized representation in the last layer of the encoder is fed into one of three classification heads, which are responsible for predicting entities, predicates, and subword tokens, respectively. Finally, a cross-entropy loss L_KTP is calculated over these masked tokens.
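The 15% selection and 80/10/10 replacement scheme above can be sketched as follows (the token lists are toy examples; a real implementation would operate on vocabulary indices per token type):

```python
import random

# Each maskable token is selected with probability 15%; a selected token is
# replaced by [MASK] 80% of the time, by a random token of the same type 10%
# of the time, and left unchanged 10% of the time.

def corrupt(token, same_type_vocab, rng):
    r = rng.random()
    if r < 0.8:
        return "[MASK]"
    if r < 0.9:
        return rng.choice(same_type_vocab)  # random token of the same type
    return token                            # unchanged

def mask_sequence(tokens, same_type_vocab, rng, mask_prob=0.15):
    out, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            out.append(corrupt(tok, same_type_vocab, rng))
            targets.append(tok)   # the loss predicts the original token here
        else:
            out.append(tok)
            targets.append(None)  # no loss at this position
    return out, targets

rng = random.Random(0)
masked, targets = mask_sequence(["erdogan", "visited", "the", "us"],
                                ["erdogan", "us", "turkey", "visit"], rng)
```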

Training Procedure and Inference
We initialize the transformer encoder with BERT_base§ and the knowledge encoder with random vectors. Then we use the temporal knowledge embedding (tKE) objective L_tKE to train the knowledge encoder and use the knowledge-text prediction (KTP) objective L_KTP to incorporate temporal factual knowledge and textual knowledge, in the form of a multi-task loss

L = L_tKE + λ L_KTP,

where λ is a hyperparameter balancing the tKE loss and the KTP loss. Note that the two tasks share the same embedding layer of entities and predicates. At inference time, we aim to answer link prediction queries, e.g., (e_s, p, ?, t). Since there is no textual description at inference time, we take the entity and predicate embeddings as input and use the score function of the knowledge encoder, e.g., Equation 1, to predict the missing links. Specifically, the score function assigns a plausibility score to each quadruple, and the proper object can be inferred by ranking the scores of the quadruples {(e_s, p, e_j, t), e_j ∈ E} formed with the candidate entities.

§ https://huggingface.co/bert-base-uncased
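The training objective and the candidate-ranking inference described above can be sketched as follows (the toy score table stands in for the knowledge encoder's score function, and the additive form of the multi-task loss follows the description of λ as a balancing hyperparameter):

```python
# Joint training combines the two losses; inference ranks every candidate
# object by the knowledge encoder's score and returns the best one.

def joint_loss(l_tke, l_ktp, lam):
    """Multi-task loss: tKE loss plus lambda-weighted KTP loss."""
    return l_tke + lam * l_ktp

def predict_object(score, e_s, p, t, candidates):
    """Answer (e_s, p, ?, t) by ranking every candidate object entity."""
    return max(candidates, key=lambda e_j: score(e_s, p, e_j, t))

# Toy score table standing in for the score function of Equation (1).
toy = {("A", "visit", "B", 0): 1.5, ("A", "visit", "C", 0): 0.3}
answer = predict_object(lambda s, p, o, t: toy[(s, p, o, t)],
                        "A", "visit", 0, ["B", "C"])
```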

The Model-Agnostic Property of ECOLA
ECOLA is model-agnostic and can enhance different temporal knowledge embedding models.Besides ECOLA-DyERNIE, we introduce here two additional variants of ECOLA.
ECOLA-DE enhances DE-SimplE, which applies the diachronic embedding (DE) function (Goel et al., 2020). The DE function defines the temporal embedding of entity e_i at timestamp t as

e_i^DE(t)[n] = a_{e_i}[n]                              if 1 ≤ n ≤ γd,
e_i^DE(t)[n] = a_{e_i}[n] sin(ω_{e_i}[n] t + b_{e_i}[n])   otherwise.   (3)

Here, e_i^DE(t)[n] denotes the n-th element of the embedding of entity e_i at time t; a_{e_i}, ω_{e_i}, b_{e_i} ∈ R^d are entity-specific vectors with learnable parameters, d is the dimensionality, and γ ∈ [0, 1] represents the portion of the time-independent part.
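Equation (3) can be sketched numerically as follows (toy dimensionality; the random vectors stand in for the learned entity-specific parameters):

```python
import numpy as np

# Diachronic embedding: the first gamma*d entries are time-invariant, the
# remaining entries oscillate with the timestamp t (Equation 3).

def diachronic_embedding(a, omega, b, t, gamma):
    d = a.shape[0]
    k = int(gamma * d)                       # number of static entries
    e = a.copy()
    e[k:] = a[k:] * np.sin(omega[k:] * t + b[k:])
    return e

rng = np.random.default_rng(2)
d, gamma = 8, 0.5
a, omega, b = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
e_t = diachronic_embedding(a, omega, b, t=3.0, gamma=gamma)
```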
GDELT is an initiative knowledge base storing events across the globe that connect people and organizations, e.g., (Google, consult, the United States, 2018/01/06). For each quadruple, GDELT provides a link to the news report from which the quadruple was extracted. We assume each sentence that contains mentions of both the subject and the object is relevant to the given quadruple and is thus temporally aligned with the subject and object at the given timestamp. We pair each of these sentences with the given quadruple to form a training sample. This process is similar to the distant supervision algorithm (Mintz et al., 2009) in the relation extraction task. The proposed dataset contains 5,849 entities, 237 predicates, 2,403 timestamps, and 943,956 quadruples with accompanying sentences.
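The distant-supervision pairing above can be sketched as follows (entity matching is reduced to plain substring matching here, a simplifying assumption; the sentences are toy examples):

```python
# Pair a quadruple with every sentence of the source article that mentions
# both the subject and the object, in the spirit of distant supervision.

def pair_quadruple_with_sentences(quadruple, article_sentences):
    s, p, o, t = quadruple
    return [(quadruple, sent) for sent in article_sentences
            if s.lower() in sent.lower() and o.lower() in sent.lower()]

quad = ("Google", "consult", "the United States", "2018/01/06")
sentences = [
    "Google consulted with the United States government last week.",
    "The company also released a new product.",
]
pairs = pair_quadruple_with_sentences(quad, sentences)
# Only the first sentence mentions both entities, so only it is paired.
```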
DuEE is originally a human-annotated dataset for event extraction, containing 65 event types and 121 argument roles. Each sample contains a sentence and several extracted event tuples. We select the 41 event types that can be represented by quadruples and reformat DuEE by manually converting event tuples into quadruples and then pairing each quadruple with its corresponding sentence.
Wiki is a temporal KG dataset proposed by Leblay and Chekol (2018). Following the post-processing by Dasgupta et al. (2018), we discretize the time span into 82 different timestamps. We align each entity to its Wikipedia page and extract the first section as its description. To construct the relevant textual data of each quadruple, we combine the subject description, relation, and object description into a sequence. In this case, the knowledge-text prediction task lets the subject entity learn the descriptions of its neighbors at different timestamps, thus preserving the temporal alignment between time-dependent entity representations and textual data.

¶ https://www.gdeltproject.org/data.html#googlebigquery
|| https://ai.baidu.com/broad/download
** https://www.wikidata.org/wiki/Wikidata:Main_Page

Experiments
We evaluate the enhanced temporal knowledge embedding on the temporal KG completion task. Specifically, we take the entity and predicate embeddings of ECOLA-DyERNIE and use Equation 1 to predict missing links. The textual description of test quadruples could introduce essential information and make the completion task much easier. Thus, to make a fair comparison with other temporal KG embedding models, we take the enhanced lookup-table embeddings of the temporal KG to perform the link prediction task at test time, using neither textual descriptions of test quadruples nor the language model. We report these results in Table 2. As additional results, we also show the prediction outcome that takes the textual description of test quadruples as input in Figure 4a.
Table 2: Temporal link prediction results: Mean Reciprocal Rank (MRR, %) and Hits@1/3 (%). The results of the proposed fusion models (with prefix ECOLA-) and their counterpart KG models are listed together.
Quantitative Study Table 2 reports the tKG completion results on the test sets, averaged over three trials. Firstly, ECOLA-UTEE improves its baseline temporal KG embedding model, UTEE, by a large margin, demonstrating the effectiveness of our fusing strategy. Specifically, ECOLA-UTEE enhances UTEE on GDELT with relative improvements of 95% and 99% in terms of mean reciprocal rank (MRR) and Hits@3, and is nearly four times better in terms of Hits@1. Its superiority is thus clear on GDELT, the most challenging benchmark tKG dataset, containing nearly one million quadruples. Secondly, ECOLA-UTEE and ECOLA-DE generally outperform UTEE and DE-SimplE on the three datasets, demonstrating that ECOLA is model-agnostic and can enhance different tKG embedding models. Besides, on the DuEE dataset, ECOLA-DyERNIE achieves better performance than DyERNIE in Hits@1 and MRR, but the gap reverses in Hits@3. The reason could be that ECOLA-DyERNIE is good at classifying hard negatives using textual knowledge, and thus has a high Hits@1; however, since DuEE is much smaller than the other two datasets, ECOLA-DyERNIE may overfit in some cases, where the ground truth is pushed away from the top 3 ranks.
Ablation Study We compare DE-SimplE, ECOLA-DE, and ECOLA-SF on GDELT in Figure 3a. ECOLA-SF is the static counterpart of ECOLA-DE, where we do not consider the temporal alignment while incorporating textual knowledge. Specifically, ECOLA-SF integrates all textual knowledge into the time-invariant part of entity representations. We randomly initialize an embedding vector ē_i ∈ R^d for each entity e_i ∈ E, where ē_i has the same dimension as the token embedding in the pre-trained language model. Then we learn the time-invariant part ē_i via the knowledge-text prediction task. For the temporal KG completion task, we combine ē_i with the temporal part of the knowledge embedding,

e_i^SF(t) = [ W_sf^⊤ ē_i || a_{e_i} sin(ω_{e_i} t + b_{e_i}) ],

where e_i^SF(t) ∈ R^d is an entity embedding containing a static and a temporal part, a_{e_i}, ω_{e_i}, b_{e_i} ∈ R^{d−γd} are entity-specific vectors with learnable parameters, and W_sf ∈ R^{d×γd} is a matrix with learnable weights. As shown in Figure 3a, the performance gap between ECOLA-DE and ECOLA-SF is significant, demonstrating that the temporal alignment between time-dependent entity representations and textual knowledge is more powerful than the static alignment.
Moreover, Figure 3b shows the results of different masking strategies on GDELT. The first strategy, Masking E+R+W, allows simultaneously masking predicate, entity, and subword tokens in the same training sample. The second strategy is Masking E/R+W, where we mask 15% of the subword tokens in the language part and either an entity or a predicate in the knowledge tuple. In the third strategy, Masking E/R/W, for each training sample we choose to mask either subword tokens, an entity, or the predicate. Figure 3b shows the advantage of the second masking strategy, indicating that retaining adequate information in the knowledge tuple helps the model align the knowledge embedding and the language representations.
Qualitative Analysis To investigate why incorporating textual knowledge improves the tKG embedding models' performance, we study the test samples that are correctly predicted by the fusion model ECOLA-DE but wrongly by the tKG model DE-SimplE. We observe that language representations help overcome the incompleteness of the tKG by leveraging knowledge from the augmented textual data. For example, there is a test quadruple (US, host a visit, ?, 2019-11-14) with ground truth R.T. Erdogan. The training set contains a quite relevant quadruple, i.e., (Turkey, intend to negotiate with, US, 2019-11-11). However, the given tKG does not contain information indicating that the entity R.T. Erdogan is a representative of Turkey. So it is difficult for the tKG model DE-SimplE to infer the correct answer from the above-mentioned quadruple. In ECOLA-DE, the augmented textual data does contain such information, e.g., "The president of Turkey, R.T. Erdogan, inaugurated in Aug. 2014.", which narrows the gap between R.T. Erdogan and Turkey. Thus, by integrating textual information into temporal knowledge embedding, the enhanced model can gain additional information that the knowledge base does not include.

Discussion
Inference with Textual Data In Section 6, we compared different tKG embedding models with the textual data of test quadruples absent during inference. However, if the textual descriptions of the test quadruples are given during inference, will the contextualized language model incorporate this information into the tKG embeddings? We use the entity predictor of the knowledge-text prediction task to perform the tKG completion task on GDELT. As shown in Figure 4a, the results show significant improvement across all metrics; specifically, ECOLA-UTEE achieves a 145% relative MRR improvement when textual data is given during inference compared to when it is not. Thus, the results confirm that the KTP task is a good choice for successfully aligning the knowledge and language spaces and that ECOLA utilizes the pre-trained language model to inject language representations into temporal knowledge embeddings.
Masking Temporal Information in KTP As temporal alignment is crucial for enhancing temporal knowledge embeddings, we study the effect of masking temporal information by extending the existing KTP task with an additional time prediction task, where the timestamp in the input is masked and the model learns to predict the original timestamp. The extended model, named tECOLA-UTEE, shows significant performance gains on both the GDELT and Wiki datasets across all metrics, as shown in Table 3. We conjecture that the additional time prediction task forces the model to capture the temporal dynamics in temporal knowledge embeddings and to utilize the temporal information in the given textual descriptions. Since each temporal knowledge embedding model encodes temporal information differently, masking and predicting temporal information is specific to each model. We leave further inspection of this finding to future work.
Temporal Question Answering Although we focus on generating informative temporal knowledge embeddings in this work, joint models often benefit both the language model and the temporal KG model. Unlike previous joint models (Zhang et al., 2019; Peters et al., 2019), we do not modify the Transformer architecture, e.g., by adding entity linkers or fusion layers. Thus, the language encoder enhanced by external knowledge can be adapted to a wide range of downstream tasks as easily as BERT. Besides the tKG completion task, we evaluate the enhanced language model in ECOLA on the temporal question-answering task. Natural questions often include temporal constraints, e.g., who was the US president before Jimmy Carter? To deal with such challenging temporal constraints, temporal question answering over a temporal knowledge base, formulated as the TKGQA task, has attracted increasing attention, since tKGs help to find entity or timestamp answers with the support of temporal facts. Saxena et al. (2021) introduced the dataset CRONQUESTIONS, containing natural temporal questions with different types of temporal constraints. They proposed a baseline, CRONKGQA, that uses BERT to understand the temporal constraints, followed by a scoring function for answer prediction. We apply ECOLA to enhance the BERT in CRONKGQA, plug it back into CRONKGQA, and fine-tune it on the question-answering dataset. We name the enhanced model ECOLA-CRONKGQA. The models are evaluated with the standard metrics Hits@k (k ∈ {1, 3}): the percentage of times that the true entity or time candidate appears in the top k ranked candidates. Figure 4b shows that ECOLA considerably enhances CronKGQA, demonstrating the benefit of ECOLA to the language model.

Conclusion
We introduced ECOLA to enhance time-evolving entity representations with temporally relevant textual data using a novel knowledge-text prediction task. Besides, we constructed three datasets that contain paired structured temporal knowledge and unstructured textual descriptions, which can benefit future research on fusing temporal structured and unstructured knowledge. Extensive experiments show that ECOLA can improve various temporal knowledge graph models by a large margin.

Limitations
To train ECOLA, we need to provide structured knowledge with aligned unstructured textual data to the model. Thus, we should either manually pair quadruples with event descriptions or use a matching algorithm to build the pairs automatically. The former requires human labeling effort and is hard to apply to large-scale datasets, while the latter introduces noise into the dataset. Thus, ECOLA is currently tailored for domain adaptation and enhances pre-trained models with domain knowledge. There is still work to be done to let models be jointly trained on large-scale structured and unstructured data.

Ethics Statement

Since most temporal knowledge graphs are automatically extracted from web data, it is important to ensure they do not contain offensive content. ECOLA can be used to classify the quadruples in temporal knowledge graphs using the pre-trained language model and thus contribute to the knowledge graph protection perspective.

E Documentation of the artifacts
This paper uses three datasets: GDELT, Wiki, and DuEE. GDELT mainly covers social and political events and is written in English. Wiki in this paper mainly contains evolving knowledge, i.e., affiliation and residence information, and is also written in English. DuEE is a Chinese dataset that mainly covers social news, such as the launch of new electronic products.

Figure 1: An example of a temporal knowledge graph with textual event descriptions.
ECOLA-UTEE enhances UTEE (Han et al., 2021c), which learns a shared temporal encoding for all entities to address the overfitting problem of DE-SimplE on sparse datasets. Compared to ECOLA-DE, ECOLA-UTEE replaces Equation 3 with

e_i^UTEE(t) = [ ē_i || a sin(ωt + b) ],   ē_i ∈ R^{γd};  a, ω, b ∈ R^{(1−γ)d},

where ē_i denotes the entity-specific time-invariant part, || denotes concatenation, and a, ω, and b are shared among all entities.

Figure 4: (a) Results of the tKG completion task on GDELT with and without using the textual description of test quadruples. (b) ECOLA benefits language representations on the temporal question-answering task. BERT means that we directly apply BERT to the CronQuestions dataset, CronKGQA is the model proposed by Saxena et al. (2021), and ECOLA-CronKGQA is the model where we enhance CronKGQA using ECOLA.

Table 1: Dataset statistics: the number of entities, predicates, and timestamps, and the sizes of the training, validation, and test sets for each dataset.

Given a quadruple (e_s, p, e_o, t), the key point is to find texts that are temporally relevant to e_s and e_o at t. Existing tKG datasets do not provide such information. To facilitate research on integrating textual knowledge into temporal knowledge embedding, we reformat GDELT¶, DuEE||, and Wiki**. We show the dataset statistics in Table 1.

Table 3: Performance of masking temporal information in the knowledge-text prediction task.

Table 5: The runtime of the training procedure (in hours).

Table 6: Search space of hyperparameters.

Table 7: The number of parameters (M).