A Novel Global Feature-Oriented Relational Triple Extraction Model based on Table Filling

Table filling based relational triple extraction methods are attracting growing research interest due to their promising performance and their ability to extract triples from complex sentences. However, these methods are far from their full potential because most of them only use local features and ignore the global associations of relations and of token pairs, which increases the risk of overlooking important information during triple extraction. To overcome this deficiency, we propose a global feature-oriented triple extraction model that makes full use of these two kinds of global associations. Specifically, we first generate a table feature for each relation. Then the two kinds of global associations are mined from the generated table features. Next, the mined global associations are integrated into the table feature of each relation. This “generate-mine-integrate” process is performed multiple times so that the table feature of each relation is refined step by step. Finally, each relation’s table is filled based on its refined table feature, and all triples linked to this relation are extracted from its filled table. We evaluate the proposed model on three benchmark datasets. Experimental results show that our model is effective and achieves state-of-the-art results on all of these datasets. The source code of our work is available at: https://github.com/neukg/GRTE.


Introduction
Relational triple extraction (RTE) aims to extract triples from unstructured text (often sentences) and is a fundamental task in information extraction. These triples have the form (subject, relation, object), where both subject and object are entities and they are semantically linked by relation. RTE is important for many downstream applications.
Nowadays, the dominant methods for RTE are the joint extraction methods that extract entities and relations simultaneously in an end-to-end way. Some recent joint extraction methods (Yu et al., 2019; Wei et al., 2020; Sun et al., 2021) have shown strong extraction abilities on diverse benchmark datasets, especially the ability to extract triples from complex sentences that contain overlapping or multiple triples.
Among these existing joint extraction methods, the table filling based methods (Zhang et al., 2017; Miwa and Bansal, 2016; Gupta et al., 2016) are attracting growing research attention. These methods usually maintain a table for each relation, and each item in such a table indicates whether a token pair possesses the corresponding relation or not. Thus the key of these methods is to fill the relation tables accurately; the triples can then be extracted based on the filled tables. However, existing methods fill relation tables mainly based on local features that are extracted from either a single token pair or the filled history of some limited token pairs (Zhang et al., 2017), and they ignore the following two kinds of valuable global features: the global associations of token pairs and of relations.
These two kinds of global features can reveal the differences and connections among relations and among token pairs. Thus they are helpful for both precision, by verifying the extracted triples from multiple perspectives, and recall, by deducing new triples. For example, given the sentence "Edward Thomas and John are from New York City, USA.", when looking at it from a global view, we can easily find the following two useful facts. First, the triple (Edward Thomas, live_in, New York) is helpful for extracting the triple (John, live_in, USA), and vice versa. This is because the properties of their (subject, object) pairs are highly similar: (i) the types of both subjects are the same (both are persons); (ii) the types of both objects are the same too (both are locations). Thus these two entity pairs are highly likely to possess the same kind of relation. Second, the mentioned two triples are helpful for deducing a new triple (New York, located_in, USA). This is because: (i) located_in requires both its subjects and objects to be locations; (ii) located_in is semantically related to live_in; (iii) live_in indicates that its objects are locations. Thus there is a clear inference path from these two known triples to the new triple. Obviously, such global features cannot be captured by local features.
Inspired by the above analyses, we propose a global feature-oriented table filling based RTE model that fills relation tables mainly based on the above two kinds of global associations. In our model, we first generate a table feature for each relation. Then all relations' table features are integrated into a subject-related global feature and an object-related global feature, based on which the two kinds of global associations are mined with a Transformer-based method. Next, these two kinds of mined global associations are used to refine the table features. These steps are performed multiple times so that the table features are refined gradually. Finally, each table is filled based on its refined feature, and all triples are extracted based on the filled tables.
We evaluate the proposed model on three benchmark datasets: NYT29, NYT24, and WebNLG. Extensive experiments show that it consistently outperforms the existing best models and achieves the state-of-the-art results on all of these datasets.

Related Work
Early studies (Zelenko et al., 2003; Zhou et al., 2005; Chan and Roth, 2011) often take pipeline based methods for RTE, which first recognize all entities in the input text and then predict the relations for all entity pairs. However, these methods have two fatal shortcomings. First, they ignore the correlations between entity recognition and relation prediction. Second, they tend to suffer from the error propagation issue.
To overcome these shortcomings, researchers begin to explore the joint extraction methods that extract entities and relations simultaneously. According to the research lines taken, we roughly classify existing joint methods into three main kinds.
Tagging based methods. This kind of methods (Zheng et al., 2017; Yu et al., 2019; Wei et al., 2020) often first extract the entities with a tagging based method and then predict relations. In these models, binary tagging sequences are often used to determine the start and end positions of entities, and sometimes to determine the relations between two entities as well. Seq2Seq based methods. This kind of methods (Zeng et al., 2018, 2019; Nayak and Ng, 2020) often view a triple as a token sequence and convert the extraction task into a generation task that generates a triple in some order, such as first generating a relation and then generating entities.
Table filling based methods. This kind of methods (Miwa and Bansal, 2016; Gupta et al., 2016; Zhang et al., 2017) maintain a table for each relation, and the items in this table usually denote the start and end positions of two entities (or even the types of these entities) that possess this specific relation. Accordingly, the RTE task is converted into the task of filling these tables accurately and effectively.
Besides, researchers also explore other kinds of methods. For example, Bekoulis et al. (2018) formulate the RTE task as a multi-head selection problem. Other work casts the RTE task as a multi-turn question answering problem. Fu et al. (2019) use a graph convolutional network based method and Eberts and Ulges (2019) use a span extraction based method. Sun et al. (2021) propose a multi-task learning based RTE model.

Table Filling Strategy
Given a sentence S = w_1 w_2 ... w_n, we maintain a table table_r (of size n × n) for each relation r (r ∈ R, where R is the relation set). The core of our model is to assign a proper label to each table item (corresponding to a token pair). Here we define the label set as L = {"N/A", "MMH", "MMT", "MSH", "MST", "SMH", "SMT", "SS"}.
For a token pair indexed by the i-th row and the j-th column, we denote it as (w_i, w_j) and denote its label as l. If l ∈ {"MMH", "MMT", "MSH", "MST", "SMH", "SMT"}, it means (w_i, w_j) is correlated with a (subject, object) pair. In such a case, the first character of the label indicates whether the subject is a multi-token entity ("M") or a single-token entity ("S"); the second character indicates whether the object is a multi-token entity ("M") or a single-token entity ("S"); and the third character indicates whether w_i and w_j are the head tokens of the subject and object ("H") or their tail tokens ("T"). For example, l = "MMH" means w_i is the head token of a multi-token subject and w_j is the head token of a multi-token object. As for the other cases, l = "SS" means (w_i, w_j) itself is an entity pair (both entities are single tokens); l = "N/A" means w_i and w_j fit none of the above cases. Figure 1 demonstrates part of the filled results for the live_in relation given the sentence "Edward Thomas and John are from New York City, USA.", where the (subject, object) pairs are (Edward Thomas, New York City), (Edward Thomas, New York), (Edward Thomas, USA), (John, New York City), (John, New York) and (John, USA).
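To make the labeling scheme concrete, the following minimal Python sketch (our own illustration with a hypothetical helper, not the released preprocessing code) fills the live_in table for a subset of the entity pairs from the Figure 1 example, assuming gold entity spans are given as (start, end) token indices.

```python
# Illustrative sketch of the filling strategy for one relation (e.g., live_in).
# Hypothetical helper, not the authors' code: gold (subject, object) spans are
# assumed to be given as (start, end) token indices.

def fill_table(n, pairs):
    """pairs: list of ((s_start, s_end), (o_start, o_end)) for one relation."""
    table = [["N/A"] * n for _ in range(n)]
    for (ss, se), (os_, oe) in pairs:
        subj_tag = "S" if ss == se else "M"   # single- vs multi-token subject
        obj_tag = "S" if os_ == oe else "M"   # single- vs multi-token object
        if subj_tag == "S" and obj_tag == "S":
            table[ss][os_] = "SS"             # both entities are single tokens
        else:
            table[ss][os_] = subj_tag + obj_tag + "H"   # head tokens of both
            table[se][oe] = subj_tag + obj_tag + "T"    # tail tokens of both
    return table

# "Edward Thomas and John are from New York City , USA ."
tokens = ["Edward", "Thomas", "and", "John", "are", "from",
          "New", "York", "City", ",", "USA", "."]
# Two of the live_in pairs from Figure 1: (Edward Thomas, New York City), (John, USA)
live_in = [((0, 1), (6, 8)), ((3, 3), (10, 10))]
t = fill_table(len(tokens), live_in)
print(t[0][6], t[1][8], t[3][10])   # MMH MMT SS
```

As the output shows, a single-token pair such as (John, USA) collapses to one "SS" cell, while a multi-token pair occupies one "…H" cell and one "…T" cell.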
A main merit of our filling strategy is that each of its labels reveals not only the position (head or tail) of a token within a subject or an object, but also whether the subject (or object) is a single-token or a multi-token entity. Thus the total number of items to be filled under our filling strategy is generally small, since the amount of information carried by each label increases. For example, given a sentence S = w_1 w_2 ... w_n and a relation set R, the number of items to be filled under our filling strategy is n^2|R|, while this number is (2|R| + 1)(n^2 + n)/2 under the filling strategy used in TPLinker (this number is copied directly from the original TPLinker paper). One can easily deduce that (2|R| + 1)(n^2 + n)/2 > n^2|R|.
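As a quick numeric illustration of this comparison (our own arithmetic with arbitrary example values, not figures reported in the paper), consider a 12-token sentence and 24 relations:

```python
n, R = 12, 24                                 # example sentence length and relation count
ours = n * n * R                              # n^2 * |R| items under our strategy
tplinker = (2 * R + 1) * (n * n + n) // 2     # (2|R| + 1)(n^2 + n)/2 under TPLinker's strategy
print(ours, tplinker)                         # 3456 3822, i.e. ours < tplinker
```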

Model Details
The architecture of our model is shown in Figure 2. It consists of four main modules: an Encoder module, a Table Feature Generation (TFG) module, a Global Feature Mining (GFM) module, and a Triple Generation (TG) module. TFG and GFM are performed multiple times in an iterative way so that the table features are refined step by step. Finally, TG fills each table based on its corresponding refined table feature and generates all triples based on these filled tables.
Encoder Module  Here a pre-trained BERT-Base (Cased) model (Devlin et al., 2018) is used as the Encoder. Given a sentence, this module first encodes it into a token representation sequence (denoted as H ∈ R^{n×d_h}).
Then H is fed into two separate Feed-Forward Networks (FFNs) to generate the initial subject feature and object feature (denoted as H_s^(1) and H_o^(1) respectively).
TFG Module  Here the table feature for relation r at the t-th iteration is denoted as TF_r^(t), and it has the same size as table_r. Each item in TF_r^(t) represents the label feature of a token pair. Specifically, for a pair (w_i, w_j), we denote its label feature as TF_r^(t)(i, j), which is generated from H_{s,i}^(t) and H_{o,j}^(t), the subject and object feature representations of tokens w_i and w_j at the t-th iteration respectively.
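Since the exact composition function of the label feature is not reproduced above, the following PyTorch sketch only illustrates the general shape of a TFG-style computation; the concatenate-and-project choice and all layer sizes are our assumptions, not necessarily what GRTE uses.

```python
import torch
import torch.nn as nn

class TableFeatureGeneration(nn.Module):
    """Sketch of a TFG-style module: for every relation it builds an n x n grid
    of label features from subject features Hs and object features Ho. The exact
    composition used in GRTE is not reproduced here; we assume a per-relation
    projection over the concatenation of the two token features."""

    def __init__(self, hidden, num_relations, label_dim):
        super().__init__()
        self.proj = nn.Linear(2 * hidden, num_relations * label_dim)
        self.num_relations = num_relations
        self.label_dim = label_dim

    def forward(self, Hs, Ho):                  # Hs, Ho: (batch, n, hidden)
        b, n, h = Hs.shape
        pair = torch.cat(                       # (batch, n, n, 2*hidden)
            [Hs.unsqueeze(2).expand(b, n, n, h),
             Ho.unsqueeze(1).expand(b, n, n, h)], dim=-1)
        tf = self.proj(pair)                    # (batch, n, n, |R|*label_dim)
        return tf.view(b, n, n, self.num_relations, self.label_dim)
```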
GFM Module  This module mines the expected two kinds of global features, based on which new subject and object features are generated. These two newly generated features are then fed back to TFG for the next iteration. Specifically, this module consists of the following three steps.
Step 1, combining table features. Supposing the current iteration is t, we first concatenate the table features of all relations together to generate a unified table feature (denoted as TF^(t)). This unified table feature contains the information of both token pairs and relations. Then we apply a max pooling operation and an FFN to TF^(t) to generate a subject-related table feature (TF_s^(t)) and an object-related table feature (TF_o^(t)). Here the max pooling is used to highlight, from a global perspective, the important features that are helpful for subject and object extraction respectively.
Step 2, mining the expected two kinds of global features. Here we mainly use a Transformer-based model (Vaswani et al., 2017) to mine the global associations of relations and of token pairs. First, we apply a Multi-Head Self-Attention method to TF_{s/o}^(t) to mine the global associations of relations. The self-attention mechanism can reveal the importance of an item from the perspective of other items, so it is well suited to mining the expected relation associations.
Then we mine the global associations of token pairs with a Multi-Head Attention method. The sentence representation H is also taken as part of the input here. We think H may contain some global semantic information of each token, since the input sentence is encoded as a whole; thus it is helpful for mining the global associations of token pairs from a whole-sentence perspective.
Next, we generate new subject and object features with an FFN.
In summary, the whole global association mining process can be written as Eq. (4).
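Eq. (4) is not reproduced here, so the following PyTorch sketch only mirrors the flow just described: pooling plus FFN, self-attention for relation associations, cross-attention over the sentence representation H for token-pair associations, and a final FFN. The module interface, layer sizes, and the exact pooling axis are our assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class GlobalFeatureMining(nn.Module):
    """Sketch of the GFM flow described above (dimensions and details assumed).
    TF: unified table feature of shape (batch, n, n, |R|*d).
    H:  sentence representation, assumed already of dimension `hidden`.
    Returns new subject/object features of shape (batch, n, hidden);
    `hidden` must be divisible by `heads`."""

    def __init__(self, table_dim, hidden, heads=8):
        super().__init__()
        self.reduce = nn.Linear(table_dim, hidden)       # Step 1: FFN after pooling
        self.self_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))

    def forward(self, TF, H, axis):
        # Step 1: max-pool the table feature along the object (axis=2) or the
        # subject (axis=1) dimension to get a subject-/object-related feature.
        pooled, _ = TF.max(dim=axis)                     # (batch, n, |R|*d)
        x = self.reduce(pooled)                          # (batch, n, hidden)
        # Step 2a: self-attention to mine global associations of relations.
        x, _ = self.self_attn(x, x, x)
        # Step 2b: cross-attention over H to mine associations of token pairs.
        x, _ = self.cross_attn(x, H, H)
        # Step 3 in the text adds a residual connection on top of this output.
        return self.ffn(x)
```

In a full model, such a sketch would be called twice per iteration, once for the subject-related feature and once for the object-related feature.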
Step 3, further tuning the subject and object features generated in the previous step.
One can notice that if we flatten the iterative TFG and GFM modules, our model becomes a very deep network and may therefore suffer from the vanishing gradient issue. To avoid this, we use a residual connection to generate the final subject and object features, as written in Eq. (5).
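Eq. (5) itself is not shown above; one residual formulation consistent with this description (our notation and reconstruction, not necessarily the paper's exact equation) is:

```latex
H_s^{(t+1)} = H_s^{(t)} + \widetilde{H}_s^{(t+1)}, \qquad
H_o^{(t+1)} = H_o^{(t)} + \widetilde{H}_o^{(t+1)},
```

where \widetilde{H}_s^{(t+1)} and \widetilde{H}_o^{(t+1)} denote the subject and object features produced by GFM at the current iteration.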
Finally, these subject and object features are fed back to the TFG module for the next iteration. Note that the parameters of TFG and GFM are shared across different iterations.
TG Module  Taking the table features at the last iteration (TF^(N)) as input, this module outputs all the triples. Specifically, for each relation, its table is first filled with the method shown in Eq. (6).
where table_r(i, j) ∈ R^{|L|}, and table_r(i, j) gives the labeling result for the token pair (w_i, w_j) in the table of relation r.
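Eq. (6) is not reproduced above; a standard formulation that matches the surrounding description (a softmax over the |L| labels for each cell, computed from the final table feature; the single linear layer is our assumption) would be:

```latex
\mathrm{table}_r(i, j) = \mathrm{softmax}\!\left( W \, TF_r^{(N)}(i, j) + b \right) \in \mathbb{R}^{|L|},
```

with the predicted label for (w_i, w_j) taken as the arg max over the |L| entries.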
Then, TG decodes the filled tables and deduces all triples with Algorithm 1. The main idea of the algorithm is to generate an entity pair set for each relation according to its filled table. Each entity pair in this set corresponds to a minimal continuous token span in the filled table, and each entity pair then forms a triple with the relation that corresponds to the considered table. Specifically, in our decoding algorithm, we design three parallel search routes to extract the entity pairs of each relation. The first one (forward search, red arrows in Figure 1) generates entity pairs in the order from head tokens to tail tokens. The second one (reverse search, green arrows in Figure 1) generates entity pairs in the order from tail tokens to head tokens, which is designed mainly to handle nested entities. And the third one (blue arrows in Figure 1) generates single-token entity pairs.

Algorithm 1: Table Decoding Strategy
Input: the relation set R, the sentence S = {w_1, w_2, ..., w_n}, and table_r ∈ R^{n×n} for each relation r ∈ R.
Output: the predicted triple set RT.
1: Define two temporary triple sets H and T, and initialize H, T, RT ← ∅, ∅, ∅.
2: for each r ∈ R do
3:   Define three temporary sets WP_r^H, WP_r^T, and WP_r^S, which consist of the token pairs whose labels in table_r end with "H", "T", and "S" respectively.
4:   for each (w_i, w_j) ∈ WP_r^H do   // forward search
5:     1) Find a token pair (w_k, w_m) from WP_r^T that satisfies: i ≤ k, j ≤ m; table_r[(w_i, w_j)] and table_r[(w_k, w_m)] match; (w_i, w_j) and (w_k, w_m) are closest in the table; and the numbers of tokens in the subject w_{i...k} and the object w_{j...m} are consistent with the corresponding labels.
6:     2) Add (w_{i...k}, r, w_{j...m}) to H.
7:   end for
8:   for each (w_k, w_m) ∈ WP_r^T do   // reverse search
9:     1) Find a token pair (w_i, w_j) from WP_r^H with a process similar to the forward search.
10:    2) Add (w_{i...k}, r, w_{j...m}) to T.
11:   end for
12:   for each (w_i, w_j) ∈ WP_r^S do
13:     Add (w_i, r, w_j) to RT.
14:   end for
15: end for
16: RT ← RT ∪ H ∪ T
17: return RT
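To complement Algorithm 1, here is a simplified Python sketch of the decoding idea (our own illustration: it implements only the forward and single-token routes, and omits the reverse search and the span-length consistency check of line 5).

```python
def decode_table(tokens, table):
    """Simplified decoding of one relation table (illustration only)."""
    n = len(tokens)
    heads, tails, singles = [], [], []
    for i in range(n):
        for j in range(n):
            lab = table[i][j]
            if lab == "SS":
                singles.append((i, j))
            elif lab.endswith("H"):
                heads.append((i, j, lab))
            elif lab.endswith("T"):
                tails.append((i, j, lab))

    # single-token route: each "SS" cell is already a complete entity pair
    pairs = [((i, i), (j, j)) for i, j in singles]

    # forward search route: pair each "..H" cell with the closest matching "..T" cell
    for i, j, lab in heads:
        cands = [(k, m) for k, m, lab2 in tails
                 if k >= i and m >= j and lab2[:2] == lab[:2]]
        if cands:
            k, m = min(cands, key=lambda p: (p[0] - i) + (p[1] - j))
            pairs.append(((i, k), (j, m)))
    return pairs
```

On the Figure 1 example, this forward-only sketch would stop at (Edward Thomas, New York); as discussed next, the reverse search in Algorithm 1 is what recovers (Edward Thomas, New York City) in the presence of nested entities.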
Here we take the sentence shown in Figure 1 as a concrete example to further explain our decoding algorithm. In the demonstrated table, the token pair (Edward, New) has an "MMH" label, so the algorithm has to search forward, concatenating adjacent token pairs until a token pair with an "MMT" label is found, so as to form the complete (subject, object) pair. This forward search stops when it meets the token pair (Thomas, York), which has the label "MMT". However, the resulting entity pair (Edward Thomas, New York) is a wrong entity pair in the demonstrated example, since the expected pair is (Edward Thomas, New York City). Such errors are caused by nested entities in the input sentence, like "New York" and "New York City": these nested entities make the forward search stop too early. In such cases, the designed reverse search plays an important supplementary role. In the discussed example, the reverse search first finds the token pair (Thomas, City), which has an "MMT" label, and then searches for a token pair with an "MMH" label. Thus it precisely finds the expected entity pair (Edward Thomas, New York City). Of course, if there are few nested entities in a dataset, the reverse search can be removed, which would reduce the running time. Here we keep it so that our model has better generalization ability and can be used on diverse datasets.

Loss Function
We define the model loss as follows.
where y_{r,(i,j)} ∈ [1, |L|] is the index of the ground-truth label of (w_i, w_j) for the relation r.
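The loss equation itself is not reproduced above; with the definitions at hand, a standard cross-entropy formulation over all relations and token pairs (our reconstruction, which may differ from the paper's exact normalization) is:

```latex
\mathcal{L} = -\frac{1}{n^{2}\,|R|} \sum_{r \in R} \sum_{i=1}^{n} \sum_{j=1}^{n}
\log P\!\left(\mathrm{table}_r(i,j) = y_{r,(i,j)}\right).
```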

Experimental Settings
Datasets  We evaluate our model on three benchmark datasets: NYT29 (Takanobu et al., 2019), NYT24 (Zeng et al., 2018) and WebNLG (Gardent et al., 2017). Both NYT24 and WebNLG have two different versions according to the following two annotation standards: 1) annotating only the last token of each entity, and 2) annotating the whole entity span. Different works choose different versions of these datasets. To evaluate our model comprehensively, we use both kinds. For convenience, we denote the datasets based on the first annotation standard as NYT24* and WebNLG*, and the datasets based on the second annotation standard as NYT24 and WebNLG. Some statistics of these datasets are shown in Table 1.
Evaluation Metrics  The standard micro precision, recall, and F1 score are used to evaluate the results.
Note that there are two match standards for the RTE task: one is Partial Match, where an extracted triple is regarded as correct if the predicted relation and the head tokens of both the subject entity and the object entity are correct; the other is Exact Match, where a triple is considered correct only when its entities and relation completely match those of a correct triple. To compare fairly with existing models, we follow previous work (Wei et al., 2020; Sun et al., 2021) and use Partial Match on NYT24* and WebNLG*, and Exact Match on NYT24, NYT29, and WebNLG.
In fact, since only one token of each entity in NYT24* and WebNLG* is annotated, the results of Partial Match and Exact Match on these two datasets are actually the same.
Baselines  We compare our model with the following strong state-of-the-art RTE models: CopyRE and several more recent models such as CasRel, TPLinker, SPN, and PMEI (see Table 2 for the full list). Most of the experimental results of these baselines are copied directly from their original papers. Some baselines did not report their results on some of the used datasets; in such cases, we report the best results we obtained with the provided source code (if the source code is available). For simplicity, we denote our model as GRTE, the abbreviation of Global feature oriented RTE model.
Implementation Details  Adam (Kingma and Ba, 2015) is used to optimize GRTE. The learning rate, epoch number, and batch size are set to 3×10^-5, 50, and 6 respectively. The iteration numbers (the hyperparameter N) on NYT29, NYT24*, NYT24, WebNLG*, and WebNLG are set to 3, 2, 3, 2, and 4 respectively. Following previous work (Wei et al., 2020; Sun et al., 2021), we also implement a BiLSTM-encoder version of GRTE in which 300-dimensional GloVe embeddings (Pennington et al., 2014) and a 2-layer stacked BiLSTM are used. In this version, the hidden dimensions of the two layers are set to 300 and 600 respectively. All the hyperparameters reported in this work are determined based on the results on the development sets. Other parameters are randomly initialized. Following CasRel and TPLinker, the max length of input sentences is set to 100.
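For quick reference, the hyperparameters listed above can be collected as follows (the dictionary layout and key names are ours, not taken from the released code):

```python
# Hyperparameters reported in the paragraph above; key names are our own.
GRTE_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 3e-5,
    "epochs": 50,
    "batch_size": 6,
    "max_sentence_length": 100,
    "iterations_N": {"NYT29": 3, "NYT24*": 2, "NYT24": 3, "WebNLG*": 2, "WebNLG": 4},
    "bilstm_variant": {"embeddings": "GloVe-300d", "stacked_layers": 2,
                       "hidden_dims": [300, 600]},
}
```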

Main Experimental Results
The main results are in the top two parts of Table 2, which show that GRTE is very effective. On all datasets, it achieves almost all of the best results in terms of F1 compared with the models that use the same kind of encoder (either the BiLSTM based encoder or the BERT based encoder). The only exception is on NYT24*, where the F1 score of GRTE_LSTM is about 1% lower than that of PMEI_LSTM. However, on the same dataset, the F1 score of GRTE_BERT is about 2.9% higher than that of PMEI_BERT.
The results also show that GRTE achieves much better results on NYT29, NYT24, and WebNLG: its F1 scores improve by about 1.9%, 1.1%, and 3.3% over the previous best models on these three datasets respectively. In contrast, its F1 scores improve by about 0.5% and 0.5% over the previous best models on NYT24* and WebNLG* respectively. This is mainly because GRTE cannot realize its full potential on NYT24* and WebNLG*, where only one token of each entity is annotated. For example, under this annotation standard, all the defined labels in GRTE except "N/A" and "SS" are redundant. But it should be noted that the annotation standard of NYT24* and WebNLG* simplifies the RTE task, and there would not be such a standard when a model is really deployed. Thus, the annotation standard of NYT29, NYT24, and WebNLG can better reveal the true performance of a model. Accordingly, GRTE's better performance on them is more meaningful.
We can further see that, compared with the previous best models, GRTE achieves a larger performance improvement on WebNLG than on the other datasets. For example, GRTE_LSTM even outperforms all the other compared baselines on WebNLG, including the models that use BERT. We think this is mainly because the number of relations in WebNLG is far larger than those in NYT29 and NYT24 (see Table 1), which means there are more global associations of relations that can be mined. Generally, the more relations and entities a dataset contains, the more global correlations there are among its triples. Accordingly, our model can perform even better on such datasets than other local feature based methods. For example, the number of relations in WebNLG is almost 7 times that of the NYT datasets, and GRTE achieves a much larger performance improvement over the compared baselines on WebNLG than on NYT.

Detailed Results
In this section, we conduct detailed experiments to demonstrate the effectiveness of our model from the following two aspects.
First, we conduct some ablation experiments to evaluate the contributions of the main components in GRTE. To this end, we implement the following model variants.
(i) GRTE_w/o-GFM, a variant that removes the GFM module completely from GRTE, which is used to evaluate the contribution of GFM. Like previous table filling based methods, GRTE_w/o-GFM extracts triples based only on local features.
(ii) GRTE_GRU-GFM, a variant that uses a GRU (taking H and TF_{s/o}^(t) as input) instead of the Transformer to generate the results in Eq. (4), which is used to evaluate the contribution of the Transformer.
(iii) GRTE_w/o-m-h, a variant that replaces the multi-head attention method in GFM with a single-head attention method, which is used to evaluate the contribution of the multi-head attention.
(iv) GRTE_w/o-shared, a variant that uses different parameters for the TFG and GFM modules at different iterations, which is used to evaluate the contribution of the parameter sharing mechanism.
All these variants use the BERT-based encoder. Their results are shown in the bottom part of Table 2, from which we can make the following observations.
(1) The performance of GRTE_w/o-GFM drops greatly compared with GRTE, which confirms the importance of using the two kinds of global features for table filling. We can further notice that on NYT29, NYT24, and WebNLG, the F1 scores of GRTE_w/o-GFM increase by 0.4%, 0.4%, and 0.8% respectively over TPLinker. Both TPLinker and GRTE_w/o-GFM extract triples based on local features, and the main difference between them is the table filling strategy, so these results confirm the effectiveness of our table filling strategy. The F1 scores of GRTE_w/o-GFM on NYT24* and WebNLG* are slightly lower than those of TPLinker; as explained above, this is because only one token of each entity is annotated in NYT24* and WebNLG*, so GRTE_w/o-GFM cannot realize its full potential.

Table 3: F1 scores on sentences with different overlapping patterns and different triple numbers. Results of CasRel are copied from TPLinker directly. "T" is the number of triples contained in a sentence. * means the results are produced by us with the provided source code.

(2) The performance of GRTE_GRU-GFM drops compared with GRTE, which indicates that the Transformer is more suitable than a GRU for the global feature mining. But even so, we can see that on all datasets, GRTE_GRU-GFM outperforms almost all previous best models and GRTE_w/o-GFM in terms of F1, which further indicates the effectiveness of using global features.
(3) The results of GRTE_w/o-m-h are lower than those of GRTE, which shows that the multi-head attention mechanism plays an important role in global feature mining. In fact, different features have different importance; the multi-head attention mechanism performs the feature mining process from multiple aspects, which is very helpful for highlighting the more important features.
(4) The results of GRTE_w/o-shared are slightly lower than those of GRTE, which shows that the parameter sharing mechanism is effective. In fact, using distinct parameters usually works well only when the training samples are sufficient. This condition is not well satisfied in RTE, since the training samples of a dataset are not sufficient to train so many parameters.
Second, we evaluate the influence of the iteration number N. The results are shown in Figure 3, from which the following observations can be made.
(1) On NYT24* and WebNLG*, the annotation standard is relatively simple, so GRTE achieves its best results with two iterations. But on NYT29, NYT24, and WebNLG, more iterations are usually required: GRTE achieves the best results when N is 3, 3, and 4 respectively on these datasets.
(2) On all datasets, GRTE obtains an obvious performance improvement (even the maximum improvement on some datasets) at N = 2, where GFM begins to play its role, which again indicates that using global features can significantly improve the model performance.
(3) GRTE usually achieves its best results within a small number of iterations on all datasets, including WebNLG and WebNLG*, where there are many relations. In fact, GRTE outperforms all the previous best models even when N = 2. This is an important merit because it indicates that even on datasets with very large numbers of relations, efficiency will not be a burden for GRTE, which matters when GRTE is deployed in real scenarios.

Analyses on Different Sentence Types
Here we evaluate GRTE's ability to extract triples from sentences that contain overlapping triples or multiple triples. For a fair comparison with the previous best models (CasRel, TPLinker, and SPN), we follow their settings, which are: (i) classifying sentences according to the degree of overlapping and the number of triples contained in a sentence, and (ii) conducting experiments on different subsets of NYT24* and WebNLG*.
The results are shown in Table 3. We can see that: (i) GRTE achieves the best results on all three kinds of overlapping sentences on both datasets, and (ii) GRTE achieves the best results on almost all kinds of sentences that contain multiple triples. The only exception is on NYT24*, where the F1 score of GRTE is slightly lower than that of SPN when T is 1. The main reason is that there are fewer associations among token pairs when T is 1, which slightly degrades the performance of GRTE. In fact, GRTE maintains a table for each relation, and the TG module extracts triples for each relation independently. Thus it can naturally handle the above two kinds of complex sentences.

Analyses on Computational Efficiency
Table 4 shows the comparison of computational efficiency between GRTE and some previous best models. To be fair, we follow the settings in TPLinker and analyze the parameter scale and the inference time on NYT* and WebNLG*. All the results are obtained by running the compared models on a TitanXP, and the batch size is set to 6 for all models that can be run in batch mode.
The parameter number of GRTE is slightly larger than that of TPLinker, which is mainly due to the use of a Transformer-based model. But compared with SPN, which also uses the Transformer, GRTE has fewer parameters thanks to its parameter sharing mechanism.
We can also see that GRTE achieves a very competitive inference speed. This is mainly because of the following three reasons. First, GRTE is a one-stage extraction model and can process samples in batch mode (CasRel can only process samples one by one). Second, as analyzed previously, it has an efficient table filling strategy that needs to fill fewer table items. Third, as analyzed previously, GRTE often achieves its best results within a small number of iterations, so the iteration operations do not have much impact on its inference speed.
In fact, as TPLinker pointed out, for all the models that use BERT (or other pre-trained language models) as their basic encoders, BERT is usually the most time-consuming part and takes up most of the model parameters, so the time cost of the other components in a model is not significant.
Besides, there is another important merit of our model: it needs less training time than existing state-of-the-art models such as CasRel, TPLinker, and SPN. As pointed out previously, our model is trained for 50 epochs on all datasets, while all the mentioned models are trained for 100 epochs on the same datasets. From Table 4 we can see that all these models have similar inference speeds. For each model, the training speed per epoch is close to its inference speed (during training there is extra time cost for operations like back propagation), so our model needs less training time since it uses far fewer epochs.

Conclusions
In this study, we propose a novel table filling based RTE model that extracts triples based on two kinds of global features. The main contributions of our work are as follows. First, we make full use of the global associations of relations and of token pairs; experiments show these two kinds of global features are very helpful for performance. Second, our model works well on extracting triples from complex sentences containing overlapping or multiple triples. Third, our model is evaluated on three benchmark datasets, and extensive experiments show that it consistently outperforms all the compared strong baselines and achieves state-of-the-art results. Besides, our model has a competitive inference speed and a moderate parameter size.