Modeling Instance Interactions for Joint Information Extraction with Neural High-Order Conditional Random Field

Prior works on joint Information Extraction (IE) typically model instance interactions (e.g., between event triggers, entities, roles, and relations) by representation enhancement, type dependency scoring, or global decoding. We find that previous models generally consider binary type dependency scoring of a pair of instances, and leverage local search such as beam search to approximate global solutions. To better integrate cross-instance interactions, in this work we introduce a joint IE framework (CRFIE) that formulates joint IE as a high-order Conditional Random Field. Specifically, we design binary factors and ternary factors to directly model interactions between not only a pair of instances but also triplets. These factors are then utilized to jointly predict the labels of all instances. To address the intractability of exact high-order inference, we incorporate a high-order neural decoder that is unfolded from a mean-field variational inference method, which achieves consistent learning and inference. The experimental results show that our approach achieves consistent improvements on three IE tasks compared with our baseline and prior work.


Introduction
Information extraction (IE) has long been considered a fundamental challenge for various downstream natural language understanding tasks, such as knowledge graph construction and reading comprehension. The goal is to identify and extract structured information from unstructured natural language text, such that both users and machines can easily comprehend the entities, relations, and events within the text.
Typically, IE consists of a series of different tasks, such as recognizing entities, resolving coreferences, extracting relations, and detecting events, which are often learned and predicted in isolation, ignoring cross-task dependencies. Such isolated learning and inference schemes lead to severely insufficient knowledge capturing and inefficient model construction. Intuitively, predictions of different IE instances from the same or different tasks can influence each other. For example, a relation between two entities would restrict the types of the entities (e.g., two entities linked by a PART-WHOLE relation are more likely to share entity types of the same nature, as shown in the first example of Figure 1); types of entities can provide information that is useful to predict their relations or limit the roles they play in certain events (e.g., the knowledge of event Life:Die and entity PER can benefit the prediction of the role Victim, as shown in the second example of Figure 1).
To effectively capture instance or task dependencies, joint IE tries to simultaneously predict instances of different IE tasks for an input text with a multitask learning scheme, which has attracted much interest and demonstrated significant improvements over task-specific learning methods. Previous work on joint IE focuses on three directions: 1) representation enrichment by sharing the token encoder between different IE tasks (Luan et al., 2018), updating shared span representations according to local task-specific predictions (Luan et al., 2019a; Wadden et al., 2019), creating dependency graphs between instances (Lin et al., 2020; Zhang and Ji, 2021; Van Nguyen et al., 2021), or leveraging external dependency relations such as abstract meaning representation (AMR) and syntactic structures (Zhang and Ji, 2021; Van Nguyen et al., 2022a); 2) type dependency scoring by forming type pattern constraints (Lin et al., 2020), designing type dependency graphs (Van Nguyen et al., 2021), learning a transition matrix of type pairs (Van Nguyen et al., 2022a), or computing mutual information (MI) scores of each pair of types (Van Nguyen et al., 2022b); 3) global decoding by beam search according to global features or AMR graphs (Lin et al., 2020; Zhang and Ji, 2021), or adopting global optimization algorithms such as simulated annealing (Van Nguyen et al., 2022a). Our interest lies in the second and third directions, and we find two main limitations of prior works. The first is that they only score binary dependencies of instance types (i.e., constraint, transition, or MI scores between a pair of types). The second is that their decoders are based on discrete local search strategies to approximate global optima, and they often employ different approximate strategies for inference and training.
To alleviate the aforementioned limitations, we propose a novel joint IE framework, Information Extraction as a high-order CRF (CRFIE), that explicitly models label correlations between different instances from the same or different tasks, and utilizes them to calculate a joint distribution for final instance label predictions. Specifically, we demonstrate the effectiveness of our proposed high-order framework on three widely explored IE tasks: entity recognition (EntR), relation extraction (RelE), and event extraction (EventE). We formulate the three tasks as a unified graph prediction problem, further modeled as a high-order Conditional Random Field (CRF) (Ghamrawi and McCallum, 2005), where the variables comprise node variables and edge variables representing trigger/entity instances and role/relation instances respectively. The term "high-order" refers to factors connecting two or more correlated variables. Beyond the unary (first-order) factor, we design not only the binary (second-order) factor to model interactions between a pair of edge variables but also the ternary (third-order) factor to model interactions between node-edge-node variables. Since the correlated instances may come from the same or different tasks, we categorize our high-order factors into two types: homogeneous factors (homo) representing correlations between instances of the same task, and heterogeneous factors (hete) representing correlations between instances of different tasks. Taking EntR and EventE as an example, we calculate binary factor potentials of role-role pairs (homo), and ternary factor potentials of trigger-role-entity triplets (hete). We leverage these scores to predict the labels of all instances jointly. Since exact high-order inference is analytically intractable, we incorporate a neural decoder that is unfolded from the approximate Mean-Field Variational Inference (MFVI) (Xing et al., 2012) method, which achieves end-to-end training and also consistent inference and learning
processes. Note that MFVI can be seen as a continuous relaxation of CRF inference (Lê-Huu and Alahari, 2021), which can often be more effective than the discrete optimization used in previous work. Experiments on joint IE tasks show that CRFIE achieves competitive or better performance compared with previous state-of-the-art models.

Overview of Joint IE as Graph Prediction
We investigate three widely explored IE tasks. EntR aims to identify spans in a sentence as entities and label their entity types. RelE aims to identify relations between entity pairs and label their relation types. EventE aims to identify trigger words and label their event types, and to identify entities as event arguments and label their argument roles.
We formulate the three IE tasks as a graph G = (V, E) prediction task, where V denotes the node set and E denotes the directed edge set. Each node v = (a, b, l) ∈ V is a span for a trigger or an entity, where a and b index the start and end words of the span, and l ∈ L_event or l ∈ L_entity denotes the node's event type or entity type, respectively. Each edge e_ij = (i, j, r) ∈ E represents the relationship from node v_i to node v_j, and r ∈ R_role or r ∈ R_relation represents the edge label, which is a role type when the edge is from a trigger to an entity (as an argument) or a relation type when the edge is from one entity to another.
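To make the graph formulation concrete, the following minimal sketch (with illustrative names, not taken from the paper's implementation) encodes nodes as labeled spans and edges as labeled ordered node pairs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    start: int   # index of the first word of the span (a)
    end: int     # index of the last word of the span (b)
    label: str   # event type or entity type (l)

@dataclass(frozen=True)
class Edge:
    head: int    # index i of the head node v_i
    tail: int    # index j of the tail node v_j
    label: str   # role type (trigger -> entity) or relation type (entity -> entity)

# Example: a Life:Die event trigger with a PER entity playing the Victim role
nodes = [Node(2, 2, "Life:Die"), Node(5, 6, "PER")]
edges = [Edge(0, 1, "Victim")]
```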
Figure 2(A) depicts the overall architecture of CRFIE. Joint identification and classification would need to enumerate all possible spans as nodes, and high-order inference, whose complexity depends on the number of nodes, becomes too computationally expensive in that situation. We therefore follow previous work (Lin et al., 2020; Zhang and Ji, 2021; Van Nguyen et al., 2021, 2022a) and adopt the following pipeline: first extract graph nodes with a node identification module, and then predict labels of nodes and edges with a node/edge labeling module.
The node identification module aims to identify spans in the input sentence as graph nodes. This module is not the focus of our work, so we simply follow previous work (Lample et al., 2016a; Lin et al., 2020; Zhang and Ji, 2021; Van Nguyen et al., 2021) and formulate node identification as a sequence labeling task with a BIO scheme. Specifically, after obtaining word features by averaging all sub-word embeddings extracted from a pre-trained transformer-based encoder, such as BERT (Devlin et al., 2018), we use two vanilla linear-chain conditional random fields (CRFs) (Lafferty et al., 2001) as decoders to acquire trigger nodes and entity nodes separately. We follow the conventional joint IE settings without considering nested spans. More advanced methods such as Yu et al. (2020) or Lou et al. (2022) can be adopted to identify graph nodes if span nesting needs to be considered. More details about the identification module can be found in Appendix A. The identification module is fixed during subsequent training of the node/edge labeling module.
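As a small illustration of the BIO scheme used by the identification module (a generic sketch, independent of the paper's CRF decoder), predicted tag sequences can be converted into spans as follows:

```python
def bio_to_spans(tags):
    """Convert BIO tags to (start, end, type) spans with inclusive ends."""
    spans, start, typ = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel to flush the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:           # close the currently open span
                spans.append((start, i - 1, typ))
                start, typ = None, None
            if tag.startswith("B-"):        # open a new span
                start, typ = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, typ = i, tag[2:]         # tolerate an I- without a preceding B-
    return spans

assert bio_to_spans(["B-PER", "I-PER", "O", "B-ORG"]) == [(0, 1, "PER"), (3, 3, "ORG")]
```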
The node/edge labeling module is designed to predict (i) an event type for each trigger node and an entity type for each entity node, and (ii) a role type for each edge between a trigger-entity pair and a relation type for each edge between an entity-entity pair. We use a special NULL label to represent the non-existence of an edge. We formulate the node/edge labeling module as a high-order CRF, illustrated as a factor graph in Figure 2(B). There are three kinds of factors: unary factors that reflect the likelihood of each variable's label; binary factors for pairs of edges sharing an endpoint, which model correlations between edge variables; and ternary factors for an edge, its head node, and its tail node, which model correlations between related node and edge variables. The joint probability over all the variables is proportional to the exponentiated sum of all the score function values of such factors. Due to the intractability of exact high-order inference, we use MFVI to approximate it. A multitask learning scheme is adopted to train our node/edge labeling module. We describe the scoring functions, high-order inference, and learning method in detail in the following subsections.

Unary Scoring
We first obtain each node's representation z by averaging the representations of all the words within the span, where the words' representations are obtained in the same way as in the identification module, but from another pre-trained transformer-based encoder. Then, the unary scores of the i-th node's labels s^{u-ntask}_i ∈ R^{|L_ntask|} can be obtained by feeding z_i into a two-layer task-specific feed-forward neural network (FNN):

s^{u-ntask}_i = FNN^{u-ntask}(z_i)

where L_ntask represents a task-specific node label set, and ntask ∈ {event, entity}.
The unary scores s^{u-etask}_ij of an edge e_ij from v_i to v_j can be computed with a decomposed biaffine function:

s^{u-etask}_ij = (H^{u-etask})^T (FNN^{u-s}(z_i) ∘ FNN^{u-e}(z_j))

where the two task-specific FNNs are single-layer, H^{u-etask} ∈ R^{d_etask×|R_etask|} is a parameter matrix, R_etask represents a task-specific edge label set that includes an additional NULL label, etask ∈ {relation, role}, and ∘ denotes the element-wise product.
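A decomposed biaffine edge scorer of this shape can be sketched as follows (the dimensions, random initialization, and tanh activation are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_etask, num_labels = 8, 6, 5  # hidden size, biaffine size, |R_etask|

# single-layer task-specific FNNs for head (-s) and tail (-e) representations
W_s = rng.normal(size=(d_etask, d))
W_e = rng.normal(size=(d_etask, d))
H = rng.normal(size=(d_etask, num_labels))  # H^{u-etask}

def edge_unary_scores(z_i, z_j):
    """Decomposed biaffine: project both endpoints, combine them
    element-wise, then map to one score per edge label."""
    g_s = np.tanh(W_s @ z_i)   # head representation
    g_e = np.tanh(W_e @ z_j)   # tail representation
    return H.T @ (g_s * g_e)   # shape: (num_labels,)

z_i, z_j = rng.normal(size=d), rng.normal(size=d)
scores = edge_unary_scores(z_i, z_j)
assert scores.shape == (num_labels,)
```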

Binary Scoring
We calculate binary correlation scores for each legal edge pair that shares one endpoint. As illustrated in Figure 2(A), there are three types of binary factors (Wang et al., 2019b): edge e_ij and edge e_ik share the head node v_i, producing a sibling (sib) pair; edge e_jk and edge e_ik share the tail node v_k, producing a coparent (cop) pair; and the tail node v_j of edge e_ij is the head node of edge e_jk, producing a grandparent (gp) pair. For each specific type of binary factor, we use different single-layer FNNs that take z as input to calculate a head representation (-s) and a tail representation (-e) for each node. For the gp factor, we additionally calculate a middle representation (-mid) for each node.
For a sib pair {e_ij, e_ik}, a cop pair {e_ik, e_jk}, and a gp pair {e_ij, e_jk}, suppose that the first edge has label r_m ∈ R_1 and the second edge has label r_n ∈ R_2. We formulate the binary scores as functions of the node representations and the label embeddings, where h^1_m is the embedding of the first edge label r_m and h^2_n is the embedding of the second edge label r_n; all g and h are d_3-dimensional. In this paper, we consider two types of homogeneous binary factors: homo case (i) sib and cop representing two argument roles (R_1 = R_2 = R_role), and homo case (ii) sib, cop and gp representing two relations (R_1 = R_2 = R_relation). We also consider one type of heterogeneous binary factors: hete case (i) cop and gp where one edge label is a relation and the other is a role for joint EventE and RelE (one of R_1, R_2 is R_relation and the other is R_role).
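The binary score combines the shared node's representation with the two edge-label embeddings; one plausible parameterization (the trilinear tensor W and the exact combination are assumptions for illustration, not the paper's stated equation) is a trilinear contraction:

```python
import numpy as np

rng = np.random.default_rng(1)
d3, R1, R2 = 4, 3, 3
W = rng.normal(size=(d3, d3, d3))   # assumed trilinear scoring tensor
g_head = rng.normal(size=d3)        # g^{sib-s}_i: shared head of the edge pair
h1 = rng.normal(size=(R1, d3))      # embeddings h^1_m of first-edge labels
h2 = rng.normal(size=(R2, d3))      # embeddings h^2_n of second-edge labels

# s_sib[m, n]: compatibility score of labels (r_m, r_n) on a sibling edge pair
s_sib = np.einsum('abc,a,mb,nc->mn', W, g_head, h1, h2)
assert s_sib.shape == (R1, R2)
```

Computing the full label-pair score table at once like this is what allows the inference step to pass messages between edge variables efficiently on GPU.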

Ternary Scoring
We calculate ternary correlation scores of an edge and its two endpoints. Similar to binary scoring, we use two new FNNs to produce representations for each possible head node and tail node respectively. For an edge with label r_m ∈ R, its head node v_i having label l_p ∈ L_s and its tail node v_j having label l_q ∈ L_e, the ternary score (Eq. 2) is calculated as a trilinear function of the corresponding node representations and label embeddings, where h^ter_m is the embedding of label r_m, e^{ter-s}_p is the embedding of label l_p and e^{ter-e}_q is the embedding of label l_q; g, e and h are all d_4-dimensional. We consider two types of heterogeneous ternary factors: hete case (ii) the ternary correlation between an event trigger, an entity, and a role for joint EventE and EntR (L_s = L_event, R = R_role and L_e = L_entity), and hete case (iii) two entities and their relation for joint RelE and EntR (L_s = L_e = L_entity and R = R_relation).

High-Order Inference
In contrast to first-order inference, which independently predicts the value of each variable by maximizing its unary score, in high-order inference we jointly predict the values of all the variables to maximize the sum of their unary and high-order scores. However, exact joint inference on our factor graph is NP-hard in general. Therefore, we use Mean-Field Variational Inference (MFVI) (Xing et al., 2012) for approximate inference. MFVI iteratively updates an approximate posterior marginal distribution Q(X) of each variable X based on messages from all the factors connected to it. Messages for edge variables aggregated from binary factors are calculated with weights α_1, α_2, α_3 ∈ [0, 1], hyper-parameters controlling the scale of the messages passed by the different types of binary factors. These hyper-parameters are not part of standard MFVI and can instead be seen as part of the scoring function.
Messages for node variables and edge variables aggregated from ternary factors are calculated analogously. The posterior Q(X) is then updated based on the messages, where all α ∈ [0, 1] are hyper-parameters controlling the scale of the different types of messages, s^{u-etask}_ijm is the m-th element of the unary potential s^{u-etask}_ij, s^{u-ntask}_ip is the p-th element of the unary potential s^{u-ntask}_i, and s^{u-ntask}_jq is the q-th element of s^{u-ntask}_j.
There are two ways of iterative MFVI update. In the synchronous update, we update Q(X) for all the variables at each step. In the asynchronous update, we alternate between node variables and edge variables for the Q(X) update. We empirically find that the asynchronous update is better than the synchronous update in some cases when we use ternary factors.
The initial distribution Q^(0) is set by normalizing the exponentiated unary potentials. After a fixed number T (a hyper-parameter) of iterations, we obtain the posterior distribution Q^(T). For each variable, we pick the label with the highest probability according to Q^(T) as our prediction.
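The synchronous mean-field update can be illustrated on a toy factor graph with two edge variables coupled by a single binary factor (a simplified sketch of the general update, with one message weight α; all names and values are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mfvi(unary, pairwise, T=3, alpha=1.0):
    """Synchronous mean-field updates for two variables coupled by one
    binary factor. unary: (2, R) unary scores; pairwise: (R, R) factor scores."""
    Q = softmax(unary)                    # Q^{(0)} from exponentiated unaries
    for _ in range(T):
        msg0 = alpha * pairwise @ Q[1]    # message from the factor to variable 0
        msg1 = alpha * pairwise.T @ Q[0]  # message from the factor to variable 1
        Q = softmax(np.stack([unary[0] + msg0, unary[1] + msg1]))
    return Q

unary = np.array([[2.0, 0.0],    # variable 0 strongly prefers label 0
                  [0.1, 0.0]])   # variable 1 is nearly undecided
pairwise = np.array([[3.0, 0.0],
                     [0.0, 3.0]])  # factor favoring equal labels
Q = mfvi(unary, pairwise)
# the strong agreement factor pulls variable 1 toward variable 0's label
assert Q[1].argmax() == 0
```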

Multitask Learning
Given a sentence w = (w_1, ..., w_k), to train multiple IE tasks with our unified high-order node-relation prediction framework, we perform multi-task learning with cross-entropy losses:

L = − Σ_i log P(X^{ntask}_i = X̂^{ntask}_i | w) − Σ_{ij} log P(X^{etask}_ij = X̂^{etask}_ij | w)

where X̂^{ntask}_i and X̂^{etask}_ij denote the ground-truth labels of nodes and edges respectively for all the tasks. With first-order inference, the conditional distributions over node labels and edge labels are softmax distributions over the unary scores; with high-order inference, they are the approximate posteriors Q^(T) computed with T MFVI iterations.
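The multi-task objective can be sketched as summed cross-entropy terms over node and edge instances (a toy illustration; in the high-order case the probabilities would come from Q^(T) rather than softmaxed unary scores):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multitask_loss(score_list, gold_list):
    """Sum of per-instance negative log-likelihoods over all node and
    edge instances; each entry of score_list holds one instance's label scores."""
    loss = 0.0
    for scores, gold in zip(score_list, gold_list):
        probs = softmax(scores)
        loss -= np.log(probs[gold])
    return loss

node_scores = [np.array([2.0, 0.0, 0.0])]  # one node, three node labels
edge_scores = [np.array([0.0, 1.5])]       # one edge, NULL vs. one relation
loss = multitask_loss(node_scores + edge_scores, [0, 1])
assert loss > 0
```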
Inspired by Zheng et al. (2015) and Wang et al. (2019b), we unfold the MFVI iteration steps as recurrent neural network layers parameterized by the unary and high-order scores. As such, we obtain an end-to-end recurrent neural network for both inference and training. Doing this has the added benefit of consistent inference and training, unlike traditional CRF approaches that may rely on different approximation methods for inference and training (see, for example, Van Nguyen et al. (2022a)).

Experiments
Datasets We evaluate our model on the ACE2005 corpus (Walker et al., 2005) and the ERE-EN dataset; details and statistics are given in Table 1. Encoders For fair comparison with previous state-of-the-art systems, we use the BERT-large-cased model (Devlin et al., 2018) or RoBERTa model (Liu et al., 2019) as our encoder for the ACE05-E and ACE05-E+ datasets, and the ALBERT model (Lan et al., 2019) as the encoder for the ACE05-R dataset. We train our model with the BertAdam optimizer. When we use a single kind of factor, its α is set to 1 and the others to 0. When multiple kinds of factors are used, the α values of the used factors are tunable parameters. Detailed hyper-parameter values are provided in Appendix B.

Main Results
We take our framework with first-order inference (i.e., independently predicting the value of each variable by maximizing its unary score) as the CRFIE baseline. Our baseline performs better than previous work in some cases, benefiting from the biaffine function used to calculate unary scores. We experiment with different combinations of tasks.
Joint EntR and EventE We compare our approach under different settings and also with previous work that did not leverage gold triggers and entities. Table 2 shows the experimental results; the cases in the table (e.g., homo case (i)) correspond to the settings described in subsections 2.3 and 2.4. The F1 scores of Tri-I are the same across settings because they are produced by the same node identification module, which is fixed to fairly compare our model in different settings. Our high-order model performs better than our baseline in most cases for EventE, which directly shows the benefit of high-order factors. Compared to the previous SOTA, our model is not competitive on Tri-I, because we focus on the interactions of node/edge labeling and did not tune the hyper-parameters of the node identification module, keeping them the same as Lin et al. (2020). Even with an unsatisfactory identification module, the results on Arg-C, the most difficult sub-task in EventE, show that CRFIE achieves consistent improvement. It is worth noting that CRFIE with learned dependencies can achieve performance comparable to models (Zhang and Ji, 2021; Van Nguyen et al., 2022a) that leverage external syntactic or semantic dependencies. It is surprising that when we use both binary factors (homo case (i)) and ternary factors (hete case (ii)) in the RoBERTa setting, the performance slightly drops. The reason may be that messages from different types of factors conflict with each other, making training more difficult. We also experiment in the case where gold triggers and entities are given; results are shown in Appendix C.
Joint EntR and RelE Table 3 shows our experimental results on the ACE05-R dataset. CRFIE performs better than most previous work and our baseline on both EntR and RelE, which demonstrates the advantage of high-order inference. Similar to joint EntR and EventE, our high-order model with the combination of all factors cannot achieve further improvement, so we do not show the result of this setting.
Joint EntR, EventE and RelE Table 4 shows the experimental results on the ACE05-E+ and ERE-EN datasets. On ACE05-E+, we show the result of hete case (i) because this setting is not included in the above experiments. CRFIE_all means that we use all kinds of binary and ternary factors. We find that CRFIE achieves consistent improvement on EventE and RelE. Due to space limitations, more ablations and experimental results can be found in Appendix D.

Analysis
High-Order Scoring We study two variants of our high-order scoring. Share means that we reuse the label representations from unary scoring for high-order scoring instead of using new label representations. W/o node reps means that we calculate high-order scores without taking node representations into account, such that the high-order scores depend only on the labels, regardless of the underlying text spans that constitute the nodes and edges. Table 5 shows the comparison results with ternary factors on the ACE05-R dataset. The performance of both variants drops.
Message Passing of Ternary Factors From the message passing process involving ternary factors in Sec. 2.5, we can see that messages passed to an edge come only from its two endpoints, but a node receives messages from all possible edges connected to it, which causes asymmetry. To study the effect of this asymmetry in messages from ternary factors, we try the synchronous and asynchronous updating strategies described in Sec. 2.5. For asynchronous updating, we first update edge posteriors using node posteriors, for the reason that the initial node posteriors are more accurate.
Table 6 shows the comparison of the two updating strategies on the ACE05-E dataset. The asynchronous update has an advantage over the synchronous update on Arg-C but harms or maintains the performance on Tri-C.

Complexity and Speed of High-order Inference
The computational complexity of our high-order inference is O(n^3 |R|^2 + n|L|) when we consider binary factors and O(n^2 |R| |L|^2) when we consider ternary factors, while our first-order model has a computational complexity of O(n^2 |R| + n|L|), where n is the number of nodes. We measure the empirical training and inference speed on an A100 server (Table 7). Our high-order models are only slightly slower than the baseline despite the difference in computational complexity, because we implement our models with full GPU parallelization.
Visualization of Correlation Scores We take relation extraction as an example to visualize the ternary scores calculated by Eq. 2 between entity-relation-entity triplets. For easier understanding, we show examples of selected entity types and relation types. From Fig. 4, we can find that the correlation scores reflect some prior knowledge. For example, the 'PER-SOC' relation exists between two 'PER' entities, and the 'PART-WHOLE' relation is more likely to exist between entities of the same type.
Error Correction Analysis and Case Study We provide quantitative error correction analysis in Appendix E. Figure 3 shows examples where our high-order approach revises wrong predictions made based on the initial unary scores (i.e., the first-order baseline), along with our analyses of how high-order factors achieve the revision.
Examples from Figure 3 (partial):
#1 Analysis: The sibling factor helps our high-order model find that BZW, which is tied to Barclays Bank, is an argument of the event Personnel:End-Position triggered by the word previously.
#2: The crowd (v1) filled (v2) the street (v3) leading to the Kazimiya mosque in the northeast of Baghdad and carried banners in the green color of Islam, calling for good government.
Analysis: An entity with the PER type is less likely to play an Artifact role. The ternary factor leverages messages passed by node label distributions to refine the edge label, which in turn passes messages back to refine the node labels.
#3: For the most part the marches went off peacefully, but in New York (v1) a small group (v2) of protesters were arrested after they refused to go home (v3) at the end of their rally, police sources said.

Related Work
Recent efforts develop joint methods for multiple IE tasks (Miwa and Sasaki, 2014; Zheng et al., 2017; Nguyen and Nguyen, 2019; Zhang et al., 2019; Wang and Lu, 2020) or general architectures for universal IE (Paolini et al., 2021; Lu et al., 2022; Lou et al., 2023). Graph-based joint IE methods formulate multiple IE tasks as a graph prediction task and aim to capture dependencies between different instances or tasks. Many previous works leverage encoder sharing or graph convolutional networks (GCNs) on instance dependency graphs to enrich instance representations (Wadden et al., 2019; Fu et al., 2019; Van Nguyen et al., 2021, 2022a,b). This work is more relevant to recent works on type interactions and global inference. Lin et al. (2020) leverage global feature patterns with beam search, and Van Nguyen et al. (2022a) adopt Simulated Annealing Search to perform approximate inference. Different from their work, we model both binary and ternary dependencies and leverage MFVI to achieve consistent training and inference.
High-order Methods Previous high-order methods mostly focus on instance interactions in the training process to obtain more expressive representations, such as sharing representations (Sun et al., 2019; Luan et al., 2019b) or using sequence-to-sequence architectures (Ma et al., 2022; Paolini et al., 2021; Lu et al., 2021). There are some high-order inference methods on different NLP tasks that are related to ours. On dependency parsing, Wang and Tu (2020) considered three types of second-order parts of semantic dependencies and approximated decoding with mean-field variational inference or loopy belief propagation. Jia et al. (2022) considered interactions between two arguments of the same predicate for the semantic role labeling task. However, due to the complexity, they only did high-order inference on edge existence prediction while leaving label prediction first-order, and they did not involve heterogeneous factors. In another line of research, Wang and Pan (2020, 2021) integrate logic rules and neural networks to leverage prior knowledge for relation extraction and event extraction tasks, but they cannot achieve end-to-end training and inference.

Conclusion
In this paper, we propose a novel framework that leverages high-order interactions across different instances and different IE tasks in both the training and inference processes. We formulate IE tasks as a unified graph prediction problem, further modeled as a high-order CRF. Our framework consists of an identification module to identify spans as graph nodes and a node/edge labeling module with high-order modeling and inference to jointly label all nodes and edges.

Limitations
One limitation is that we separate the node identification and node/edge labeling processes, because joint node identification and label classification would need to enumerate all possible spans in a sentence, which is too computationally expensive. Most previous works also separate the two processes, but an obvious disadvantage of such a pipeline scheme is the error propagation problem. We leave joint node identification and label classification with high-order inference as future work.
Table 10: Average F1 on the ACE05-E dataset with BERT-large-cased encoders. We can find that without the errors of the identification module, the performance gap between our baseline and high-order models further increases, and using both sibling factors and ternary factors improves performance further.

D Ablation Study
We show the experimental results of different factor combinations in Table 10, Table 11 and Table 12.
In Table 12, role-sib represents sib of role pairs, rel-sib represents sib of relation pairs, and r+r-sib represents sib of both role pairs and relation pairs. The settings hete (+cop), hete (+gp), and hete (+cop+gp) are in hete case (i).

E Error Correction Analysis
We take joint EntR and RelE as an example to show the number of error corrections of our high-order model compared to our baseline model in terms of relation types. From Fig. 5, we can find that our high-order model corrects the errors of our baseline model across relation types (the numbers are expected to be positive on the diagonal and negative otherwise).

F Re-evaluation of PL-Marker
For the relation extraction task, some corpora have symmetric relations, meaning the ordering of the two entities does not matter (e.g., 'PER-SOC' in ACE2005). A symmetric relation is only annotated in one direction in the data. PL-Marker counts a symmetric relation twice in both the prediction count and the gold count, while other work counts it only once.

Figure 2 :
Figure 2: An overview of CRFIE. (A) Model architecture. The identification module provides spans as nodes to the node/edge labeling module. (B) An example factor graph of our node/edge labeling module containing variables representing three nodes and three edges. X_i indicates the label variable of the i-th node v_i and X_ij indicates the label variable of the edge e_ij from v_i to v_j. The node labels can be event types or entity types (i.e., X_i is the abbreviation of X^{ntask}_i for simplicity and ntask ∈ {event, entity}). The edge labels can be relations or argument roles (i.e., X_ij is the abbreviation of X^{etask}_ij and etask ∈ {relation, role}). For simpler illustration, we omit edges of the opposite direction.

Figure 3 :
Figure 3: Examples showing how our high-order approach improves graph prediction using different high-order factors. We only display a partial information graph for clearer illustration.

Table 1 :
Dataset statistics. Following Lu et al. (2021), Lin et al. (2020), and Wadden et al. (2019), we conduct experiments on four English datasets: ACE05-R for EntR and RelE, ACE05-E for EntR and EventE, and ACE05-E+ and ERE-EN for all three tasks, with the same data pre-processing and train/dev/test split. The ACE2005 corpus provides entity, relation, and event annotations, with 7 entity types, 6 relation types, 33 event types, and 22 argument roles. The ERE-EN dataset is extracted by combining three English datasets (LDC2015E29, LDC2015E68, and LDC2015E78) created under the Deep Exploration and Filtering of Text (DEFT) program; it includes 7 entity types, 5 relation types, 38 event types, and 20 argument roles.

Table 3 :
Average F1 on the ACE05-R dataset. The subscript re-eval means re-evaluation (Appendix F) using the standard evaluation method as in other work. *, †, ‡ and ∆ mean T5-large, BERT-large-cased, RoBERTa-large and ALBERT-XXLarge-v1, respectively. PURE_s refers to the PURE model with single-sentence features. The results of PURE_c and PL-Marker are listed for reference because they use cross-sentence features and are not directly comparable with other models. GraphIE is listed for reference for the same reason as in Table 2.

Table 4 :
Average F1 on the ACE05-E+ and ERE-EN datasets. * means T5-large, ‡ means RoBERTa-large; others without a mark use BERT-large-cased. Some results are listed for reference for the same reason as in Table 2. We do not compare with FourIE and GraphIE on the ERE-EN dataset because their train/dev/test splits differ from ours. The results of previous work on ERE-EN are from Zhang and Ji (2021).

Table 5 :
Comparison of the results of different high-order scoring methods on the ACE05-R dataset.

Table 6 :
Comparison of the results of the synchronous and asynchronous updating strategies when using ternary factors on the ACE05-E dataset.

Table 7 :
Comparison of speed (sentences/second) among the baseline and high-order models.