Towards Generalized Open Information Extraction

Open Information Extraction (OpenIE) facilitates the open-domain discovery of textual facts. However, prevailing solutions evaluate OpenIE models on in-domain test sets drawn from the same corpus as the training data, which violates the task's founding principle of domain independence. In this paper, we propose to advance OpenIE towards a more realistic scenario: generalizing to unseen target domains whose data distributions differ from those of the source training domains, termed Generalized OpenIE. To this end, we first introduce GLOBE, a large-scale human-annotated multi-domain OpenIE benchmark, to examine the robustness of recent OpenIE models to domain shift; the observed relative performance degradation of up to 70% highlights the difficulty of generalized OpenIE. We then propose DragonIE, which explores a minimalist graph expression of textual facts, the directed acyclic graph, to improve OpenIE generalization. Extensive experiments demonstrate that DragonIE beats previous methods in both in-domain and out-of-domain settings by up to 6.0% absolute F1 score, but ample room for improvement remains.


Introduction
Open Information Extraction (OpenIE) aims to mine open-domain facts, each indicating a semantic relation between a predicate phrase and its arguments, from plain text (Etzioni et al., 2008), without a fixed relation vocabulary. OpenIE has been demonstrated to benefit various domains and applications, such as knowledge base population (Dong et al., 2014), question answering (Fader et al., 2014), and summarization (Fan et al., 2019). Recently, OpenIE has seen remarkable advances. Regarding the strategies for representing open facts, recent techniques with deep neural models can be subsumed under two categories: sequence-based and graph-based. Sequence-based models
predict facts one by one in an auto-regressive fashion with an iterative labeling or generation framework (Cui et al., 2018; Sun et al., 2018; Kolluru et al., 2020a,b), which is the most classical solution in OpenIE. The graph-based method formulates OpenIE as a maximal clique discovery problem over a span-level text graph (Yu et al., 2021), in which the edge between two spans is defined as the combination of their roles in the corresponding fact. To this end, O(m^2) edges of O(r^2) types are constructed for a fact with m spans of r roles.
Owing to their exquisite designs, both sequence-based and graph-based models can identify complicated facts, constantly refreshing performance on benchmarks. Nonetheless, it remains unexplored whether these models suffice for true open-domain extraction. This doubt arises because the training and test data in existing OpenIE benchmarks are generally independent and identically distributed, i.e., drawn from the same domain (Stanovsky et al., 2018; Sun et al., 2018; Gashteovski et al., 2019). However, this assumption does not hold in practice. Built on domain independence (Niklaus et al., 2018), OpenIE models have to process diverse text, and it is common to observe domain shifts between training and test data in applications. Therefore, performance on in-domain benchmarks may not accurately measure the generalization of out-of-domain extraction.
Starting from this concern, we carry out extensive experiments to investigate whether state-of-the-art OpenIE models preserve good performance on unseen target domains. To provide a reliable benchmark, we release the first Generalized OpenIE dataset, containing 110,122 open facts human-annotated on 20,899 sentences collected from 6 completely different domains. We find that there are noticeable semantic differences between open facts in different domains, posing challenges to the generalization of OpenIE models. Because of domain shifts, in sequence-based models the accuracy of each prediction step declines significantly, and early errors are magnified later. Similarly, in the graph-based model, the reduced edge prediction ability struggles to accurately connect O(m^2) edges of O(r^2) types, especially when the span number m and role number r are both large in complicated facts. As a result, their F1 scores degrade by as much as 70% relatively (from 43% to 13%) when applied to unfamiliar domains, and thus cannot work well in real-world extraction.
The above observations demonstrate that full-fledged open-domain extraction still has a long way to go, and suggest a path towards a more generalized OpenIE model: we should reduce the extraction complexity to lower the potential risk of prediction errors under domain shift. This is essentially the Occam's Razor principle (Rasmussen and Ghahramani, 2000): among all functions that fit the training data well, simpler functions are expected to generalize better. Therefore, we explore a minimalist expression of open facts: by sequentially connecting the boundary positions of all spans in a fact in their order of appearance in the text, each open fact can be simply modeled as a directed acyclic graph. OpenIE is then equivalent to predicting the graph adjacency matrix and decoding facts from the directed graph. This idea leverages sequential priors to reduce the complexity of the function space (edge number and type) of the previous graph-based model from quadratic to linear, while avoiding the auto-regressive extraction of sequence-based models, thus improving generalization. We implement it in DragonIE, a Directed acyclic graph based open Information Extractor.
We perform extensive in-domain and out-of-domain experiments for OpenIE. On the commonly used in-domain evaluation, DragonIE outperforms the state-of-the-art method, with substantial gains of up to 3.6% average F1 score, 3x inference speedup, and 5x faster convergence. Meanwhile, it reduces the number of edges by 66% and the number of edge types by 88% compared with the previous graph-based method. On our newly proposed out-of-domain benchmark, DragonIE further widens the performance gap to 6.0%, and still exceeds previous methods with only 10% of the training data, showing better generalization. Detailed analysis shows that DragonIE can effectively represent overlapping, nested, discontinuous, and multiple facts despite its simplicity. We also perform a qualitative analysis that summarizes typical extraction errors and outlines future directions.

Pilot Experiment
To quantitatively evaluate the robustness of OpenIE models against domain shift, we first propose a standard evaluation setup for generalized OpenIE. Then, we conduct pilot experiments as well as empirical analyses in this section.

Generalized OpenIE Evaluation Setup
Given a sentence, OpenIE aims to output a set of facts in the form of (subject, predicate, object_1, ..., object_n), all of which are stated explicitly in the text (Yu et al., 2021). As shown in

Result Analysis
We select the best-performing sequence-based model, IGL-OIE (Kolluru et al., 2020a), and graph-based model, MacroIE (Yu et al., 2021), for our pilot experiments. The evaluation metric is the Gestalt F1 score (Yu et al., 2021). Note that more datasets and metrics are used in the main experiments (Section 4).
Figure 1 shows a detailed comparison across different domains and models on GLOBE. From the results, we can see that, compared with their performance on SAOKE under the in-domain setting, both the sequence-based and graph-based models suffer large performance drops on out-of-domain GLOBE, with a relative decline of 35%-70% in F1 score. This indicates that the robustness of OpenIE models may be challenged in cross-domain generalization. Intuitively, there are obvious differences in the topics and styles of texts from different domains. For example, in the medical domain, subjects and objects are often rare biological terminology, which is scarcely covered in the limited general-domain training data. Such a semantic shift degrades the prediction ability of a model fitted to the training set.
Exacerbating this issue, modern OpenIE models often contain multiple prediction steps. Under domain shift, every step is liable to go wrong, resulting in a collapse in overall performance. Specifically, sequence-based models predict facts auto-regressively, so a mispredicted fact will directly affect the extraction of all following facts. The graph-based model requires O(m^2) edges of O(r^2) types for a fact with m spans of r roles. In GLOBE, the built graph contains an average of 28.5 edges, with a total of 176 edge types, for each open fact, and the wrong prediction of any edge may lead to overall failure. Thus, these methods are vulnerable under out-of-domain generalization.
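The quadratic-versus-linear contrast can be made concrete with a rough back-of-the-envelope count. This is a sketch under simplifying assumptions: it models the clique-based graph as 4 boundary-position links per span pair, and the DAG as a fixed few edges per span, ignoring NEXT edges and virtual-predicate nodes.

```python
def clique_edges(m: int) -> int:
    # Clique-style graph: every span pair is linked via its 4 boundary
    # position combinations (B2B, B2E, E2B, E2E).
    return 4 * m * (m - 1) // 2

def dag_edges(m: int) -> int:
    # DAG-style graph: one intra-span edge per span, two inter-span
    # edges between adjacent spans, and a single intra-fact edge.
    return m + 2 * (m - 1) + 1

# Edge counts grow quadratically for the clique but linearly for the DAG.
for m in (3, 6, 12):
    print(m, clique_edges(m), dag_edges(m))
```

Even for a modest fact with 6 spans, the clique needs roughly 60 edges against the DAG's 17, and the gap widens rapidly with m.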

Methodology
From the above observations, we know that recent OpenIE models are too complex to generalize. In this section, we propose a simplified expression of open facts: the directed acyclic graph. We start with the motivation of our new graph structure, then go through the implementation details. Moreover, benefiting from directed edges, we can assign the role of one connected span as the edge type and recursively obtain the roles of all spans, greatly simplifying the edge type space from O(r^2) to O(r). Meanwhile, the edges can be predicted in parallel, solving the cascade errors of previous auto-regressive models.

Directed Acyclic Graph
The above operation converts each input text into a directed acyclic graph (DAG). In graph theory, a DAG consists of vertices and edges, with each edge directed from one vertex to another, such that following those directions never forms a closed loop. A DAG can be topologically ordered by arranging the vertices in a linear ordering consistent with the edge directions. This feature matches our goal of combining spans in the order they appear in the text. We treat each continuous span involved in a fact asserted by the input text as a vertex in the DAG, and connect oriented edges from one vertex to another that appears later in the text and belongs to the same fact. Then, in the simple case shown in Figure 2, each directed path from the root to a leaf vertex represents an open fact.
Unfortunately, such an elegant paradigm is not suitable for all scenarios. When dealing with complex cases like Figure 3, it encounters the following challenges: (1) the granularity of text is the word, while the granularity of open facts is the span, so it is necessary to predict not only the relations between spans but also which words form a span in the fact; (2) different spans may overlap and share some words, as the span "America" is enclosed in another span "leadership of America" in the case of Figure 3; (3) different facts may overlap and share some fact elements (subject, predicate, or object). For example, "Biden" acts as the subject in all three facts and is not the root vertex. Therefore, we cannot simply assume that each path in the DAG represents an open fact.
DAG Construction. These challenges prompt us to design the following three types of edges to avoid ambiguous extraction: (1) intra-span edge: it connects the beginning and ending words of a span with an I tag; (2) inter-span edge: it connects the Ending word of a span to the Beginning/Ending word of the next span in the fact with an EB-X/EE tag, respectively, where X denotes the role of the next span. Intuitively, each span can be uniquely identified by its two boundary words, and the double inter-span edge design helps distinguish overlapping spans. If we only connected the ending words of two spans, such as "the" and "America", we could not determine whether the span following "the" is "leadership of America" or "of America", because they share the same ending word; the same ambiguity arises when using only the EB-X tag; (3) intra-fact edge: it connects the Beginning word of the first span to the Ending word of the last span in a fact with a BE-X tag, delimiting the boundary of the fact. In this way, even for overlapping facts, we can accurately judge the range of each fact within the DAG. Because the inter-span edge only indicates the role of the subsequent span, the role of the first span in a fact is unknown, so we specify it in BE-X.
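The three edge types can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `build_dag_edges` helper and the (begin, end, role) word-index encoding of spans are assumptions made for the example.

```python
def build_dag_edges(spans):
    """Build the I, EB-X/EE, and BE-X edges for a single fact.
    `spans` is an ordered list of (begin, end, role) word-index triples.
    Returns a set of (head, tail, tag) edges."""
    edges = set()
    for b, e, _ in spans:
        edges.add((b, e, "I"))                          # intra-span edge
    for (_, e1, _), (b2, e2, role2) in zip(spans, spans[1:]):
        edges.add((e1, b2, f"EB-{role2}"))              # end -> next begin
        edges.add((e1, e2, "EE"))                       # end -> next end
    first_b, last_e = spans[0][0], spans[-1][1]
    edges.add((first_b, last_e, f"BE-{spans[0][2]}"))   # intra-fact edge
    return edges

# Toy fact "(Biden, leads, America)" over words indexed 0..2.
fact = [(0, 0, "subject"), (1, 1, "predicate"), (2, 2, "object")]
print(sorted(build_dag_edges(fact)))
```

For this three-span fact, the sketch yields three I edges, two EB-X and two EE edges, and one BE-subject edge, i.e., a linear number of edges in the span count.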
DAG Decoding. With the edge definitions above, we first find all BE-X edges to determine the beginning and ending words of target facts, and then traverse all paths between them, where each path represents a fact. When decoding a path, all I edges are used to determine the spans in the path; we then judge the role of each span according to its EB-X edge and distinguish overlapping spans with the EE edge. Finally, the spans in each path are combined according to their roles to output structured facts. Besides, a DAG naturally identifies discontinuous facts, where an element of an open fact may consist of multiple spans: we splice the spans of the same role in text order to obtain the discontinuous element. In Section 5.2, we empirically conclude that our constructed DAG is a minimalist expression of open facts: removing any edge type reduces its representation ability. The Occam's Razor principle states that among all functions fitting the training set well, the simplest is likely to generalize best. Thus, the DAG is expected to yield strong generalization in OpenIE.
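The decoding procedure can be sketched for the simple non-overlapping case as follows. The `decode_facts` helper and the hand-built edge set are hypothetical illustrations; disambiguating overlapping spans via the EE edge is omitted for brevity.

```python
from collections import defaultdict

def decode_facts(edges, words):
    """Simplified decoder for the non-overlapping case: recover one
    (role, span-text) list per BE-X edge from the edge set."""
    intra = {(h, t) for h, t, tag in edges if tag == "I"}
    nxt = defaultdict(list)
    for h, t, tag in edges:
        if tag.startswith("EB-"):
            nxt[h].append((t, tag[3:]))    # end word -> (next begin, next role)
    facts = []
    for h, t, tag in edges:
        if not tag.startswith("BE-"):
            continue                        # each BE-X edge delimits one fact
        role, begin = tag[3:], h            # X gives the first span's role
        fact = []
        while True:
            end = next(e for b, e in intra if b == begin)  # I edge closes span
            fact.append((role, " ".join(words[begin:end + 1])))
            if end == t:                    # reached the fact's last ending word
                break
            begin, role = nxt[end][0]       # follow the EB-X edge onward
        facts.append(fact)
    return facts

# Hand-built edges for the toy fact "(Biden, leads, America)".
words = ["Biden", "leads", "America"]
edges = {(0, 0, "I"), (1, 1, "I"), (2, 2, "I"),
         (0, 1, "EB-predicate"), (0, 1, "EE"),
         (1, 2, "EB-object"), (1, 2, "EE"),
         (0, 2, "BE-subject")}
print(decode_facts(edges, words))
```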

Architecture
OpenIE is thus transformed into building the desired DAG. To this end, we propose DragonIE, a Directed acyclic graph based open Information Extractor. Intuitively, the edges defined in the DAG depict relations between words in the text, so DragonIE enumerates all word pairs and makes parallel predictions: it first maps each word w_i into a d-dimensional contextual vector h_i ∈ R^d with a basic encoder such as BERT (Devlin et al., 2019). Then each pair (h_i, h_j) is fed into a pairwise score function, followed by a Sigmoid layer, to yield the probability of each edge type p_{i,j} ∈ R^c (Wang et al., 2020, 2021).
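A minimal NumPy sketch of the pairwise scoring head follows. The bilinear score function and the random vectors standing in for BERT outputs are illustrative assumptions; the paper does not specify the exact form of the score function.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 5, 16, 21            # words, hidden size, edge-type count

H = rng.normal(size=(n, d))    # stand-in for BERT contextual vectors h_i
# Hypothetical bilinear score function: one d x d matrix per edge type.
W = rng.normal(size=(c, d, d)) * 0.1

# Score every word pair (i, j) for every edge type, then squash to (0, 1).
scores = np.einsum("id,kde,je->ijk", H, W, H)
probs = 1.0 / (1.0 + np.exp(-scores))      # p_{i,j} in R^c for each pair

print(probs.shape)                         # (n, n, c)
```

All n^2 word pairs are scored in one tensor operation, which is what allows the edges to be predicted in parallel rather than auto-regressively.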
During training, we optimize the parameters θ of DragonIE to minimize the binary cross-entropy loss:

L(θ) = − Σ_{i,j} Σ_{k=1}^{c} ( y_{i,j}[k] log p_{i,j}[k] + (1 − y_{i,j}[k]) log(1 − p_{i,j}[k]) ),

where p_{i,j}[k] is the predicted probability of (w_i, w_j) carrying the k-th edge type, and y_{i,j}[k] ∈ {0, 1} is the ground truth. At inference, a threshold δ tuned on the dev set is applied to filter low-confidence predictions and obtain the final edge labels.
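A toy NumPy version of this objective and the threshold-based inference is shown below; the tensor shapes and the `delta` value are illustrative only.

```python
import numpy as np

def bce_loss(p, y, eps=1e-9):
    """Binary cross-entropy summed over word pairs and edge types."""
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)).sum()

def predict_edges(p, delta):
    """Keep an edge (i, j, k) iff its probability clears threshold delta."""
    return {tuple(idx) for idx in np.argwhere(p > delta)}

p = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # toy (1 x 2 x 2) probability tensor
y = np.array([[[1.0, 0.0], [0.0, 1.0]]])
loss = bce_loss(p, y)
edges = predict_edges(p, delta=0.5)
```

Because every (i, j, k) entry is an independent binary decision, the threshold δ directly trades precision against recall on the dev set.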
4 Experimental Setup

Datasets
In our experiments, we evaluate the models on three datasets: the in-domain benchmark SAOKE, our out-of-domain benchmark GLOBE, and CaRB, on which models are trained with the OpenIE4 dataset (Kolluru et al., 2020b). However, OpenIE4 is automatically derived and carries great data noise, and its annotation scheme is inconsistent with CaRB, so the results on CaRB are relatively unreliable.

Implementation Details
We implement DragonIE by initializing the encoder parameters from BERT for English (Devlin et al., 2019) and Chinese (Cui et al., 2020). DragonIE is optimized by BertAdam with a maximum sequence length of 200, 30 epochs, and a learning rate of 1e-5. The threshold δ is selected from [0.2, 0.4]. We select the model with the best performance on the validation set to report results on the test set. Hyper-parameters are selected on the validation set, and all experiments are conducted on a single Tesla V100 GPU.

Experimental Results
Our experiments aim to answer three questions: Q1: How does DragonIE compare to other methods in both in-domain and out-of-domain settings? Q2: Does DragonIE effectively handle complex extraction scenarios despite its simplicity? Q3: What causes the performance gap between out-of-domain and in-domain OpenIE?
5.1 Overall Performance (Q1)

Another advantage of the simpler design is faster convergence and inference. As shown in Table 5, with the same hyper-parameters, DragonIE achieves its best results within 4 epochs, while MacroIE requires 20 epochs to reach its peak. Moreover, DragonIE reduces testing time by a factor of 3. The decoding of MacroIE needs a time-consuming maximal clique discovery algorithm such as Bron-Kerbosch (Bron and Kerbosch, 1973), whose time complexity is O(3^{n/3}) for an n-vertex graph; DragonIE avoids this issue, obtaining a large speed improvement.

Detailed Analysis (Q2)
A potential concern is whether the better generalization of the simple DAG-based OpenIE formulation comes at the expense of extracting complex facts, as simplicity usually reduces representation capability. To answer this question, we perform a fine-grained evaluation on GLOBE. (1) We select the sentences containing discontinuous, overlapping, or nested facts from GLOBE to form three complex test sets. Here, discontinuous means that at least one fact element in the sentence is not a continuous span, overlapping means that multiple facts in the sentence share at least one element, and nested means that different elements share some common spans. These three patterns are the most common complex facts in OpenIE, and their distribution is detailed in Appendix A.1. (2) We validate DragonIE's capability in extracting different numbers of open facts by splitting the sentences into classes according to the fact count. (3) We conduct low-resource experiments on different partitions of the training set.

In addition, we conduct a set of ablation tests on the graph to verify that our DAG is already a minimalist expression of open facts. Table 6 shows that: (i) when only connecting the ending word of one span and the beginning word of the next span (EB-X) and removing the edge connected to the ending word of the next span (EE), the F1 score drops by 1.6% on average, since nested facts cannot be accurately represented, as demonstrated in Section 3.2; (ii) removing the intra-fact edge and treating each path from the root vertex to a leaf vertex in the DAG as a fact hurts results by 3.5 F1 points on average, as it is difficult to extract overlapping facts; (iii) marking only the role of the next span on an edge, instead of the combination of the two spans' roles, brings a remarkable improvement (2.0% on average), since it effectively compresses the edge type space from O(r^2) to O(r). Note that the intra-span edges cannot be ablated because they recognize spans. On the whole, each edge in our built DAG is indispensable.

Table 7: Error analysis of DragonIE. We report the number of false facts belonging to five major error classes on the analysis sets (each containing 100 gold facts) of the in-domain and out-of-domain benchmarks.

Qualitative Evaluation (Q3)
Although DragonIE achieves state-of-the-art results on all benchmarks, there are still substantial differences between out-of-domain and in-domain performance. We inspect the mistakes made by DragonIE on two analysis sets sampled from the test sets of GLOBE and SAOKE, respectively, and summarize the error types. The sampling strategy requires that the sentences in each analysis set contain 100 gold open facts. Table 7 reports five major error classes and the number of corresponding false facts on the two benchmarks.
Wrong Boundary denotes a too-large or too-small boundary for an element of an open fact. Wrong Extraction describes an open fact that does not hold in the original sentence. These are the least common error types in both settings, showing that our model can identify correct spans and facts across domains. It would be interesting to see whether introducing causal inference (Nan et al., 2021) or mutual information maximization (Zhang et al., 2020) to strengthen the correlation between facts and sentences can further improve performance. Uninformative Extraction is widely present in the output across domains; it usually provides no information gain. We think a promising direction is applying an additional post-processing model to judge the informativeness of each open fact. Incomplete Extraction omits critical information, resulting in unclear fact semantics. Missing Extraction occurs when the model fails to predict an open fact at all. According to the statistics in Table 7, these two error types are the root cause of the performance gap between the in-domain and out-of-domain settings. We believe the following research directions are worth pursuing: (1) pre-training models on a massive corpus with OpenIE-oriented self-supervised tasks to sufficiently capture domain-robust OpenIE-exclusive features (Lu et al., 2022); (2) leveraging domain generalization techniques to learn invariances across domains, e.g., meta learning (Li et al., 2018a; Geng et al., 2019; Zhao et al., 2022), adversarial learning (Li et al., 2018b), and contrastive learning (Kim et al., 2021).

Related Work
OpenIE. From rule-based systems and statistical methods (Fader et al., 2011; Corro and Gemulla, 2013; Gashteovski et al., 2017) to neural models (Cui et al., 2018; Stanovsky et al., 2018; Roy et al., 2019), OpenIE research has gone through three technological evolutions over the past decade. Each evolution brings a more expressive architecture while requiring much more training data. To this day, the best-performing OpenIE models either predict open facts in a sentence auto-regressively (Kolluru et al., 2020a,b) or represent each open fact as a maximal clique on a graph with quadratic edge numbers and types (Yu et al., 2021). Such trends pose two potential challenges. (1) The popular evaluation protocol mainly operates under the i.i.d. assumption, i.e., the training domain is the same as the test domain (Stanovsky et al., 2018; Sun et al., 2018; Gashteovski et al., 2019; Yu et al., 2020; Zhang et al., 2022), which is contrary to the domain-independent discovery objective of OpenIE (Niklaus et al., 2018). Although existing studies achieve surprising performance under i.i.d. evaluation, their generalization to true open extraction has not been evaluated. Some works train on OpenIE4 (Kolluru et al., 2020b) and evaluate on CaRB (Bhardwaj et al., 2019), but the noisy annotation of OpenIE4 and the different annotation standards of the two datasets make the evaluation results unreliable. (2) As revealed by our preliminary experiments, recent OpenIE models suffer large performance drops in the out-of-domain setting. Their complex auto-regressive prediction processes and graph structures may overfit the specifics of the training data, resulting in unsatisfactory cross-domain generalization. In this paper, we present the first systematic study of how robust OpenIE methods are when trained and tested on different datasets (domains), and further propose a minimalist expression of open facts to implicitly improve generalization behavior.

Domain Generalization. The main goal of domain generalization is to learn a domain-invariant representation from multiple source domains so that a model generalizes well to unseen target domains (Kim et al., 2021; Mi et al., 2021). Recent advances mainly focus on three aspects: data augmentation, model design, and robust training. Augmenting the dataset with transformations such as mix-up (Zhang et al., 2021) improves generalization (Pandey et al., 2021). A simplified model design mines the task essence to resist domain shift (Ghosh and Motani, 2021). Robust training methods aim to optimize a shared feature space, e.g., by minimizing the maximum mean discrepancy (Tzeng et al., 2014), transformed feature distribution distances (Muandet et al., 2013), or covariances (Sun and Saenko, 2016). This paper primarily explores generalized OpenIE from the perspective of model design; combining data augmentation and robust training to further improve generalization is left to future work.

Conclusion
In this paper, we lay out and study generalized OpenIE for the first time. We release GLOBE, a large-scale, high-quality, multi-domain benchmark with 110,122 open facts, to evaluate the generalization of OpenIE models. Furthermore, we explore a minimalist graph expression of open facts, the directed acyclic graph, to reduce extraction complexity and improve generalization behavior. Experimental results show that our proposed method outperforms state-of-the-art baselines in both in-domain and out-of-domain settings. This work is a starting point towards building more practical OpenIE models with stronger generalization, and our fine-grained analyses point out promising avenues for further improvement.

Limitations
While this work makes progress towards generalized OpenIE, it still has limitations. First, to produce a complete training-test evaluation setup with the largest human-annotated OpenIE dataset, SAOKE, our annotated GLOBE benchmark is in Chinese. We speculate that the same conclusions hold in other languages, and leave this for future work. Second, although the proposed DragonIE method greatly exceeds the baselines, there is still significant performance degradation in the out-of-domain setting compared with the in-domain setting. We will continue working to narrow this gap.

A.2 Dataset Statistics
The final GLOBE dataset consists of 110,122 open facts annotated on 20,899 sentences spanning 6 distinct domains, making it the largest and most diverse human-annotated OpenIE test set. This new dataset allows us to quantify OpenIE performance in various downstream applications and to better understand the limits of generalization exhibited by the most recent OpenIE methodology.
Table 8 shows the number and proportion of sentences belonging to each domain. There are at least 2k sentences per domain, so the performance of an OpenIE model can be fully measured. We count the number of sentences in the dataset that contain at least one complicated fact, as shown in Table 9. Here, discontinuous means that at least one fact element in the sentence is not a continuous span, overlapping means that multiple facts in the sentence share at least one element, and nested means that different elements share some common spans. Identifying discontinuous, overlapping, and nested facts is very important for OpenIE, because sentences containing complicated facts account for 95.6% of GLOBE. We also report the fact number distribution in Table 10.

B Detailed Experiments B.1 Detailed Evaluation metrics
We report performance computed by the three most widely adopted metrics in the OpenIE literature: (1) CaRB-single considers the number of common words in each (gold, predicted) pair for each argument of the fact, greedily matching each gold fact with one predicted fact; (2) CaRB-multi allows a gold fact to be matched to multiple predicted facts, and is thus more relaxed than CaRB-single; (3) Gestalt converts each fact into a string and uses the Gestalt function to measure the string similarity of each (gold, predicted) pair. It therefore requires not only token overlap but also consistent token order, making it the most stringent metric.
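Python's standard library offers `difflib.SequenceMatcher`, which implements Ratcliff/Obershelp gestalt pattern matching; the sketch below shows how such a string-level score behaves. The flattening of facts into space-joined strings is an assumption for illustration, and the paper's exact Gestalt implementation may differ.

```python
from difflib import SequenceMatcher

def gestalt_score(gold_fact, pred_fact):
    """String-level similarity of two facts, each given as a tuple of
    elements; facts are flattened to a single string before matching."""
    gold, pred = " ".join(gold_fact), " ".join(pred_fact)
    return SequenceMatcher(None, gold, pred).ratio()

gold = ("Biden", "leads", "America")
exact = gestalt_score(gold, ("Biden", "leads", "America"))
reordered = gestalt_score(gold, ("America", "leads", "Biden"))
print(exact, reordered)
```

A prediction with the right tokens in the wrong order scores strictly below an exact match, which is why this metric is stricter than token-overlap scores.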

B.2 Detailed Performance Comparison
Tables 11-16 summarize the detailed results on the 6 domains of the GLOBE dataset. DragonIE significantly exceeds the baseline models on 54 evaluation metrics across the 6 domains, which again proves the effectiveness of our method. It is worth noting that extraction performance differs greatly across domains; the highest F1 score of DragonIE is only 33.6%, indicating that there is still much room for improvement towards practical out-of-domain applications.

B.3 Detailed Analysis on SAOKE
Similar to the detailed analysis conducted on GLOBE in the main experiments, we also perform a fine-grained evaluation on SAOKE. (1) We select the sentences containing discontinuous, overlapping, or nested facts from SAOKE to form three complex test sets.

B.4 Detailed Analysis on Edge Type Space
For MacroIE, different spans belonging to the same fact are connected to each other by linking the beginning and ending positions of the two spans, giving 4 position types {B2B, B2E, E2B, E2E}. There is also a NEXT edge between adjacent spans belonging to the same kind of element, indicating the original order of spans. Therefore, a total of (6 × 6 + 1) × 4 = 148 edge types are required to represent the relations between 6 kinds of spans. In addition, SAOKE defines 7 virtual predicates {=, BIRTH, DEATH, NOT, DESC, ISA, IN}, which do not appear in the text; virtual nodes must be set for them and connected to the boundary tokens of the other elements in the fact, requiring another 7 × 4 = 28 edge types. So MacroIE needs 148 + 28 = 176 kinds of edges.
For DragonIE, we set up an EB-type edge and a BE-type edge for each role, as well as an EE edge and an I edge. To identify virtual predicates, DragonIE connects the object to the virtual predicate node, adding 7 edge types. So DragonIE needs 2 × 6 + 2 + 7 = 21 kinds of edges.
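These counts can be verified with a few lines of arithmetic, following the formulas above:

```python
roles = 6      # subject, predicate, object, time, place, qualifier
virtual = 7    # virtual predicates defined by SAOKE

# MacroIE: role-pair combinations plus NEXT, times 4 boundary-position
# types, plus virtual-predicate connections to 4 boundary positions.
macroie = (roles * roles + 1) * 4 + virtual * 4

# DragonIE: one EB and one BE edge tag per role, plus the EE and I
# tags, plus one edge per virtual predicate.
dragonie = 2 * roles + 2 + virtual

print(macroie, dragonie, f"{1 - dragonie / macroie:.0%} fewer edge types")
```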

Figure 1 :
Figure 1: Gestalt F1 score comparison on six out-of-domain test sets and the original in-domain test set.

Figure 2 :
Figure 2: An example of representing open facts as an undirected maximal clique or a directed acyclic graph.

3.1 Motivation

How to properly model open facts is the most important problem in OpenIE system design. The previous graph-based model treats the spans of one open fact as an undirected clique, with spans pairwise connected and the combination of their roles as the edge type. However, as shown in Figure 2, there is a natural left-to-right reading order between spans in the text. This sequential prior means we can simply connect edges between adjacent spans in the text to determine open facts. In this way, the model no longer has to identify the pairwise relation between every span pair, lessening the learning burden by reducing the edge number from O(m^2) to O(m).

Figure 3 :
Figure 3: An overview of DragonIE. When building the DAG, it enumerates each word pair and predicts their edges. Thus, for a span with a single word, such as "As", there will be two vertices, referring to the beginning and ending words.

Figure 4 :
Figure 4: Gestalt F1 scores on (a) complicated extraction, (b) multiple extraction, and (c) low-resource extraction. All the analyses are conducted on GLOBE. We also report the comparison results on SAOKE in Appendix B.3.

Figure 5 :
Figure 5: Gestalt F1 scores on (a) complicated extraction, (b) multiple extraction, and (c) low-resource extraction. All the analyses are conducted on the SAOKE test set.
(2) We validate DragonIE's capability in extracting different numbers of open facts by splitting the sentences into five classes according to the fact count. (3) We conduct low-resource experiments on five different partitions of the original SAOKE training set (1/10/30/50/70%). As presented in Figure 5, DragonIE again attains gains in all classes across the three settings, consistent with the observations on GLOBE.

Table 1 :
Comparison of representative OpenIE datasets. Human means the dataset is human-annotated rather than model-derived or converted from another corpus. Shift denotes that the dataset supports evaluating OpenIE generalization performance under domain shift.

Table 2 :
In-domain Evaluation: Main results on the in-domain benchmark SAOKE.

Table 3 :
Out-of-domain Evaluation: Main results on the out-of-domain benchmark GLOBE.

Table 4 :
Out-of-domain Evaluation: Main results on CaRB. The models are trained on the noisy OpenIE4 dataset.

Table 5 :
Comparison in convergence and testing time on SAOKE, measured in epochs and seconds respectively.

Table 6 :
Ablation study of DragonIE. Numbers denote the corresponding Gestalt F1 scores.

Table 8 :
The number and proportion of sentences belonging to different domains in GLOBE.

Table 9 :
The number and proportion of sentences containing complicated facts in GLOBE.

Table 10 :
The number and proportion of sentences containing different numbers of facts in GLOBE.

In Table 10, we list the edge type sets of MacroIE and DragonIE on SAOKE (also GLOBE). MacroIE needs 176 edge types, while DragonIE needs only 21, reducing the edge types by 88%. Next, let us analyze the reasons carefully. Theoretically, MacroIE needs O(r^2) edge types while DragonIE needs O(r), where r is the number of possible role types in open facts. There are 6 roles in SAOKE: {subject, predicate, object, time, place, qualifier}.