Syntactic Multi-view Learning for Open Information Extraction

Open Information Extraction (OpenIE) aims to extract relational tuples from open-domain sentences. Traditional rule-based or statistical models were developed based on the syntactic structure of sentences, identified by syntactic parsers. However, previous neural OpenIE models under-explored this useful syntactic information. In this paper, we model both constituency and dependency trees as word-level graphs, and enable neural OpenIE to learn from the syntactic structures. To better fuse heterogeneous information from the two graphs, we adopt multi-view learning to capture multiple relationships between them. Finally, the finetuned constituency and dependency representations are aggregated with sentential semantic representations for tuple generation. Experiments show that both constituency and dependency information, and the multi-view learning, are effective.


Introduction
Open Information Extraction (OpenIE) aims to generate structured tuples from unstructured open-domain text (Yates et al., 2007). The extracted tuples are in the form of ⟨Subject, Relation, Object⟩ for binary relations, and ⟨ARG0, Relation, ARG1, . . ., ARGn⟩ for n-ary relations. It has been a critical NLP task as it is domain-independent and does not rely on a predefined ontology schema. The structured relational tuples are beneficial to many downstream tasks such as question answering (Khot et al., 2017), knowledge base population (Martínez-Rodríguez et al., 2018; Gashteovski et al., 2020), and word embedding generation (Stanovsky et al., 2015).
In general, traditional OpenIE systems are either statistical or rule-based. They extract relational tuples mainly based on certain sentence patterns heuristically defined on syntactic structures. The limitation of patterns causes traditional OpenIE systems to be ineffective in handling complex sentences. Recently, neural OpenIE systems have been developed and have shown promising results. Neural OpenIE systems no longer depend on pre-defined patterns. Instead, they learn to extract relational tuples directly from unstructured text in an end-to-end manner. However, utilizing syntactic information is under-explored among neural OpenIE systems, although syntactic information is widely explored in other Information Extraction tasks such as Semantic Role Labeling (SRL) (Fei et al., 2021) and Relation Extraction (RE) (Zhang et al., 2018).
As syntactic information has proved useful for traditional OpenIE systems, we argue that it is important for neural OpenIE systems as well. Figure 1a shows an n-ary tuple expected from an example sentence. Figure 1b displays the sentence's constituency tree, where the phrases are labeled with constituency tags. The dependency tree (shown in Figure 1c) represents syntax through directed and typed edges between words, instead of between phrases. We observe that the boundaries of the OpenIE tuple (shown in Figure 1a) highly coincide with edges from the constituency and dependency trees. As such, we believe both constituency and dependency trees provide useful and complementary syntactic information for OpenIE systems. The next question is: how to effectively incorporate syntactic information into neural OpenIE models?
To fully explore the syntactic information from both parse trees, we convert them into 'node-sharing' graphs, i.e., a constituency graph (denoted as const-graph) and a dependency graph (denoted as dep-graph), respectively. A dependency relation specifies a relationship between two words, so we can represent words as nodes and the relationship as the corresponding edge. The key challenge is to map a constituency tree to a graph in which all of the nodes are words, not phrases. Meanwhile, the graph needs to largely capture the constituency syntactic information. In this work, we present a novel method for the conversion of constituency trees. With the converted const-graph modelled at the word level, we can now easily integrate it with the dep-graph. Moreover, both graphs can be directly integrated with a Pre-trained Language Model (PLM), which provides word-level representations.
In order to leverage heterogeneous syntactic information from both const-graph and dep-graph, we propose a novel neural OpenIE model: SMiLe-OIE (Syntactic Multi-view Learning for Open Information Extraction). It first encodes a sentence using BERT (Devlin et al., 2019), and subsequently uses two syntactic encoders, namely Const-encoder and Dep-encoder. The model represents constituency and dependency relations of the sentence with the corresponding BERT representations, and applies two Graph Convolutional Networks (GCN) (denoted as Const-GCN and Dep-GCN) to learn graph representations for const-graph and dep-graph separately. The representations from BERT, Const-GCN, and Dep-GCN are aggregated and finally used for tuple generation. To better fuse the heterogeneous syntactic graph representations, SMiLe-OIE introduces a subtask, multi-view learning, to learn multiple types of relationships among const-graph and dep-graph. The multi-view learning loss is used to finetune the graph representations along with the OpenIE loss. In summary, our contributions are threefold.

Related Work

OpenIE Systems. Traditional OpenIE systems are either statistical or rule-based, including (Fader et al., 2011), R2A2 (Fader et al., 2011), OLLIE (Mausam et al., 2012), ClausIE (Corro and Gemulla, 2013), Stanford OpenIE (Angeli et al., 2015), OpenIE4 (Mausam, 2016), NESTIE (Bhutani et al., 2016), and MINIE (Gashteovski et al., 2017). Most of these models extract relational tuples based on syntactic structures such as part-of-speech (POS) tags and dependency trees. In this sense, syntactic information has been essential to OpenIE. Recently, neural OpenIE systems (Cui et al., 2018; Stanovsky et al., 2018; Roy et al., 2019; Kolluru et al., 2020a; Dong et al., 2021; Vasilkovsky et al., 2022; Kotnis et al., 2022) have been developed and have shown promising results. Neural OpenIE systems are able to extract relational tuples end-to-end based on the semantic encoding of the input sentence. The analysis of the syntactic structure of a sentence, which was required by traditional models, seems no longer necessary. As a result, the usage of syntactic information is under-explored in neural OpenIE models. Nevertheless, there exist some neural OpenIE systems that utilize certain forms of syntactic information. For example, RnnOIE (Stanovsky et al., 2018) projects the POS tag of each word into a POS embedding and concatenates it with the word embedding as input to the sentence encoder. SenseOIE (Roy et al., 2019) further concatenates the word embedding with a dependency embedding. CIGL-OIE (Kolluru et al., 2020b) finds all head verbs in the sentence and pre-defines a few POS patterns to explicitly constrain the model training. MGD-GNN (Lyu et al., 2021) connects words in an undirected graph if they are in dependency relations, and applies a graph attention network (GAT) (Veličković et al., 2018) to the graph as its graph encoder. Although MGD-GNN uses some graph information of the dependency structure, it loses other information such as the directedness and types of dependency relations.
In short, we observe that existing neural OpenIE systems fail to explore some syntactic features, and that their integration of syntax is shallow. Compared to them, our SMiLe-OIE is able to leverage the full features of heterogeneous syntactic information from both constituency and dependency trees.
Integration of Constituency and Dependency Syntax. Although constituency and dependency trees possess common sentential syntactic information, they capture syntactic information from different perspectives. Recent NLP tasks have benefited from integrating these two syntactic representations. Zhou and Zhao (2019) and Strzyz et al. (2019) integrate dependency and constituency syntactic information as a representation of a parse tree or sequence, but not of a graph. To the best of our knowledge, HeSyFu (Fei et al., 2021) is the only work that converts dependency and constituency trees into graphs and performs a graph learning strategy on both. In this sense, although HeSyFu is designed for the SRL task, it is the most relevant model to ours.
SMiLe-OIE differs from HeSyFu mainly in three perspectives: (1) HeSyFu models the constituency tree at the phrase level, which is inconsistent with the word-level representations from BERT. Meanwhile, HeSyFu models the dependency tree at the word level, so the constituency representations cannot be directly fused with the dependency representations. (2) As a result, HeSyFu integrates the BERT representations and the two parse trees' representations with complicated bridging processes, which may hinder synergistic integration of the heterogeneous representations. (3) To better fuse heterogeneous syntactic information, SMiLe-OIE adopts multi-view learning to capture multiple relationships between const-graph and dep-graph representations.
Multi-view Learning. Multi-view learning aims to learn representations or features from multi-view data. Generally, data from different views contain complementary information. Therefore, multi-view learning is able to exploit such complementary information to learn more comprehensive representations than single-view learning methods (Li et al., 2019). In the modern era, multi-view data have increased voluminously, drawing more attention to multi-view learning mechanisms. The studies of multi-view learning (Yan et al., 2021) mainly fall into: multi-view fusion (Zhao et al., 2017; Sun, 2013), multi-modal learning (Ramachandram and Taylor, 2017; Baltrušaitis et al., 2019), multi-view clustering (Chao et al., 2021), and multi-view representation learning (Li et al., 2019; Guo et al., 2019; Ata et al., 2021). In our work, we perform multi-view learning and fusion on two views of syntactic graphs (i.e., const-graph and dep-graph). To the best of our knowledge, we are the first to use a multi-view mechanism to exploit complementary syntactic information in NLP applications.

Graph Modelling
In this section, we elaborate on our graph modelling strategy to convert constituency and dependency trees into graphs G = (U, E), where U indicates the set of nodes and E the set of edges. They are G_con = (U_con, E_con) for const-graph, and G_dep = (U_dep, E_dep) for dep-graph. The two graphs' nodes correspond to the same set of input sentence's words. However, the labels of the nodes in the two graphs are different (constituency path for const-graph, and dependency relation type for dep-graph), preserving the syntactic information of constituency and dependency trees. Also, their edge sets are different, where the edges represent word-to-word syntactic relations.

Dependency Graph Modelling
Dependency tree provides syntactic dependency at the word level. Thus, the dep-graph of a sentence is identical to the dependency tree of the sentence, except for node labels. For each word, as shown in Figure 1c, there is an inbound relation from its modifying head word. We follow Fei et al. (2021) to label each word node with its inbound dependency relation type, as exemplified in Figure 2.
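As a minimal sketch of this dep-graph construction: node labels are the inbound dependency relation types, and edges follow the head-to-dependent arcs. The toy parse below is hand-written for illustration; in practice it would come from an off-the-shelf parser such as spaCy.

```python
def build_dep_graph(words, heads, deprels):
    """Word-level dep-graph: node labels = inbound dependency relation type,
    edges = (head, dependent) pairs from the dependency tree."""
    node_labels = {i: deprels[i] for i in range(len(words))}
    # The root points to itself; skip that self-loop when collecting edges.
    edges = [(heads[i], i) for i in range(len(words)) if heads[i] != i]
    return node_labels, edges

# Simplified fragment "Mary 's cat likes toys" with a hand-written parse.
words = ["Mary", "'s", "cat", "likes", "toys"]
heads = [2, 0, 3, 3, 3]                      # governor index of each word
deprels = ["poss", "case", "nsubj", "ROOT", "dobj"]

labels, edges = build_dep_graph(words, heads, deprels)
```

Each node thus carries exactly one label (its inbound relation), mirroring the labelling scheme exemplified in Figure 2.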

Constituency Graph Modelling
We flatten the phrase-level relations of a constituent structure into a const-graph of word-level relations, which can be directly integrated with the word-level granularity of a Pre-trained Language Model (PLM) such as BERT. The flattening process is designed to preserve both the phrasal boundary information and the constituency relations which are required for the OpenIE task. But note that this flattening process can be used for other related modelling tasks (e.g., SRL, NER, RE).
Word Node Labelling with Constituency Path. In const-graph, each word is a node, and we label each word node with the path from the root to the word in the constituency structure of the input sentence. Table 1 lists the constituency paths of words in the example sentence in Figure 1. This labelling of words with constituency paths preserves the rich phrasal information of the constituency tree.
Word-level Constituency Relations. In const-graph, edges are constituency relations that connect word nodes. We perform relation flattening of the constituency tree in Figure 1b in the following steps: (1) We add an edge between the first and last word in each noun phrase (NP) (e.g., 'Mary'-''s', 'Mary'-'cat', 'plush'-'toys', 'the'-'room'). The edge is labelled as 'NP'. This edge identifies the boundary of the NP. (2) If a word and a phrase are siblings (belonging to the same parent node in the constituency tree), we connect the word (e.g., the verb in a VP, the preposition in a PP) to the first word of its sibling phrase, as shown in Figure 3 and with the hierarchical constituency paths in Table 1. (3) To mark the boundary of an intra-sentential clause, we connect the first and last word of each clause (S) and label the edge as 'S' (e.g., 'playing'-'room'). (4) We remove an edge when the distance between the two words is longer than 8 in the input sentence, since the elements in an OpenIE tuple are usually short spans.
Figure 3 depicts the word-level relations flattened from the constituency tree in Figure 1b, and Figure 4 depicts the final const-graph.
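The four flattening steps above can be sketched as follows. The phrase spans are hand-written here for the running example, and the helper's interface is illustrative; a real system would derive spans from a constituency parser's output.

```python
MAX_DIST = 8  # step 4: drop edges whose words are more than 8 tokens apart

def flatten(np_spans, sibling_pairs, clause_spans):
    """Each input is a list of word-index pairs; returns labelled word-level edges."""
    edges = []
    for s, e in np_spans:                       # step 1: NP boundary edges
        edges.append((s, e, "NP"))
    for w, first, label in sibling_pairs:       # step 2: word -> first word of sibling phrase
        edges.append((w, first, label))
    for s, e in clause_spans:                   # step 3: clause (S) boundary edges
        edges.append((s, e, "S"))
    # step 4: prune distant edges
    return [(a, b, l) for a, b, l in edges if abs(a - b) <= MAX_DIST]

# "Mary 's cat likes playing plush toys in the room" (word indices 0-9).
np_spans = [(0, 2), (5, 6), (8, 9)]             # "Mary 's cat", "plush toys", "the room"
sibling_pairs = [(3, 4, "VP"), (7, 8, "PP")]    # "likes"->"playing", "in"->"the"
clause_spans = [(4, 9), (0, 9)]                 # (0, 9) is a synthetic distant edge to show pruning
edges = flatten(np_spans, sibling_pairs, clause_spans)
```

Note how the synthetic (0, 9) edge, spanning more than 8 words, is discarded by step 4.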

SMiLe-OIE Model
The overall architecture of SMiLe-OIE is illustrated in Figure 5. SMiLe-OIE is based on a BERT encoder to get contextualized representations of an input sentence. The BERT representations are then integrated with constituency and dependency information by Const-GCN and Dep-GCN, respectively. Finally, SMiLe-OIE aggregates the BERT, const-graph, and dep-graph representations in order to predict output tuples. Beyond the OpenIE tagging loss, SMiLe-OIE further performs multi-view learning on the const-graph and dep-graph representations, and generates additional multi-view losses to enhance OpenIE tagging accuracy.

Task Formulation
We formulate OpenIE as a sequence tagging task, using the BIO (Beginning, Inside, Outside) tagging scheme like recent neural OpenIE models (Stanovsky et al., 2018). Given an input sentence s = [t_0, . . ., t_n], a variable number of tuples will be extracted. Each tuple can be represented as [x_0, . . ., x_m], where x_j is a contiguous sub-span of s. We assume that each tuple has a verb as relation, since an OpenIE relation is typically associated with a verb (relation is also referred to as predicate, and verb as predicate head word, in other OpenIE works). Meanwhile, since some verbs in a sentence do not lead to any relational tuple, we assume one verb can be a relation for at most one tuple.
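A minimal sketch of the BIO encoding for one tuple of the running example; the role names (`ARG0`, `REL`, `ARG1`) follow common OpenIE tagging conventions and are illustrative.

```python
def bio_tags(n_words, spans):
    """spans: list of (start, end_inclusive, role) -> one BIO tag per word."""
    tags = ["O"] * n_words            # words outside any tuple element stay "O"
    for s, e, role in spans:
        tags[s] = f"B-{role}"         # Beginning of the span
        for i in range(s + 1, e + 1):
            tags[i] = f"I-{role}"     # Inside of the span
    return tags

sent = "Mary 's cat likes playing plush toys".split()
tags = bio_tags(len(sent), [(0, 2, "ARG0"), (3, 3, "REL"), (4, 6, "ARG1")])
```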

BERT Encoder with Relation Indicator
We employ BERT (Devlin et al., 2019) as our encoder to analyze semantic interactions among words. We first project all words [t_0, . . ., t_n] into embedding space by summing their word embedding and verb embedding, i.e., w_i = W_word(t_i) + W_verb(t_i). Here, W_word is trainable and initialized by BERT word embedding. (If a word contains multiple sub-words after BERT tokenization, we use the representation of its first sub-word.)

W_verb is a trainable verb embedding matrix. Verb embedding is to distinguish whether an input word is a relation indicator or not. Given an input sentence, we extract all verbs from the sentence using an off-the-shelf POS tagger. We consider each verb in a sentence to be a potential relation indicator and use the verb embedding to highlight this relation indicator. Specifically, W_verb initializes each verb to 1 at a time, and all the other words in the sentence to 0. If a verb in the sentence does not lead to a tuple, we set all of the sequence output tags to be "O". Consequently, the model is able to learn which verbs lead to a relation.

Then, we use w_s = [w_0, . . ., w_n] as the input to the BERT encoder and utilize BERT's last hidden states as contextualized representations:

h^bert = [h^bert_0, . . ., h^bert_n] = BERT(w_s)    (1)
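As a toy sketch of the verb-indicator input embedding w_i = W_word(t_i) + W_verb(t_i): the vocabulary, embedding size, and random initialization below are all illustrative (the real model uses BERT's 768-dimensional, BERT-initialized word embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                       # toy embedding size; BERT uses 768
vocab = {"mary": 0, "likes": 1, "toys": 2}
W_word = rng.normal(size=(len(vocab), d))   # would be initialized from BERT embeddings
W_verb = rng.normal(size=(2, d))            # row 0: non-indicator, row 1: relation indicator

tokens = ["mary", "likes", "toys"]
verb_flags = [0, 1, 0]                      # "likes" is the candidate relation indicator
inputs = np.stack([W_word[vocab[t]] + W_verb[f]
                   for t, f in zip(tokens, verb_flags)])
```

One such input sequence is built per candidate verb, so a sentence with k verbs yields k training instances.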

Syntactic GCN Encoders
In this section, we present the syntactic encoders, which represent the elements of the two graphs with the BERT representations and encode them using GCNs. Recall that the dep-graph and const-graph are represented as G_z = (U_z, E_z), where z ∈ {dep, con}. e^z_ij in E_z equals 1 if there is an edge between node n^z_i and node n^z_j, and 0 otherwise. Each node n^z_i ∈ U_z has a label (or type), designated as type⟨n^z_i⟩. The node types of U_dep are dependency relations. The syntactic encoder of G_dep (called Dep-GCN) takes node embeddings as follows:

l^dep_i = W^1_dep(type⟨n^dep_i⟩)    (2)

where W^1_dep ∈ R^(d_l × N_dep) is a trainable matrix, d_l is the size of node embeddings, and N_dep is the total number of unique dependency relations.
A node n^con_i ∈ U_con has a label of constituency path type⟨n^con_i⟩ = [c_1, . . ., c_k], which contains a list of constituent tags. Given a constituency path, we first project all its constituent tags to respective constituent tag vectors, and then average all the constituent tag vectors in this constituency path as input to the syntactic encoder of G_con (called Const-GCN) as follows:

l^con_i = avg(W^1_con(c_1), . . ., W^1_con(c_k))    (3)

where W^1_con ∈ R^(d_l × N_con) is a trainable matrix, and N_con is the total number of unique constituency tags. avg() indicates the averaging operation on a sequence of vectors.
Each syntactic encoder (Dep-GCN, Const-GCN) employs a separate GCN to encode the corresponding graph (G_dep, G_con). The computation of the GCN representation is formulated as:

h^z_i = ReLU( Σ_{j=1..n} e^z_ij · α^z_ij · (W^2_z l^z_j + b_z) )    (4)

where n refers to the total number of word nodes in the graph, W^2_z ∈ R^(d_h × d_l) is a trainable weight matrix for syntactic type embeddings, and b_z ∈ R^(d_h) is the bias vector. The neighbour connecting strength distribution α^z_ij is calculated as below:

α^z_ij = exp(m^z_i · m^z_j) / Σ_{k: e^z_ik = 1} exp(m^z_i · m^z_k)    (5)

where m^z_i = h^bert_i ⊕ l^z_i, and ⊕ is the concatenation operator. In this way, node type and edge information are modelled in a unified way.
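The aggregation step can be sketched as below. This is a hedged illustration: the extraction lost the paper's exact equations, so the softmax-over-dot-products weighting and ReLU aggregation here are assumptions in the spirit of the surrounding description, with toy dimensions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def syntactic_gcn(h_bert, l_type, adj, W2, b):
    """One GCN layer over a syntactic graph.
    h_bert: (n, d_bert) BERT states; l_type: (n, d_l) node-type embeddings;
    adj: (n, n) 0/1 adjacency; W2: (d_h, d_l); b: (d_h,)."""
    m = np.concatenate([h_bert, l_type], axis=1)    # m_i = h_bert_i (+) l_i
    n = h_bert.shape[0]
    out = np.zeros((n, W2.shape[0]))
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i, j]]
        if not nbrs:
            continue
        # neighbour connecting strengths from concatenated [BERT ; type] vectors
        alpha = softmax(np.array([m[i] @ m[j] for j in nbrs]))
        for a, j in zip(alpha, nbrs):
            out[i] += a * (W2 @ l_type[j] + b)
        out[i] = np.maximum(out[i], 0.0)            # ReLU
    return out

rng = np.random.default_rng(1)
h_bert = rng.normal(size=(3, 5))                    # toy: 3 words, d_bert = 5
l_type = rng.normal(size=(3, 2))                    # toy: d_l = 2
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # a 3-word chain
h = syntactic_gcn(h_bert, l_type, adj, W2=rng.normal(size=(4, 2)), b=np.zeros(4))
```

The same layer is instantiated twice with separate parameters, once per graph (Dep-GCN and Const-GCN).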
Finally, we aggregate the sequence representations from the BERT encoder in Eq. (1) and the graph representations from Const-Encoder and Dep-Encoder in Eq. (4) as follows:

h^final_i = h^bert_i ⊕ h^con_i ⊕ h^dep_i    (6)

where h^final_i is used by the tagging layer for tuple prediction.

Multi-view Learning
Recall that the const-graph and the dep-graph share the same set of nodes U and have two different sets of node representations h^con and h^dep, and two different edge sets E_con and E_dep, respectively. We treat const-graph and dep-graph as two syntactic views z ∈ {dep, con} of the input sentence. We adopt multi-view learning (Ata et al., 2021) in order to explore three types of relationships among the representations of the const-graph and dep-graph views. The multi-view learning loss is used to finetune these representations, which can provide rich syntactic information for tuple generation.
We consider three categories of relationships between these two views. In the first category, the multi-view learning captures the inter-node and intra-view relationship in each view. In the second category, it aligns instances of the same node across the two views, i.e., the intra-node and inter-view relationship. In the third category, it ensures that nodes connected in one view should be similar to each other in the other view, i.e., the inter-node inter-view relationship.
Inter-node Intra-view Relationship. We design a loss to ensure that the representations of connected nodes i and j in the same view z, i.e., h^z_i and h^z_j, are similar:

L_{R_1} = - Σ_z Σ_{e^z_ij ∈ E_z} log P(h^z_i, h^z_j)    (7)

where P(h^z_i, h^z_j) measures the probability that nodes i and j are connected in view z, based on the similarity of their representations:

P(h^z_i, h^z_j) = σ(h^z_i · h^z_j)    (8)

This is to ensure coherence within the same view.
Intra-node Inter-view Relationship. While const-graph and dep-graph exhibit diversity, they ultimately converge on a common set of words. The same word, although bearing different syntactic functions, well connects the two views. Therefore, we design a loss for the intra-node and inter-view relations. Specifically, we make sure a node i's const-graph representation h^con_i is similar to its dep-graph representation h^dep_i by minimizing the following loss:

L_{R_2} = - Σ_z Σ_{i ∈ U} log P(h^{z'}_i, h^z_i)    (9)

where z' indicates the other view than z, and P(h^{z'}_i, h^z_i) is computed in a similar way as in Eq. 8.

Inter-node Inter-view Relationship. We observe that const-graph and dep-graph share many common edges. In other words, two nodes linked in const-graph are often linked with each other in dep-graph as well. Consequently, we explore inter-view and inter-node relations in order to leverage the frequent edge sharing between the two graphs. Specifically, if node i and node j are connected in const-graph, we design a loss to move node i's const-graph representation h^con_i towards node j's dep-graph representation h^dep_j, as follows:

L_{R_3} = - Σ_{e^con_ij ∈ E_con} log P(h^con_i, h^dep_j)    (10)

Loss Function. We combine the losses of the three categories of relations in Equations (7), (9), and (10) with the OpenIE sequence tagging loss L_CE. L_CE is the cross-entropy loss between the gold and the predicted word labels in sequence tagging (i.e., the BIO tags shown in Figure 5). The overall loss for our multi-view learning OpenIE is:

L = L_CE + α L_{R_1} + β L_{R_2} + γ L_{R_3}    (11)

where α, β, and γ are hyper-parameters, indicating the importance of each individual loss.
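The three relationship losses and the combined objective can be sketched as follows. Because the extraction lost the paper's exact loss equations, the log-sigmoid similarity P used here is an assumption; the structure of the three terms and the weighted sum follow the prose above.

```python
import numpy as np

def log_p(u, v):
    """log sigmoid(u . v): log-probability that two representations 'match'."""
    return -np.log1p(np.exp(-(u @ v)))

def inter_node_intra_view(h, edges):
    """L_R1: connected nodes within one view should be similar."""
    return -sum(log_p(h[i], h[j]) for i, j in edges) / max(len(edges), 1)

def intra_node_inter_view(h_con, h_dep):
    """L_R2: the two views' representations of the same node should align."""
    n = len(h_con)
    return -sum(log_p(h_con[i], h_dep[i]) for i in range(n)) / n

def inter_node_inter_view(h_con, h_dep, edges_con):
    """L_R3: const-graph neighbours should also be close in the dep-graph view."""
    return -sum(log_p(h_con[i], h_dep[j]) for i, j in edges_con) / max(len(edges_con), 1)

def total_loss(l_ce, l1, l2, l3, alpha=0.024, beta=0.012, gamma=0.012):
    """L = L_CE + alpha*L_R1 + beta*L_R2 + gamma*L_R3 (weights from the paper)."""
    return l_ce + alpha * l1 + beta * l2 + gamma * l3

h_con = np.array([[1.0, 0.0], [0.0, 1.0]])
h_dep = np.array([[1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1)]
l1 = inter_node_intra_view(h_con, edges)
l2 = intra_node_inter_view(h_con, h_dep)
l3 = inter_node_inter_view(h_con, h_dep, edges)
```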

Experiments
We mainly conduct our experiments on LSOIE (Solawetz and Larson, 2021), a large-scale OpenIE dataset converted from QA-SRL 2.0 in two domains, i.e., Wikipedia and Science. It is 20 times larger than the next largest human-annotated OpenIE dataset, and thus is reliable for fair evaluation. LSOIE provides n-ary OpenIE annotations, and gold tuples are in the ⟨ARG0, Relation, ARG1, . . ., ARGn⟩ format. The dataset has two subsets, and we use both, namely LSOIE-wiki and LSOIE-sci, for comprehensive evaluation. LSOIE-wiki has 24,251 sentences and LSOIE-sci has 47,919 sentences.
The CaRB (Bhardwaj et al., 2019) dataset is the largest crowdsourced OpenIE dataset. However, CaRB only provides 1,282 annotated sentences, which are insufficient for training neural OpenIE models. As a result, we use the CaRB dataset purely for testing. We follow Kolluru et al. (2020b) to convert bootstrapped OpenIE4 tuples into labels for distantly supervised model training. CaRB provides binary OpenIE annotations, and gold tuples are in the form of ⟨Subject, Relation, Object⟩. Finally, we summarize the statistics of the training and testing datasets of LSOIE-wiki, LSOIE-sci, and CaRB in Table 2.

Baselines for Comparison
Baselines without Syntax. RnnOIE (Stanovsky et al., 2018) is the first sequence tagging model based on Bi-LSTM networks. In our work, we implement its model with GloVe word representations (Pennington et al., 2014) and name it 'GloVe+bi-LSTM'. We further add a CRF layer to build another baseline model, 'GloVe+bi-LSTM+CRF'. Meanwhile, we utilize BERT word representations along with its transformer layers, and name this baseline model 'BERT'. 'CopyAttention' (Cui et al., 2018) is the first neural OpenIE model which casts tuple generation as a sequence generation task. 'IMoJIE' (Kolluru et al., 2020a) extends CopyAttention and is able to produce a variable number of extractions per sentence. It iteratively generates the next tuple, conditioned on all previously generated tuples. 'CIGL-OIE + IGL-CA' (Kolluru et al., 2020b) models OpenIE as a 2-D grid sequence tagging task and iteratively tags the input sentence until the number of extractions reaches a pre-defined maximum.
Baselines with Syntax. We build baselines that utilize syntactic information, based on BERT. We first study the performance of using either the dependency or constituency tree as an additional syntactic feature, i.e., 'BERT+Dep-GCN' and 'BERT+Const-GCN'. Then, we present three models for fusing heterogeneous syntactic information from dependency and constituency trees. 'Dep-GCN ⊕ Const-GCN' refers to the proposed parallel aggregation of the two graph representations using two syntactic GCNs. For comparison, we build a model of sequential aggregation, 'Dep-GCN → Const-GCN', which passes the dependency graph representation from Dep-GCN as input to Const-GCN, and another model, 'Const-GCN → Dep-GCN', which passes the constituency graph representation from Const-GCN as input to Dep-GCN.

Evaluation
Evaluation Metric. For the LSOIE-wiki and LSOIE-sci datasets, Solawetz and Larson (2021) consider two tuples to match if their relations (or verbs) are identical, regardless of the matching of tuple arguments. We consider this scoring function to be over-lenient. Therefore, we revise their scoring function to consider both relation and argument matching, i.e., exact tuple matching, for accurate and fair comparison. For the CaRB dataset, we use the default CaRB scoring function (Bhardwaj et al., 2019) to evaluate binary tuples with lexical-level matching, i.e., partial tuple matching. Both scoring functions report F1 score based on precision and recall computed by tuple matching. Each extracted tuple is associated with a confidence value, so we can generate a precision-recall (P-R) curve and report the area under the P-R curve (AUC).
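A simplified sketch of the exact tuple matching described above: a predicted tuple counts as a true positive only if its relation and all arguments match a gold tuple exactly. The real LSOIE and CaRB scorers additionally handle lexical-level partial matching and confidence-thresholded P-R curves, which are omitted here.

```python
def exact_match_f1(gold, pred):
    """gold, pred: lists of tuples of strings -> (precision, recall, F1)."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)                      # exact matches only
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [("cat", "likes", "toys"), ("cat", "plays", "room")]
pred = [("cat", "likes", "toys"), ("cat", "likes", "room")]
p, r, f1 = exact_match_f1(gold, pred)
```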

Evaluation Results. We compare SMiLe-OIE with other neural OpenIE baseline systems, summarizing their evaluation results in Table 3 and depicting their P-R curves in Figure 6. As shown in Figure 6, SMiLe-OIE achieves better precision at different recalls compared to other baseline systems. Observe that both "BERT + Dep-GCN" and "BERT + Const-GCN" outperform "BERT". It shows that leveraging syntactic information, from either the dependency or the constituency tree, benefits the OpenIE task significantly.
The comparison results in Table 3 also show that integrating the heterogeneous syntactic information is better than leveraging a single syntactic structure. Both parallel and sequential aggregation of the two graph representations achieve better results than "BERT" combined with either "Dep-GCN" or "Const-GCN" alone.
Lastly, multi-view learning can effectively guide the fusion of the heterogeneous syntactic information, leading to significant improvement and outperforming all the baseline systems except "CIGL-OIE + IGL-CA" on the CaRB dataset. We find that "CIGL-OIE + IGL-CA" uses a complicated method of coordination boundary analysis dedicated to the CaRB dataset. However, this coordination boundary analysis cannot be generalized to other datasets, e.g., LSOIE-wiki and LSOIE-sci, and it is thus not preferable.

Ablation Study
We ablate each part of our model and evaluate the ablated models on the LSOIE-wiki and LSOIE-sci datasets; the results are reported in Table 4. The upper part of the table reports the ablation results of removing each of the three multi-view learning losses L_{R_1}, L_{R_2}, and L_{R_3}. It shows that L_{R_2} (intra-node inter-view relationship) contributes slightly more than the other two losses. The lower part reports the results of removing the GCN layers for the dependency and constituency graphs. In this setting, we only concatenate the syntactic label representation to each word, without leveraging the syntactic graph structure. We observe that the GCN layers have a larger impact on SMiLe-OIE than the multi-view losses, although both contribute to the best performance.
Meanwhile, we study a few variants of const-graph, and the results are reported in the following subsection.

Effectiveness of Const-graph
To verify the effectiveness of the proposed method for converting phrase-level relations of the constituency tree into word-level relations of the const-graph (see Section 3.2), we build three variants:
• Variant 1: we replace the constituency path, as the label of a word node, with the last constituency tag in the path;
• Variant 3: in step 4 of constituency relation flattening, we keep edges whose distance between two words is longer than 8.
We evaluate the proposed conversion method and its three variants based on our baseline model BERT+Const-GCN. Note that neither Dep-Encoder nor multi-view learning is applied in BERT+Const-GCN. As shown in Table 5, the proposed method outperforms all three variants. The const-graph with constituency paths outperforms its variant 1 that uses a single constituency tag. It proves that using the constituency path is better than simply using the last constituency tag. Compared to variant 2, the const-graph, which connects the word to the first word of its sibling, achieves better scores. Moreover, we find that keeping distant edges in variant 3 deteriorates the model performance. As such, it is effective to remove distant edges from const-graph.

Conclusion
We design a novel strategy to map a constituency tree into a constituency graph with only word nodes, paving the way for integrating constituency syntax with BERT and dependency syntax. With the aid of Const-GCN and Dep-GCN, we propose a new OpenIE system, SMiLe-OIE, which combines heterogeneous syntactic information through multi-view learning. Experiment results show that leveraging syntactic information can benefit the OpenIE task significantly, and multi-view learning can effectively guide the fusion of heterogeneous syntactic information. In future work, we will explore other types of structured information to further improve OpenIE.

Limitations
We analyze the limitations of our SMiLe-OIE from three perspectives: syntactic parse errors, POS tagging errors, and the multiple-extractions issue. (1) As we integrate both constituency and dependency parsing results with the OpenIE task, our system will inevitably suffer from the noise introduced by the off-the-shelf tools: spaCy and CoreNLP. (2) Meanwhile, the number of tuple extractions is highly correlated with the number of verbs extracted by the POS tagger. Therefore, the POS tagger's errors may also affect the quality of OpenIE. Based on our statistics of LSOIE-wiki and LSOIE-sci, the POS tagger fails to extract 8% of verbs that are supposed to be relation indicators. (3) Moreover, 6% of relation indicators correspond to multiple tuple extractions, while our system extracts at most one tuple per relation indicator. Our system, suffering from the POS errors and the multiple-extractions issue, fails to predict 14% of the gold tuples.
Verb-tuple Alignment. We assume that every tuple has a verb in its relation (see Section 4). However, this assumption does not mean that each verb in a sentence leads to one tuple. If a sentence contains multiple verbs identified by the POS tagger, we create multiple training instances. In each training instance, one verb is considered the relation indicator, i.e., its W_verb entry is initialized to 1, while W_verb for all other verbs in the sentence is initialized to 0. The corresponding tuple taking this verb as relation is the gold label for this training instance. If this verb does not lead to a tuple, we set the label for all words in the sentence to be "O", i.e., no tuple is extracted for this verb. As such, the model is able to learn which verb leads to a tuple extraction.

A.2 Re-implementation Details
Note that all baselines are implemented to extract n-ary tuples on the LSOIE-wiki and LSOIE-sci datasets, and binary tuples on the CaRB dataset. 'CopyAttention', 'IMoJIE', and 'CIGL-OIE + IGL-CA' are binary OpenIE systems and cannot be tested naturally on the LSOIE-wiki and LSOIE-sci datasets. We re-implement their models to cater to n-ary tuple extraction, based on their code repositories. In the evaluation, we evaluate 'CopyAttention', 'IMoJIE', and 'CIGL-OIE + IGL-CA' on the LSOIE-wiki and LSOIE-sci datasets through our n-ary re-implementations.
Figure 2: Dependency graph constructed from the dependency tree (parsed by spaCy).

Figure 4: Constituency graph constructed from the word-level constituency relations.

Figure 1 :
Figure 1: An example of the constituency tree, dependency tree, and n-ary OpenIE tuple to be extracted for the sentence "Mary's cat likes playing plush toys in the room."

Figure 3 :
Figure 3: Word-level constituency relations converted from phrase-level relations of constituency tree.

Figure 6 :
Figure 6: Precision-recall curves of SMiLe-OIE and other baselines on LSOIE-wiki test set.
During testing, multiple test instances are created if a sentence contains multiple verbs; one verb serves as the relation indicator in each test instance. No tuple is extracted if all predictions of a test instance are "O".

Parameters. The hidden dimension d_h for the BERT representation h^bert_i, the Dep-GCN graph representation h^dep_i, and the Const-GCN graph representation h^con_i is 768. We use single-layer GCNs for both constituency and dependency graphs. The hidden dimension d_l for the Dep-Encoder type embedding l^dep_i and the Const-Encoder path embedding l^con_i is 400. Hyper-parameters α, β, and γ are set to 0.024, 0.012, and 0.012, respectively. Hyper-parameter selection is based on grid search. The experiments are conducted with a Tesla V100 32GB GPU and an Intel Xeon Gold 6148 2.40 GHz CPU.

Table 2 :
Statistics of OpenIE datasets used in training and evaluating SMiLe-OIE.

Table 3 :
Evaluation results on the LSOIE-wiki, LSOIE-sci, and CaRB test sets. Scores with † are from Kolluru et al. (2020b). The best scores are in boldface, and the second best scores underlined.

Table 4 :
Ablation study of SMiLe-OIE. The best scores are in boldface.

Table 5 :
Effectiveness study of const-graph: the best scores are in boldface, and the second best underlined.