Open Information Extraction via Chunks

Open Information Extraction (OIE) aims to extract relational tuples from open-domain sentences. Existing OIE systems split a sentence into tokens and recognize token spans as tuple relations and arguments. We instead propose Sentence as Chunk sequence (SaC) and recognize chunk spans as tuple relations and arguments. We argue that SaC has better quantitative and qualitative properties for OIE than sentence as token sequence, and evaluate four choices of chunks (i.e., CoNLL chunks, simple phrases, NP chunks, and spans from SpanOIE) against gold OIE tuples. Accordingly, we propose a simple BERT-based model for sentence chunking, and propose Chunk-OIE for tuple extraction on top of SaC. Chunk-OIE achieves state-of-the-art results on multiple OIE datasets, showing that SaC benefits the OIE task.


Introduction
Open Information Extraction (OIE) is to extract structured tuples from unstructured open-domain text (Yates et al., 2007). The extracted tuples are in the form of (Subject, Relation, Object) in the case of binary relations, and (ARG_0, Relation, ARG_1, ..., ARG_n) for n-ary relations. The structured relational tuples are beneficial to many downstream tasks, such as question answering (Khot et al., 2017) and knowledge base population (Martínez-Rodríguez et al., 2018; Gashteovski et al., 2020).
When observing benchmark OIE datasets, most relations and their arguments are token spans. Recently, Sun et al. (2020) and Wang et al. (2022) propose to use Open Information Annotation (OIA) as an intermediate layer between the input sentence and OIE tuples. OIA represents a sentence as a graph where nodes are simple phrases, and edges connect predicate nodes and their argument nodes. By employing dataset-specific rules, these OIA graphs can be transformed into OIE tuples. Nevertheless, accurately generating the complete OIA graph for a given sentence poses a challenge.
Inspired by OIA, we propose a novel notion of Sentence as Chunk sequence (SaC) as an alternative intermediate layer representation. Chunking, a form of shallow parsing, divides a sentence into syntactically related non-overlapping phrases, known as chunks (Tjong Kim Sang and Buchholz, 2000). For instance, the simple phrases in OIA can be considered as chunks (Figure 1a). To justify the adaptability of SaC for OIE, we also employ other chunking options, including Noun Phrase chunks (Figure 1b) and CoNLL chunks (Figure 1c). Figure 1 shows an example sentence with different chunking schemes. Subsequently, we propose Chunk-OIE, an end-to-end tagging-based neural OIE model. Chunk-OIE performs multi-task learning over two subtasks: (i) representing the sentence as SaC, and (ii) extracting tuples based on SaC. Our findings reveal that SaC-based OIE outperforms the traditional OIE approach of representing sentences as token sequences, particularly when the OIE tuple relations and arguments align well with the chunks, as is often the case.
Our contributions are as follows. Firstly, we propose a novel notion of Sentence as Chunk sequence (SaC) for OIE. On top of SaC, we further propose to simplify the token-level dependency structure of a sentence into a chunk-level dependency structure, in order to also encode chunk-level syntactic information for OIE. Secondly, we propose Chunk-OIE, an end-to-end learning model that (i) represents a sentence as a SaC, and (ii) extracts tuples based on the SaC. Finally, experimental results show the effectiveness of Chunk-OIE against strong baselines. Through data analysis against gold tuples, we show that chunks provide a suitable granularity of token spans for OIE.
Recently, two kinds of neural systems have been explored: generative and tagging-based systems (Zhou et al., 2022). Generative OIE systems (Cui et al., 2018; Kolluru et al., 2020a; Dong et al., 2021) model tuple extraction as a sequence-to-sequence generation task with a copying mechanism. Tagging-based OIE systems (Stanovsky et al., 2018; Kolluru et al., 2020b; Kotnis et al., 2022) tag each token as a sequence labeling task. SpanOIE (Zhan and Zhao, 2020) uses a different approach. It enumerates all possible spans (up to a predefined length) from a sentence. After rule-based filtering, the remaining candidate spans are classified as relation, argument, or not part of a tuple. However, enumerating and filtering all possible spans for scoring is computationally expensive.
Early neural models seldom utilize the syntactic structure of sentences, which was required by traditional models. Recent works show that encoding explicit syntactic information benefits neural OIE as well. RnnOIE (Stanovsky et al., 2018) and SenseOIE (Roy et al., 2019) encode POS / dependency as additional embedding features. MGD-GNN (Lyu et al., 2021) connects words in an undirected graph if they are in dependency relations, and applies GAT as its graph encoder. RobustOIE (Qi et al., 2022) uses paraphrases (with various constituency forms) for more syntactically robust OIE training. SMiLe-OIE (Dong et al., 2022) incorporates heterogeneous syntactic information (constituency and dependency graphs) through GCN encoders and multi-view learning. Inspired by them, we design a simple strategy to model dependency relations at the chunk level. Note that chunks in SaC partially reflect constituency structure, as words in a chunk are syntactically related by definition.
Sentence Chunking. Our proposed notion of SaC is based on the concept of chunking. Chunking is to group tokens in a sentence into syntactically related non-overlapping groups of words, i.e., chunks. Sentence chunking is a well-studied pre-processing step for sentence parsing. We can naturally use the off-the-shelf annotations as external knowledge to enhance OIE. The earliest task of chunking was to recognize non-overlapping noun phrases (Ramshaw and Marcus, 1995), as exemplified in Figure 1b. Then the CoNLL-2000 shared task (Tjong Kim Sang and Buchholz, 2000) proposed to identify other types of chunks such as verb and prepositional phrases, see Figure 1c.
OIX and OIA. Sun et al. (2020) propose Open Information eXpression (OIX) to build OIE systems. OIX is to represent a sentence in an intermediate layer, so that reusable OIE strategies can be developed on OIX. As an implementation, they propose Open Information Annotation (OIA), which is a single-rooted directed acyclic graph (DAG) of a sentence. Its basic information unit, i.e., graph node, is a simple phrase. A simple phrase is either a fixed expression or a phrase. Sun et al. (2020) define simple phrases to be: constant (e.g., nominal phrase), predicate (e.g., verbal phrase), and functional (e.g., wh-phrase). Edges in an OIA graph connect the predicate/function nodes to their arguments. Wang et al. (2022) extend OIA by defining more simple phrase types and release an updated version of the OIA dataset. The authors also propose OIA@OIE, including an OIA generator to produce OIA graphs of sentences, and rule-based OIE adaptors to extract tuples from OIA graphs.

Task Formulation
We formulate the OIE tuple extraction process as a two-level sequence tagging task. The first level of sequence tagging is to perform sentence chunking by identifying the boundary and type of each chunk, representing the Sentence as Chunks (SaC). The second level of sequence tagging is to extract OIE tuples on top of SaC.
Formally, given a sentence with input tokens s_t = [t_1, ..., t_n], we first obtain the chunk sequence [c_1, ..., c_m]. This process can be formulated as two sequence tagging sub-tasks: (i) binary classification for chunk boundary, and (ii) multi-class classification for chunk type (see example chunk boundaries and types in the outputs of the "Boundary & Type Tagging" module in Figure 2). Note that tokens at boundaries are tagged as 1 and non-boundaries as 0. Subsequently, we perform tagging on the chunk sequence [c_1, ..., c_m] to extract OIE tuples (Section 3.3). A variable number of tuples are extracted from a sentence. Each tuple can be represented as [x_1, ..., x_L], where each x_i is a contiguous span of chunks, either an exact match or a chunk concatenation. One of the x_i is the tuple relation (REL) and the others are tuple arguments (ARG_l). For instance, the tuple in Figure 2 can be represented as (arg_0 = 'Ms. Lee', rel = 'told', arg_1 = 'Lily and Jimmy'). We address the two-level sequence tagging via multi-task learning (Section 3.4).
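For illustration, the chunk-level tuple representation can be sketched in a few lines of Python (a hypothetical helper, not the paper's code): each chunk carries an OIE tag, and chunks sharing a tag are concatenated into one tuple element.

```python
def tuple_from_chunk_tags(chunks, tags):
    """Assemble one OIE tuple from per-chunk tags.

    `chunks` are chunk strings; `tags` are per-chunk labels such as
    'ARG0', 'REL', 'ARG1', or 'O' (outside any tuple element).
    Chunks sharing a tag are concatenated into one tuple element.
    (Illustrative sketch of the second-level tagging output.)
    """
    tuple_parts = {}
    for chunk, tag in zip(chunks, tags):
        if tag == "O":
            continue
        tuple_parts.setdefault(tag, []).append(chunk)
    return {tag: " ".join(parts) for tag, parts in tuple_parts.items()}

# The example tuple from Figure 2:
chunks = ["Ms. Lee", "the headmaster", "told", "Lily and Jimmy"]
tags = ["ARG0", "O", "REL", "ARG1"]
print(tuple_from_chunk_tags(chunks, tags))
# → {'ARG0': 'Ms. Lee', 'REL': 'told', 'ARG1': 'Lily and Jimmy'}
```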

Representing Sentence as Chunks (SaC)
We first use BERT to get the contextual representations of input tokens [t_1, ..., t_n] and then concatenate them with the POS representations to obtain the hidden representations of tokens as follows:

h_i = W_BERT(t_i) ⊕ W_POS(pos_type(t_i))    (1)

where W_BERT is trainable and initialized by BERT word embeddings, W_POS is a trainable embedding matrix for POS types, and ⊕ denotes concatenation. The function pos_type(·) returns the POS type of the input token.
h i is then passed into tagging layers for chunk boundary and type classification concurrently.
p^b_i = softmax(W_b h_i)    (2)
p^t_i = softmax(W_t h_i)    (3)

where W_b and W_t are trainable tagging layers, and p^b_i and p^t_i are the softmax probabilities for the chunk boundary and type of token t_i, respectively.
Then, we chunk the sentence according to the boundary predictions, i.e., the sentence is chunked into m pieces if there are m boundary tokens. A token is marked as a boundary if argmax(p^b_i) = 1. The type of each chunk is determined by the type of its boundary token, i.e., argmax(p^t_i). Overall, SaC converts the token sequence [t_1, ..., t_n] into the chunk sequence [c_1, ..., c_m].
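This decoding step can be sketched as follows (an illustrative Python helper, not the paper's implementation): boundary predictions split the token sequence, and each chunk inherits the type predicted at its boundary token.

```python
def decode_chunks(tokens, boundary_preds, type_preds):
    """Convert token-level boundary/type predictions into chunks.

    A token predicted as a boundary (1) starts a new chunk; the chunk
    inherits the type predicted at its boundary token. The first token
    is forced to start a chunk so every token belongs to one.
    (Illustrative sketch of the SaC decoding step.)
    """
    chunks = []
    for i, (tok, b, t) in enumerate(zip(tokens, boundary_preds, type_preds)):
        if b == 1 or i == 0:
            chunks.append({"tokens": [tok], "type": t})
        else:
            chunks[-1]["tokens"].append(tok)
    return chunks

tokens = ["Ms.", "Lee", "told", "Lily", "and", "Jimmy"]
bounds = [1, 0, 1, 1, 0, 0]                     # argmax of boundary probabilities
types = ["NP", "NP", "VP", "NP", "NP", "NP"]    # argmax of type probabilities
chunks = decode_chunks(tokens, bounds, types)
print([(" ".join(c["tokens"]), c["type"]) for c in chunks])
# → [('Ms. Lee', 'NP'), ('told', 'VP'), ('Lily and Jimmy', 'NP')]
```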

SaC-based OIE Extractor
We design the SaC-based OIE extractor on top of SaC. Given the typed chunks inferred by SaC (Section 3.2), we convert the BERT token representations into chunk representations, and encode the chunk types. Subsequently, we model the chunk sequence as a chunk-level dependency graph. Finally, we use a Graph Convolution Network (GCN) to get the chunk-level dependency graph representations. The last tagging layer performs tagging at the chunk level to extract OIE tuples, based on the concatenation of the BERT-based and GCN-based chunk representations.
BERT-based Chunk Encoder.The chunk representations are based on the token representations h i in Equation 1.
Also, as each verb in a sentence is a potential relation indicator, verb embedding is useful to highlight this candidate relation indicator (Dong et al., 2022). We follow Dong et al. (2022) to encode tokens with additional verb embeddings, i.e., h^token_i = h_i + W_verb(rel_candidate(t_i)), where rel_candidate(t_i) returns 1 if t_i is the candidate relation indicator of the instance (otherwise, 0), and W_verb is a trainable verb embedding matrix.
For a single-token chunk (c_i = [t_j]), its chunk representation h^c'_i is the same as the token representation h^token_j. For a chunk with multiple tokens (c_i = [t_j, ..., t_k]), its chunk representation h^c'_i is obtained by pooling the representations of its member tokens (h^token_j, ..., h^token_k). Moreover, we encode chunk types with a trainable chunk type embedding W_chunk for additional type information:

h^c_i = h^c'_i ⊕ W_chunk(chunk_type(c_i))    (4)

where the function chunk_type(·) returns the type (e.g., Noun Phrase, Verbal Phrase) of the input chunk.
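A minimal numpy sketch of this chunk encoder is shown below; the mean-pooling aggregation and all variable names are our assumptions for illustration.

```python
import numpy as np

def encode_chunks(token_reps, chunk_spans, chunk_types, type_emb):
    """Build chunk representations from token representations.

    token_reps : (n, d) array of token vectors (h_token above).
    chunk_spans: list of (start, end) token index ranges, end exclusive.
    chunk_types: list of type names, one per chunk.
    type_emb   : dict mapping a type name to its embedding (W_chunk rows).

    Multi-token chunks are mean-pooled (an assumed aggregation choice),
    then concatenated with their chunk-type embedding.
    """
    reps = []
    for (s, e), ctype in zip(chunk_spans, chunk_types):
        pooled = token_reps[s:e].mean(axis=0)  # single-token chunks unchanged
        reps.append(np.concatenate([pooled, type_emb[ctype]]))
    return np.stack(reps)

rng = np.random.default_rng(0)
token_reps = rng.normal(size=(6, 4))     # 6 tokens, d = 4
spans = [(0, 2), (2, 3), (3, 6)]         # "Ms. Lee" / "told" / "Lily and Jimmy"
types = ["NP", "VP", "NP"]
type_emb = {"NP": np.zeros(2), "VP": np.ones(2)}
out = encode_chunks(token_reps, spans, types, type_emb)
print(out.shape)  # → (3, 6): 4-dim pooled rep + 2-dim type embedding per chunk
```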
Chunk-level Dependency Graph. Recent studies show that syntactic structures benefit neural models for NLP tasks including OIE (Fei et al., 2021; Dong et al., 2022). Thus, given the sentence represented in SaC, we model the dependency structure of the input sentence at the chunk level.
For this purpose, we convert the token-level dependency structure to a chunk-level one by ignoring intra-chunk dependencies and retaining inter-chunk dependencies. Figure 3 shows the chunk-level dependency tree of the example sentence in Figure 1 (with OIA simple phrases as chunks), together with its dependency tree at word level.
The chunk-level dependency graph is formulated as G = (C, E), where the nodes in C correspond to chunks [c_1, ..., c_m], and e_ij in E equals 1 if there is a dependency relation between a token in node c_i and a token in node c_j; otherwise, 0. Each node c_i ∈ C has a node type: we label a node with the type of the dependency from the node to its parent node. Notice that SaC greatly simplifies the modelling of sentence syntactic structure.
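The conversion from token-level to chunk-level dependencies can be sketched as follows (illustrative Python with a toy parse of the Figure 3 sentence; the head indices are our assumptions):

```python
def chunk_dependency_graph(token_heads, token2chunk):
    """Collapse a token-level dependency tree to chunk level.

    token_heads : head token index per token (-1 for the root).
    token2chunk : chunk index per token.
    Intra-chunk dependencies (head in the same chunk) are dropped;
    inter-chunk dependencies become undirected chunk edges.
    """
    edges = set()
    for tok, head in enumerate(token_heads):
        if head < 0:
            continue
        ci, cj = token2chunk[tok], token2chunk[head]
        if ci != cj:
            edges.add((min(ci, cj), max(ci, cj)))
    return sorted(edges)

# "Ms. Lee | told | Lily and Jimmy" with heads from a toy parse:
# Ms.->Lee (compound), Lee->told (nsubj), told = root,
# Lily->told (dobj), and->Lily (cc), Jimmy->Lily (conj)
token_heads = [1, 2, -1, 2, 3, 3]
token2chunk = [0, 0, 1, 2, 2, 2]
print(chunk_dependency_graph(token_heads, token2chunk))
# → [(0, 1), (1, 2)]
```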
Dependency Graph Encoder. Given the chunk-level dependency graph G = (C, E), we use GCN to encode the chunk-level dependency structure. We compute the node type embedding

l_i = W_dep(dep_type(c_i)) ∈ R^{d_l}

where d_l is the embedding dimension, N_dep is the number of unique dependency relations, and W_dep is a trainable embedding matrix over the N_dep relations. The function dep_type(·) returns the dependency type of the input chunk. Subsequently, we use GCN to encode G with representations as follows:

h^g_i = ReLU( Σ_{j=1}^{m} α_ij (h^c_j + W_l l_j) + b )    (5)

where m refers to the total number of chunk nodes in G, W_l ∈ R^{d_h × d_l} is a trainable weight matrix for dependency type embeddings, and b ∈ R^{d_h} is the bias vector. The neighbour connecting strength distribution α_ij is calculated as below:

α_ij = e_ij · exp(m_i^T m_j) / Σ_{k=1}^{m} e_ik · exp(m_i^T m_k)    (6)

where m_i = h^c_i ⊕ l_i, and ⊕ is the concatenation operator. In this way, node type and edge information are modelled in a unified way.
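As an illustration, a standard normalized-adjacency GCN layer over the chunk graph is sketched below in numpy; this is a generic stand-in for the paper's exact formulation, with all shapes and names assumed.

```python
import numpy as np

def gcn_layer(H, A, W, b):
    """One GCN layer over the chunk-level dependency graph.

    H : (m, d_in) node features, e.g. chunk reps concatenated with
        their dependency-type embeddings.
    A : (m, m) adjacency matrix of the chunk graph (e_ij ∈ {0, 1}).
    Self-loops are added and rows are normalized so each node averages
    over its neighbourhood; a standard GCN update used here only as an
    illustration of encoding the chunk graph.
    """
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(A_norm @ H @ W + b, 0.0)   # ReLU

m, d_in, d_h = 3, 6, 4
rng = np.random.default_rng(1)
H = rng.normal(size=(m, d_in))
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)           # chunk edges from Figure 3
W = rng.normal(size=(d_in, d_h))
b = np.zeros(d_h)
out = gcn_layer(H, A, W, b)
print(out.shape)  # → (3, 4)
```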
For OIE extraction, we aggregate chunk representations from the BERT-based representations in Equation 4 and from the GCN-based representations in Equation 5. We then pass them into tagging layers for OIE span classification.

Multi-task Learning Objective
As mentioned, we perform the two-level sequence tagging of sentence chunking and OIE extraction.
We combine losses from SaC and OIE tagging to jointly optimize the Chunk-OIE model.
For SaC, considering that boundary and type classification are complementary to each other, we combine the following cross-entropy losses:

L_SaC = − Σ_i Σ_{k=0}^{1} y^b_{ik} log p^b_{ik} − α Σ_i Σ_{k=1}^{c_1} y^t_{ik} log p^t_{ik}    (11)

where y^b and y^t are gold labels for chunk boundary and type, respectively; p^b and p^t are the softmax probabilities for chunk boundary and type tagging obtained from Equations 2 and 3, respectively; c_1 refers to the number of unique chunk types; and α is a hyperparameter balancing the two losses.
For OIE, the gold labels are provided at the token level, whereas our predicted labels are at the chunk level. To enable evaluation of the generated chunk-level tuples against the token-level gold labels, we assign the predicted probability of a multi-token chunk to all its member tokens. The corresponding cross-entropy loss is computed between the predicted and the gold OIE tags:

L_OIE = − Σ_i Σ_{k=1}^{c_2} y^oie_{ik} log p^oie_{ik}    (12)

where y^oie is the gold label, p^oie is the softmax probability obtained from Equation 7, and c_2 is the number of unique OIE span classes. Finally, we combine the losses from Equations 11 and 12, and minimize the following multi-task learning loss:

L = L_OIE + β · L_SaC    (13)

where β is a hyperparameter balancing the chunking and OIE losses. More training details are in Appendix A.2.
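The loss combination can be sketched as follows (numpy, with placeholder α and β values and an assumed additive combination; not the paper's training code):

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy given softmax probabilities and integer labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def chunk_oie_loss(p_b, y_b, p_t, y_t, p_oie, y_oie, alpha=0.5, beta=0.5):
    """Multi-task loss: OIE tagging loss plus weighted SaC losses.

    L_SaC = CE(boundary) + alpha * CE(type);  L = L_OIE + beta * L_SaC.
    alpha/beta values here are placeholders, not the paper's settings.
    """
    l_sac = cross_entropy(p_b, y_b) + alpha * cross_entropy(p_t, y_t)
    l_oie = cross_entropy(p_oie, y_oie)
    return l_oie + beta * l_sac

# Toy softmax outputs for 2 tokens (boundary/type) and 2 chunks (OIE tags):
p_b = np.array([[0.9, 0.1], [0.2, 0.8]]); y_b = np.array([0, 1])
p_t = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]); y_t = np.array([0, 1])
p_oie = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]); y_oie = np.array([0, 1])
loss = chunk_oie_loss(p_b, y_b, p_t, y_t, p_oie, y_oie)
print(float(loss))
```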
Datasets. LSOIE is a large-scale OIE dataset converted from QA-SRL 2.0 in two domains, i.e., Wikipedia and Science. It is 20 times larger than the next largest human-annotated OIE dataset, and thus is reliable for fair evaluation. LSOIE provides n-ary OIE tuples in the (ARG_0, Relation, ARG_1, ..., ARG_n) format. We use both datasets, namely LSOIE-wiki and LSOIE-sci, for comprehensive evaluation.
The CaRB dataset is the largest crowdsourced OIE dataset. CaRB provides 1,282 sentences with binary tuples. The gold tuples are in the (Subject, Relation, Object) format.
The BenchIE dataset supports a comprehensive evaluation of OIE systems for English, Chinese, and German. BenchIE provides binary OIE annotations, and gold tuples are grouped according to fact synsets. In our experiment, we use the English corpus with 300 sentences and 1,350 fact synsets.
Note that multi-task training requires ground-truth labels for both chunking and OIE. However, chunking labels are not present in OIE datasets. We therefore construct chunk labels for the OIE datasets used in our experiments (Section 4.2).
Evaluation Metric. For the LSOIE-wiki and LSOIE-sci datasets, we follow Dong et al. (2022) to use exact tuple matching. A predicted tuple is counted as correct if its relation and all its arguments are identical to those of a gold tuple; otherwise, it is incorrect. For the CaRB dataset, we use the scoring function provided by its authors (Bhardwaj et al., 2019), which evaluates binary tuples with token-level matching, i.e., partial tuple matching. The score of a predicted tuple ranges from 0 to 1. For the BenchIE dataset, we also adopt the scoring function proposed by its authors (Gashteovski et al., 2022), which evaluates binary tuples with fact-based matching. A predicted tuple is counted as correct if it exactly matches one fact tuple, and otherwise incorrect.
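The exact tuple matching criterion can be sketched as follows (illustrative Python, not the official LSOIE scorer):

```python
def exact_match_f1(predicted, gold):
    """Precision/recall/F1 under exact tuple matching.

    A predicted tuple counts as correct iff it is identical (relation
    and every argument) to some gold tuple. Tuples are represented as
    tuples of strings. (Sketch of exact-matching scoring, not the
    official evaluation script.)
    """
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [("Ms. Lee", "told", "Lily and Jimmy")]
pred = [("Ms. Lee", "told", "Lily and Jimmy"),
        ("Ms. Lee", "told", "Lily")]   # partial argument span: counts as wrong
print(exact_match_f1(pred, gold))
# → (0.5, 1.0, 0.6666666666666666)
```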

Chunk Choices and Labels Construction
SaC is to represent a sentence in syntactically related and non-overlapping chunks. However, there is no standard definition of which word groups should form chunks; SaC can be realized by any chunking scheme. We use four types of chunks to realize SaC. We construct the chunking labels for the OIE datasets through (i) our pre-trained chunking model or (ii) an existing parser.
NP chunks. In this scheme, the tokens of a sentence are tagged with binary phrasal types: NP and O, where O refers to tokens that are not part of any noun phrase. We notice that nested NPs often exist. Accordingly, we create two types of NP chunks, i.e., NP_short and NP_long. For example, the phrase "Texas music player" is a nested NP. NP_long treats it as a single NP, whereas NP_short splits it into "Texas" and "music player" as two NPs. We use the Stanford constituency parser to get NP chunks.
SpanOIE spans. SpanOIE (Zhan and Zhao, 2020) enumerates all possible spans of a sentence, up to 10 words. To reduce the number of candidate spans, it keeps only the spans with certain syntactic dependency patterns.
Table 2 lists the total numbers and average lengths of the chunks of the four types, as well as of the gold spans of the four datasets.

OIE systems for Comparison
Token-level OIE systems. CopyAttention (Cui et al., 2018) is the first neural OIE model, which casts tuple generation as a sequence generation task. IMOJIE (Kolluru et al., 2020a) extends CopyAttention by generating tuples iteratively, conditioning each extraction on the previously generated ones.


Experimental Results

Experimental results in Table 1 show that Chunk-OIE, in particular its SaC-OIA-SP and SaC-CoNLL variants, achieves state-of-the-art results on three OIE datasets: LSOIE-wiki, LSOIE-sci, and BenchIE. Meanwhile, its results on CaRB are comparable with baselines. We evaluate the statistical significance of Chunk-OIE against its token-level baseline based on their F1 scores (each experiment is repeated three times with different random seeds). The p-values for Chunk-OIE (OIA-SP) and Chunk-OIE (CoNLL) are 0.0021 and 0.0027, indicating both results are significant at p < 0.01.
Comparison with token-level systems: Chunk-OIE surpasses its token-level counterpart BERT+Dep-GCN on all four datasets. Note that both Chunk-OIE and BERT+Dep-GCN rely on BERT and a dependency GCN encoder; the only difference is the input unit, i.e., chunks for Chunk-OIE and tokens for BERT+Dep-GCN. Consequently, we suggest that chunks are a more suitable input unit for OIE. We observe that SMiLe-OIE is a strong baseline. It explicitly models additional constituency information, and its multi-view learning is computationally complex. Compared to it, Chunk-OIE is simple yet effective. CIGL-OIE performs well on the CaRB dataset. It adopts coordination boundary analysis to split tuples with coordination structure, which aligns well with the annotation guidelines of the CaRB dataset, but not with the guidelines of the LSOIE and BenchIE datasets. In Chunk-OIE, SaC treats chunks with coordination (e.g., "Lily and Jimmy") as a single unit, resulting in poor scores in such cases. Except on CaRB, CIGL-OIE does not generalize well to the other datasets.
Comparison with chunk-level systems: Chunk-OIE with the OIA-SP and CoNLL chunking schemes outperforms the other two variants with NP_short and NP_long on all four datasets. This indicates that multi-label chunking is more effective for chunk-level OIE than simply recognizing noun phrases in a sentence. Moreover, Chunk-OIE with NP_short outperforms Chunk-OIE with NP_long on all four datasets, which may reflect the fact that OIE tuple arguments are often simple noun phrases rather than cascaded noun phrases. Meanwhile, Chunk-OIE with the OIA-SP and CoNLL chunking schemes show comparable performance.
Chunk-OIE achieves better results than SpanOIE, indicating that SaC is more reasonable than the spans enumerated by SpanOIE. Note that OIE@OIA generates tuples with rules manually crafted for the OIE2006 and CaRB datasets. Also, the authors have not released the source code of their rules. Therefore, OIE@OIA cannot be evaluated on LSOIE-wiki, LSOIE-sci, and BenchIE.
Chunk-OIE: 2-stage versus end-to-end: We notice that Chunk-OIE trained end-to-end achieves slightly better performance than Chunk-OIE with 2-stage training. This indicates that jointly learning sentence chunking can benefit OIE learning.

Ablation Study
We ablate each part of Chunk-OIE (OIA-SP, CoNLL) and evaluate the ablated models on LSOIE-wiki and LSOIE-sci. The results are reported in Table 3. We first remove the dependency graph encoder. In this setting, the chunk representation obtained in Equation 4 is directly used for tuple extraction. Results show that removing chunk-level dependencies decreases the performance of Chunk-OIE, indicating the importance of chunk-level dependency relations. To explore the importance of chunk type, we ablate the chunk type embedding described in Equation 4. Observe that this also leads to performance degradation.

Boundary Analysis on SaC
It is critical to understand the suitability of adopting chunks as the granularity for OIE. In this section, we perform boundary alignment analysis of SaC against gold spans in the LSOIE benchmark dataset. Gold spans are the token spans of tuple arguments / relations in ground-truth annotations. We analyze CoNLL chunks, OIA simple phrases, NP chunks, and SpanOIE spans, as described in Section 4.2.
The boundary alignment analysis is conducted from two perspectives. (1) Precision: How often do the boundaries of SaC chunks match those of gold spans? (2) Recall: How often do the boundaries of gold spans match those of SaC chunks? There are four scenarios of boundary alignment, as exemplified in Table 4. Match-Exact: a gold span is exactly matched to a chunk span. Match-Concatenation: a gold span is mapped to a consecutive sequence of multiple chunks. Mismatch-Overlap: a chunk overlaps with a gold span, and at least one token of the chunk is not in the gold span. Mismatch-NoOverlap: a chunk does not overlap with any gold span.
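The four scenarios can be sketched as a span-alignment check (illustrative Python over token-index spans; the helper name is ours, and chunks are assumed to tile the sentence without overlap):

```python
def align_gold_span(gold_span, chunk_spans):
    """Classify a gold span's boundary alignment against chunks.

    gold_span   : (start, end) token range, end exclusive.
    chunk_spans : non-overlapping (start, end) chunk ranges covering
                  the sentence.
    Returns one of the four scenarios from Table 4, viewed from the
    gold span's side (the Recall perspective).
    """
    gs, ge = gold_span
    if (gs, ge) in chunk_spans:
        return "Match-Exact"
    # Both endpoints on chunk boundaries => a run of consecutive chunks.
    starts = {s for s, _ in chunk_spans}
    ends = {e for _, e in chunk_spans}
    if gs in starts and ge in ends:
        return "Match-Concatenation"
    if any(s < ge and gs < e for s, e in chunk_spans):
        return "Mismatch-Overlap"
    return "Mismatch-NoOverlap"

chunks = [(0, 2), (2, 3), (3, 6)]   # "Ms. Lee" / "told" / "Lily and Jimmy"
print(align_gold_span((0, 2), chunks))  # → Match-Exact
print(align_gold_span((0, 3), chunks))  # → Match-Concatenation
print(align_gold_span((1, 3), chunks))  # → Mismatch-Overlap
```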
We show the precision and recall analysis of the four boundary alignment scenarios in Table 5, and summarize the overall scores in Table 6. Observe that CoNLL chunks and OIA simple phrases show higher precision and recall of the Match boundary alignment than the other chunks. We note that the boundary alignment of CoNLL chunks to LSOIE is better than that of OIA simple phrases, yet the two Chunk-OIE variants with CoNLL chunks and with OIA simple phrases show comparable performance. This may indicate that the precision and recall analysis of boundary alignment is a 'generally good' but not 'precise' indicator of Chunk-OIE performance. We also note that SpanOIE has only 3.3% precision, indicating that enumerating all possible spans bears a heavy burden in detecting correct spans.

Conclusion
We propose Sentence as Chunk sequence (SaC) as an intermediate layer for OIE tuple extraction. We then propose Chunk-OIE, which leverages SaC and chunk-level dependencies, and achieves state-of-the-art results on several OIE datasets. We experiment with various chunk choices as SaC in Chunk-OIE, and perform a detailed statistical study to understand to what extent these chunks align with OIE gold tuple spans, and how the boundary alignment impacts the overall OIE performance. We show that CoNLL and OIA-SP chunks have better boundary alignment with OIE gold tuple spans than noun phrases, and Chunk-OIE adopting them as SaC achieves the best results. In future work, we aim to build upon SaC to develop even more effective OIE models.

Limitations
The limitations of Chunk-OIE are analyzed from three perspectives: SaC chunking errors, syntactic parsing errors, and multiple extractions issue.
(1) Both CoNLL-chunked phrases and OIA simple phrases suffer around 10% boundary violations, as shown in Table 5 (under Recall analysis). Since we use SaC as an intermediate layer for OIE and perform tagging at the chunk level, the chunk boundaries become a hard constraint on the extracted tuples. Among these violations, we examine 100 examples of OIA simple phrases and find that 55% of the violations are caused by chunking errors due to complicated sentence structures. The rest are mainly caused by tuple annotation errors, meaning that all OIE systems will suffer from these annotation errors. (2) Chunk-OIE relies on chunk-level dependency relations as additional syntactic knowledge. Therefore, Chunk-OIE inevitably suffers from the noise introduced by off-the-shelf dependency parsing tools. Also, we use a POS tagger to extract all verbs in the sentence as tuple relation indicators. It is reported that the POS tagger fails to extract 8% of verbs that are supposed to be relation indicators (Dong et al., 2022). Therefore, the discrepancy between POS verbs and tuple relations may affect the OIE quality. (3) Moreover, 6% of relation indicators correspond to multiple tuple extractions (one verb leading to more than one tuple), while our system extracts at most one tuple per relation indicator.
Chunking model on CoNLL-2000    Chunk type F1
AT (Yasunaga et al., 2018)      95.3
Flair (Akbik et al., 2018)      96.7
MAT (Chen et al., 2020)         97.0
ACE (Wang et al., 2021)         97.3
Ours (BERT+Multi-task)          97.0

Tables 11a and 11b report the F1 of chunk type classification for the major chunk types in both datasets. In this set of experiments, the chunk boundaries are detected together with type classification (i.e., the same setting as in Section 3.2). In both datasets, noun, verbal, and prepositional phrases dominate the chunks. The F1 scores are reasonably high on these major types. Again, as the sentences in CoNLL-2000 are much longer, the number of chunks in CoNLL-2000 is much larger than that in the OIA dataset, although the two datasets have a comparable number of test sentences.

A.5 Two stages Chunk-OIE model
Instead of training an end-to-end Chunk-OIE model, we also experiment with a pipeline method that consists of two-stage training, corresponding to the two sub-models shown in Figure 5. The first stage is to pre-train a SaC chunking model with the chunking datasets described in Appendix A.3. We then obtain the chunking labels for sentences in the OIE datasets through the SaC sub-model. The second stage is to train the OIE extractor, during which the chunking labels are given as inputs to the OIE sub-model.

A.6 Details of OIE Datasets
In this section, we elaborate more details about the train/test sets of the OIE datasets mentioned in Section 4.1. For LSOIE, we follow Solawetz and Larson (2021) and Dong et al. (2022) to split the train/test sets in the LSOIE-wiki and LSOIE-sci domains, respectively. The statistics of the LSOIE train/test sets are listed in Table 12.
CaRB provides only 1,282 annotated sentences and BenchIE provides 300 sentences, which are insufficient for training neural OpenIE models. As a result, we use the CaRB and BenchIE datasets purely for testing. We follow Kolluru et al. (2020b) to convert bootstrapped OpenIE4 tuples into labels for distantly supervised model training. The statistics of the CaRB and BenchIE train/test sets are listed in Table 12.

A.7 Chunk-level Dependency Modelling
We argue that SaC simplifies the modelling of sentence syntactic structure. We elaborate this point with the example sentence shown in Figure 3a. In this sentence, "Lee" is the appositional modifier ('appos') of "headmaster". However, it is actually the phrase "Ms. Lee" that is appositional to the phrase "the headmaster". If we want to model the relation between "Ms." and "the" through token dependencies, we need to pass through three hops ('compound' → 'appos' → 'det') to link them up. In contrast, connecting "Ms." and "the" via chunk-level dependencies requires only a single hop ('appos'). In another case, "Lee" is the nominal subject ('nsubj') and "Lily" is the direct object ('dobj') of the verb "told". Apparently, we need additional dependency relations to locate the complete subject and object of "told". If we model dependencies at the chunk level, the complete subject and object of "told" can easily be located as "Ms. Lee" and "Lily and Jimmy", respectively.

Figure 1 :
Figure 1: A sentence in different chunk sequences.

Figure 2 :
Figure 2: The overview of Chunk-OIE. Punctuation marks in the sentence are omitted for conciseness. Chunk-OIE is an end-to-end model with (i) representing Sentence as Chunks (SaC) and (ii) SaC-based OIE tuple extraction.

Figure 3 :
Figure 3: Dependency trees at token level and chunk level (in OIA simple phrases), respectively. Note that we use spaCy to extract the dependency relations for sentences.
To explore the effect of different chunks in SaC, we implement four variants: Chunk-OIE (NP_short), Chunk-OIE (NP_long), Chunk-OIE (OIA-SP), and Chunk-OIE (CoNLL). Besides the end-to-end Chunk-OIE proposed in Section 3, we also experiment with variants that conduct two-stage training, i.e., the SaC part is pretrained with a chunking dataset and frozen during the training of OIE tuple extraction (more details about 2-stage Chunk-OIE are in Appendix A.5).

Table 4 :
Four scenarios for matching a gold tuple span (in blue) to a generated chunk (in green).

Table 5 :
Precision and Recall Analysis.L s and L p are length of gold spans and generated chunks, respectively.For each type of match/mismatch case, the highest score is in boldface and second highest score is underlined.

Table 6 :
Precision, Recall, and F 1 of generated chunks; best scores are in boldface, second best underlined.

Table 8 :
Chunk type F1 on the CoNLL-2000 chunking dataset. Detailed results in Appendix A.4.

Table 9 :
Chunk type F1 of our chunking model and of Wang et al. (2022) on the OIA dataset. Note that Wang et al. (2022) report chunk boundary results only and state that 96.4% of them are labelled with correct types. We hence estimate their chunk type F1 (marked with †) based on the given percentage. The OIA dataset is split into Train/Development/Test sets. Each OIA annotation is a sentence-graph pair. We only utilize the graph nodes (i.e., simple phrases) for the chunking task. We believe a simple BERT-based SaC is sufficient to support our study on SaC-based OIE extraction, as chunking is not the key focus of our study. This could be a reason contributing to the higher F1 on the CoNLL-2000 dataset. Recall that chunk type classification is conditioned on the boundary provided, i.e., type is meaningful only if the boundary is correctly detected. If the ground-truth chunk boundaries are known, the overall type classification F1 is 99.2% and 95.8%, respectively, on the CoNLL-2000 and OIA datasets. However, in reality, the chunk boundaries have to be detected as well.

Table 10 :
Chunk boundary extraction accuracy by chunk length.

Table 11 :
Accuracy of chunk type classification by chunk type. Note that, for the CoNLL-2000 dataset, CONJP, INTJ, LST and UCP each has fewer than 10 chunks, hence they are excluded from the results. The OIE sub-model is trained with the OIE datasets (Section 4.1) and the loss function (Equation 11).

Table 12 :
Statistics of OIE datasets used in training and evaluating Chunk-OIE.