A Neural Edge-Editing Approach for Document-Level Relation Graph Extraction

In this paper, we propose a novel edge-editing approach to extract relation information from a document. In this approach, we treat the relations in a document as a relation graph among entities. The relation graph is iteratively constructed by editing the edges of an initial graph, which may be a graph extracted by another system or an empty graph. Edges are edited by classifying them in a close-first manner using the document and the temporarily constructed graph; each edge is represented with document context information from a pretrained transformer model and graph context information from a graph convolutional network model. We evaluate our approach on the task of extracting material synthesis procedures from materials science texts. The experimental results show the effectiveness of our approach in editing both graphs initialized by our in-house rule-based system and empty graphs.


Introduction
Relation extraction (RE), the task of predicting relations between pairs of given entities in text, is an important task in natural language processing. While most existing work has focused on sentence-level RE (Zeng et al., 2014), recent studies have extended the extraction to the document level since many relations are expressed across sentences (Christopoulou et al., 2019; Nan et al., 2020).
In document-level RE, models need to deal with relations among multiple entities over a document. Several document-level RE methods construct a document-level graph, which is built on nodes of words or other linguistic units, to capture document-level interactions between entities (Christopoulou et al., 2019; Nan et al., 2020). However, such methods do not directly consider interactions among relations in a document, even though such relations are often dependent on each other and other relations can serve as important contexts for a relation. The source code is available at https://github.com/tti-coin/edge-editing.
We propose a novel, iterative, edge-editing approach to document-level RE. The overview of our approach and an example of the extraction results are illustrated in Figure 1. Our approach treats relations as a relation graph composed of entities as nodes and their relations as edges. The relation graph is first initialized using the edges predicted by an existing RE model, if one is provided. Edges are then edited by a neural edge classifier that represents edges using the document information, the prebuilt graph information, and the current edge information. The document information is represented with pretrained Longformer models (Beltagy et al., 2020), while the graph information is represented with graph convolutional networks (Kipf and Welling, 2017). Edges are edited iteratively in a close-first manner so that the approach can utilize the information of edges between close entity pairs when editing edges of distant entity pairs, which are often difficult to predict. We evaluate our approach on the task of extracting synthesis procedures from text (Mysore et al., 2019) and show its effectiveness.
The contribution of this paper is three-fold. First, we propose a novel edge-editing approach for document-level RE that utilizes contexts in both relation graphs and documents. Second, we build a strong rule-based model and show that our approach can effectively utilize and enhance the output of the rule-based model. Third, we build and evaluate a neural model for extracting synthesis procedures from text for the first time.

Approach
Our approach extracts a relation graph on given entities from a document. We formulate the extraction task as an edge-editing task, where the approach iteratively edits edges with a neural edge classifier in a close-first manner (Miwa and Sasaki, 2014).

Iterative Edge Editing
We build a relation graph by editing the edges iteratively using the edge classifier in Section 2.2. The building finishes when all edges are edited. The edges are edited in a close-first manner (Miwa and Sasaki, 2014; Ma et al., 2019) that edits close edges first and far edges later. The distance between an entity pair is defined based on the order in which entities appear in a document; if the two entities in a pair appear m-th and (m+3)-th, the distance is 3. Note that each edge is edited only once throughout the entire editing process.
Algorithm 1 shows the method to build the graph by the iterative edge editing. To reduce the computational cost, the pairs with the same distance are edited simultaneously, and the pairs with distances greater than or equal to the maximum distance d_max are edited simultaneously. This reduces the number of editing steps from |N|^2 to d_max.
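The close-first loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify_edges` stands in for the neural edge classifier of Section 2.2, and its signature is an assumption.

```python
# Sketch of the close-first iterative editing loop (Algorithm 1), assuming a
# classifier `classify_edges(pairs, graph)` that returns a class per pair.
# Entity "distance" is the difference of the entities' order of appearance.

def close_first_edit(entities, classify_edges, d_max):
    """Edit every edge exactly once, nearest pairs first."""
    n = len(entities)
    # Group ordered pairs by distance; distances >= d_max share one bucket.
    buckets = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = min(abs(i - j), d_max)
            buckets.setdefault(d, []).append((i, j))
    graph = {}  # (i, j) -> predicted class; starts empty or from another system
    for d in sorted(buckets):          # close pairs first
        for pair, label in classify_edges(buckets[d], graph).items():
            graph[pair] = label        # later edits can see these decisions
    return graph
```

Because all pairs in a bucket are classified in one call, the number of classifier invocations is at most d_max, matching the cost reduction described above.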

Edge Classifier
An edge classifier predicts the class of the target edge Ê_ij from inputs composed of the document information doc, a graph with nodes N and edges E, and the node pair (N_i, N_j) of the target edge. The classifier is composed of three modules: EncodeNode, which produces document-based node representations N̄ using the document doc and the entity information of the nodes N; EncodeEdge, which obtains the edge representations Ē by applying a GCN to the prebuilt graph with the node representations N̄ and edges E; and ClassifyEdge, which predicts the class Ê_ij of the edge using the edge representation Ē_ij between the node pair (N_i, N_j). We explain the details of these modules in the remainder of this section.
EncodeNode employs Longformer (Beltagy et al., 2020) to obtain the document-level representation. It aggregates the subword representations within each entity by max-pooling Pool and concatenates the aggregated information with the entity's class label representation v_lab.
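A minimal numpy sketch of this pooling step, assuming the Longformer has already produced one vector per subword token; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

# Sketch of EncodeNode, assuming `subword_reps` holds one Longformer vector
# per subword token and each entity is a (start, end) token span.
# `label_emb` maps an entity-class id to its label embedding v_lab.

def encode_nodes(subword_reps, spans, labels, label_emb):
    """Max-pool each entity's subwords, then append its class embedding."""
    nodes = []
    for (start, end), lab in zip(spans, labels):
        pooled = subword_reps[start:end].max(axis=0)  # Pool over the span
        nodes.append(np.concatenate([pooled, label_emb[lab]]))
    return np.stack(nodes)
```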
To prepare the input to EncodeEdge, the obtained document-based node representations are enriched by a GCN to introduce the context of each node in the prebuilt graph: N̄_G = GCN(N̄, E). We add inverse directions to the graph and assign different weights to different edge classes in the GCN, following Schlichtkrull et al. (2018). The produced node representations N̄_G include both document and prebuilt graph contexts.
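One relational GCN layer in this style can be sketched as below. This is a simplified illustration of the Schlichtkrull et al. (2018) scheme (per-class weights plus added inverse edges), not the paper's exact layer; normalization terms are omitted for brevity.

```python
import numpy as np

# Sketch of one relational GCN layer: a separate weight matrix per edge
# class, with inverse edges added so information flows in both directions.

def rgcn_layer(nodes, edges, weights, w_self):
    """nodes: (n, d); edges: list of (head, tail, class); weights[c]: (d, d)."""
    out = nodes @ w_self                          # self-loop term
    for h, t, c in edges:
        out[t] += nodes[h] @ weights[c]           # forward edge, class-specific W
        out[h] += nodes[t] @ weights[("inv", c)]  # added inverse direction
    return np.maximum(out, 0.0)                   # ReLU
```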
EncodeEdge produces the edge representations Ē from N̄_G. It individually calculates the representation Ē_ij of the edge for each pair of nodes (N_i, N_j) by combining the representations of the nodes, similarly to Zhou et al. (2021), with the embedding b_ij of the distance between the entity pair and the embedding e^old_ij of the edge class before editing. The distance between the entity pair is calculated in the same way as in Section 2.1; if the distance exceeds a predefined maximum distance, it is treated as the maximum distance. We prepare fully connected (FC) layers, FC_H and FC_T, for the start point (head) and end point (tail) nodes and calculate the edge representation as follows:

Ē_ij = W [FC_H(N̄_G,i); FC_T(N̄_G,j); b_ij; e^old_ij], (2)

where W denotes a trainable weight parameter and [;] denotes concatenation.
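The combination described above can be sketched for a single node pair as follows. The exact composition (e.g., concatenation versus the grouped combination of Zhou et al., 2021) is an assumption here; all names are illustrative.

```python
import numpy as np

# Sketch of EncodeEdge for one node pair: head/tail FC layers, a distance
# embedding b_ij clipped at the maximum distance, and an embedding of the
# pre-editing edge class e_old, mixed by a trainable weight W.

def encode_edge(n_i, n_j, fc_h, fc_t, dist, dist_emb, old_cls, cls_emb, W):
    d = min(dist, len(dist_emb) - 1)   # clip to the predefined maximum distance
    head = np.tanh(fc_h @ n_i)         # FC_H on the head node representation
    tail = np.tanh(fc_t @ n_j)         # FC_T on the tail node representation
    feats = np.concatenate([head, tail, dist_emb[d], cls_emb[old_cls]])
    return W @ feats                   # trainable W combines all parts
```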
ClassifyEdge classifies the target edge E_ij into a relation class or no relation. It applies a dropout layer (Srivastava et al., 2014), an output FC layer FC_out, and softmax to the edge representation Ē_ij to predict the class Ê_ij with the highest probability.
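At inference time (dropout disabled), this final step amounts to a linear layer plus softmax, sketched below; treating class 0 as "no relation" is an illustrative convention, not specified by the paper.

```python
import numpy as np

# Sketch of ClassifyEdge at inference time: an output FC layer followed by
# softmax; the predicted class is the argmax of the probabilities.

def classify_edge(edge_rep, fc_out):
    logits = fc_out @ edge_rep
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(probs.argmax()), probs
```

During training, maximizing the log-likelihood corresponds to minimizing the cross-entropy between these probabilities and the gold class.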
We maximize the log-likelihood in training the edge classifier.

Experimental Settings
We evaluate our approach on the materials science procedural text corpus (Mysore et al., 2019). In the corpus, the synthesis procedures are annotated as a graph in a document, where 19 node types, such as materials, operations, and conditions, and 15 directed relation types are defined. The corpus consists of 200 documents for training, 15 for development, and 15 for test. The statistics of the corpus are shown in Appendix A. We chose this corpus since it is publicly available, manually annotated, and deals with dense document-level relation graphs.
We prepared a rule-based model (RULE) as a baseline and as an existing model to initialize the edges, which was adapted from the rule-based system in Kuniyoshi et al. (2020). The rules are summarized in Appendix B.
We employ the micro F-score for each relation class as the evaluation metric. We tune the hyper-parameters, such as the number and dimensions of layers and the dropout rate, on the development set using the hyper-parameter optimization framework Optuna (Akiba et al., 2019); the details are shown in Appendix C. We employ the Adam (Kingma and Ba, 2015) optimizer with the default parameters in PyTorch (Paszke et al., 2019) except for the learning rate. The training was performed without fine-tuning the Longformer because the corpus is too small to fine-tune a large transformer model.
We compare the following models on graphs initialized by the rule-based model (with RULE) and empty graphs (without RULE):
EDIT: the proposed model.
EDIT-IE: EDIT without iterative edge editing, i.e., d_max = 1.
EDIT-GCN: EDIT without the GCN, replacing N̄_G with N̄ in Equation (2).
RANDOM EDIT: EDIT with random-order editing.
Additionally, we evaluate the following model with randomly initialized graphs.
RANDOM INIT: EDIT with randomly connected edges with random classes, where the number of edges is the same as in the extraction results of RULE.
Note that although we do not provide a direct comparison with existing models, our EDIT-GCN without RULE is similar to BRAN (Verga et al., 2018); the only differences are that we use Longformer (Beltagy et al., 2020) instead of transformers and that NER training is not included. Moreover, most models for document-level RE require datasets annotated with both entities and their mentions, so existing models such as ATLOP (Zhou et al., 2021) cannot be directly applied to the current task.

Results without RULE
We show the results with empty initial graphs in Table 1. EDIT shows the highest scores, which indicates the effectiveness of our approach when the initial graphs are empty. When we compare EDIT, EDIT-IE, and RANDOM EDIT, we find that both iterative edge editing and the close-first strategy are effective. Since EDIT-GCN extracts relations from the context without graph structure information, the better performance of EDIT over EDIT-GCN shows the effectiveness of the information in the graph structure. The low performance of RANDOM INIT shows that the initial edge information needs to be reliable.

Table 2: Evaluation results in micro F-score with RULE

Results with RULE
We summarize the results with RULE in Table 2. The detailed results for EDIT without RULE, RULE, and EDIT-IE with RULE are shown in Appendix D.
When we compare these results with Table 1, the performance with RULE is better than the counterpart without RULE for all settings. Furthermore, all the scores in Table 2 are better than those in Table 1, which shows the strength of RULE.
Surprisingly, the results of our approach with RULE are better than those of RULE itself, even though RULE alone is better than our approach without RULE. This indicates that our EDIT approach can make the predictions more accurate. We can conclude that our EDIT approach can utilize the information from the rule-based model and that initializing the edges with RULE is useful.
As for the performance of the models, most results are consistent with Table 1, except that EDIT-IE shows the highest score on the test set. This may be partly because the initial graph by RULE is already reliable and editing does not further improve the context. The results with RANDOM EDIT support this, since the performance degradation with RANDOM EDIT is large compared to Table 1 and RANDOM EDIT is harmful in this case. Moreover, the different behaviors on the development and test sets indicate an imbalance in the corpus split.

Case Study
We illustrate six graphs for an example document (Zhang et al., 2007) in the development data set, shown in Figure 2: the right side of Figure 1 shows our best extraction result, using EDIT-IE with RULE; Figure 3 shows the correct extraction; Figure 4 shows the extraction result using EDIT without RULE; Figure 5 shows the extraction result using RULE; and Figure 6 shows the extraction result using EDIT with RULE. Figure 3 shows that the material synthesis starts from the mixed operation with the materials SrCO3, MoO3, and Ni, proceeds to prefired, and so on, until the material SrMo1-xNixO4 is synthesized. When we compare Figure 6 with Figure 5, the extraction results are similar to those of RULE. Although its overall performance is low, the model without RULE (Figure 4) extracts relations that are not extracted by the other systems, which shows that the models with RULE and without RULE capture different relations.

Related Work
RE has been widely studied to identify the relation between two entities in a sentence, with approaches ranging from traditional feature/kernel-based methods (Zelenko et al., 2003; Miwa and Sasaki, 2014) to neural network-based methods (Zeng et al., 2014). However, sentence-level RE is not enough to cover the relations in a document, and document-level RE has increasingly received research attention in recent years.
Major approaches for document-level RE are graph-based methods and transformer-based methods. For graph-based methods, Quirk and Poon (2017) first proposed a document graph for document-level RE. Christopoulou et al. (2019) constructed a graph that included heterogeneous nodes such as entity mentions, entities, and sentences and represented edges between entities from the graph. Nan et al. (2020) proposed the automatic induction of a latent graph for relational reasoning across sentences. The document graphs in these methods are defined on nodes of linguistic units such as words and sentences, which are different from our relation graphs. Unlike our method, these methods do not directly deal with relation graphs among entities.
For transformer-based methods, Verga et al. (2018) introduced a method to encode a document with transformers to obtain entity embeddings and predict relations between entity pairs from these embeddings.

Figure 2: An example document (Zhang et al., 2007): "A series of polycrystalline samples of SrMo1-xNixO4 (0.02<=x<=0.08) were prepared through the conventional solid-state reaction method in air. Appropriate proportions of high-purity SrCO3, MoO3, and Ni powders were thoroughly mixed according to the desired stoichiometry, and then prefired at 900 [?]C for 24 h. The obtained powders were ground, pelletized, and calcined at 1000, 1100 and 1200 [?]C for 24 h with intermediate grinding twice. White compounds, SrMo1-xNixO4, were obtained. The compounds were ground and pressed into small pellets about 10 mm diameter and 2 mm thickness. These pellets were reduced in a H2/Ar (5%: 95%) flow at 920 [?]C for 12 h, and then the deep red colored products of SrMo1-xNixO3 were obtained."

Conclusions
We proposed a novel edge-editing approach for document-level relation extraction. This approach treats the task as the edge editing of relation graphs, given the nodes. It edits edges considering contexts in both the document and the relation graph. We evaluated the approach on the material synthesis procedure corpus, and the results showed the usefulness of initializing edges with the rule-based model, utilizing prebuilt graph information for editing, and editing in a close-first manner. As a result, our model achieved an F-score of 86.3% for edge prediction.
In future work, we plan to improve the approach to obtain more consistent and accurate relation graphs. We also would like to apply the approach to other data sets such as cooking recipes (Mori et al., 2014) and temporal graphs (Pustejovsky et al., 2003;Cassidy et al., 2014).

A Statistics of the Materials Science Procedural Text Corpus
We present the statistics of the materials science procedural text corpus proposed by Mysore et al. (2019). Table 3 and Table 4 summarize the numbers of entities and relations, respectively.

B Rule-based Relation Extraction Model
We built a rule-based model by defining rules to extract relations between entity pairs for the materials science procedural text corpus (Mysore et al., 2019). The rules were adapted from the rule-based model in Kuniyoshi et al. (2020) for the target corpus. The rules depend on the labels of the entities in an entity pair, the distance between them, and the order of occurrence of the entities. According to the combination of entity labels, our rules are divided into three types: OPERATION-OPERATION, OPERATION-MATERIAL, and other relations. In the following, the starting point of a relation is called the head, the ending point is called the tail, and an edge is denoted as HEAD-TAIL.

B.1 OPERATION-OPERATION
The relation OPERATION-OPERATION takes only the NEXT OPERATION label, which means the progression of operations. NEXT OPERATION: Close OPERATION entities are linked with the relation from the beginning to the end in the order in which the OPERATION entities appear in the document. For the SOLVENT MATERIAL, ATMOSPHERIC MATERIAL, and PARTICIPANT MATERIAL labels, a dictionary is prepared manually for each label. The relations are linked from the nearest OPERATION to a MATERIAL in the sentence if the MATERIAL matches an entry in the dictionary, since these relations take specific MATERIAL entities. The dictionaries are included in the source code.

B.2 OPERATION-MATERIAL
RECIPE PRECURSOR is linked from all MATERIAL entities that do not match the dictionaries of SOLVENT MATERIAL, ATMOSPHERIC MATERIAL, and PARTICIPANT MATERIAL to the nearest OPERATION. This rule-based model does not produce the relation RECIPE TARGET. The reason for these decisions is that it is difficult to classify these relations with simple rules.
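The dictionary-exclusion and nearest-OPERATION linking above can be sketched as follows. The dictionary entries and the entity-tuple format are illustrative stand-ins; the actual dictionaries ship with the source code.

```python
# Sketch of the OPERATION-MATERIAL rules above. Entities are (position,
# label, text) tuples within one sentence; the dictionaries are stand-ins
# for the manually prepared ones included in the source code.

SOLVENTS = {"ethanol", "water"}
ATMOSPHERES = {"air", "argon"}
PARTICIPANTS = {"naoh"}

def recipe_precursor_links(entities):
    """Link each non-dictionary MATERIAL to its nearest OPERATION."""
    ops = [(pos, text) for pos, lab, text in entities if lab == "OPERATION"]
    links = []
    for pos, lab, text in entities:
        in_dict = text.lower() in SOLVENTS | ATMOSPHERES | PARTICIPANTS
        if lab == "MATERIAL" and not in_dict and ops:
            nearest = min(ops, key=lambda op: abs(op[0] - pos))
            links.append((text, "RECIPE_PRECURSOR", nearest[1]))
    return links
```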

B.3 Remaining Relations
The remaining 9 relation labels are defined between the other pairs of entity labels: PROPERTY OF, which indicates a condition of a material; CONDITION OF, which indicates a condition of an operation; NUMBER OF, which indicates the relationship between a number and a unit; AMOUNT OF, which indicates a condition of a quantity; TYPE OF, which indicates the type of a numerical condition; BRAND OF, which indicates the brand of a material or equipment; APPARATUS OF, which indicates equipment used in an operation; APPARATUS ATTR OF, which indicates a numerical condition of equipment; and DESCRIPTOR OF, which indicates other conditions. For these labels, the rules are defined based only on the labels of the head and tail entities and the distance between them. We explain the detailed rules in the remainder of this section.
PROPERTY OF: The relation can take PROPERTY-UNIT or PROPERTY-MISC as the head and MATERIAL or NONRECIPE-MATERIAL as the tail. When PROPERTY-UNIT is the head, it is linked to the nearest MATERIAL in the sentence. When PROPERTY-MISC is the head, it is linked to the nearest MATERIAL or NONRECIPE-MATERIAL in the sentence.
CONDITION OF: CONDITION-UNIT and CONDITION-MISC are linked to the nearest OPERATION with the relation in the sentence.
NUMBER OF: NUMBER is linked to the nearest PROPERTY-UNIT, CONDITION-UNIT, or APPARATUS-UNIT that appears after the NUMBER in the sentence.
AMOUNT OF: The relation is linked from AMOUNT-UNIT and AMOUNT-MISC to the nearest MATERIAL or NONRECIPE-MATERIAL in the sentence.
DESCRIPTOR OF: When MATERIAL-DESCRIPTOR is a head, it is linked to the nearest MATERIAL or NONRECIPE-MATERIAL in the sentence. When APPARATUS-DESCRIPTOR is a head, it is linked to the nearest SYNTHESIS-APPARATUS in the sentence.
APPARATUS OF: The relation is linked from SYNTHESIS-APPARATUS and CHARACTERIZATION-APPARATUS to the nearest OPERATION, with priority given to an OPERATION that appears before the APPARATUS in the sentence.
TYPE OF: PROPERTY-TYPE and APPARATUS-PROPERTY-TYPE are linked with the relation to the nearest PROPERTY-UNIT and APPARATUS-UNIT in the sentence, respectively. When CONDITION-TYPE is the head, it is linked to the nearest CONDITION-UNIT that appears before the CONDITION-TYPE in the sentence.
BRAND OF: The relation is linked from BRAND to the nearest entity that may have a brand (i.e., MATERIAL, NONRECIPE-MATERIAL, SYNTHESIS-APPARATUS, or CHARACTERIZATION-APPARATUS) in the sentence.

C Hyper-parameter Settings

We defined the search space as shown in Table 5; the hyper-parameters for the search are composed of the learning rate for Adam, the number of GCN layers, the maximum edit distance d_max, the dimensions of all hidden layers, the number of FC_out layers, the number of FC_H and FC_T layers, the dropout rate, the dimension of e^old_ij, the maximum distance and dimension for b_ij, and whether to use bidirectional or uni-directional GCNs. In the table, the range column shows the range of values searched and the final value column shows the rounded values selected after the optimization.

D Detailed Evaluation Results
Our editing models for evaluation are trained with a TITAN V GPU for EDIT with RULE and a Tesla V100 GPU for the others. The training takes about 6 hours 30 minutes for EDIT-IE with RULE and 21 hours for EDIT without RULE.
We show the detailed evaluation results with precision (Prec.), recall, and F-score on the test set in Table 6 for EDIT without RULE, Table 7 for RULE, and Table 8 for EDIT-IE with RULE. The results show that the relations not covered by RULE, i.e., RECIPE TARGET and COREF OF, are extracted by our approach, and for these classes, EDIT without RULE shows better performance than the models with RULE. Some relations extracted with high performance by RULE, including NEXT OPERATION, CONDITION OF, and DESCRIPTOR OF, are also extracted by EDIT-IE with RULE with high performance. This shows that our approach can effectively utilize the outputs of RULE.