StereoRel: Relational Triple Extraction from a Stereoscopic Perspective

Relational triple extraction is critical to understanding massive text corpora and constructing large-scale knowledge graphs, and has attracted increasing research interest. However, existing studies still face several challenging issues, including information loss, error propagation and ignoring the interaction between entity and relation. To explore these issues intuitively and address them, this paper provides a revealing insight into relational triple extraction from a stereoscopic perspective, which rationalizes the occurrence of these issues and exposes the shortcomings of existing methods. We further propose a novel model that maps relational triples to a three-dimensional (3-D) space and leverages three decoders to extract them, aiming to handle the above issues simultaneously. A series of experiments on five public datasets demonstrates that the proposed model outperforms recent advanced baselines.


Introduction
Relational triples are a common structural representation of semantic facts. A triple takes the form (subject, relation, object), where subject and object are two entities connected by a predefined semantic relation. Relational triple extraction from unstructured texts is critical to understanding massive text corpora and constructing large-scale knowledge graphs (Ren et al., 2017), and has received wide attention in recent years. Early research (Zhou et al., 2005; Chan and Roth, 2011; Zhang et al., 2017) first recognizes entities and then predicts the relations for each entity pair. Such approaches suffer from the error propagation problem, and thus more recent studies (Zheng et al., 2017; Zeng et al., 2018; Fu et al., 2019) explore joint extraction. However, existing methods still face the following challenging issues:

• Information loss (I-IL). Information loss includes entity incompleteness and entity overlapping (Zeng et al., 2018). Entity incompleteness (I-IL-EI) means that only the head or tail token, rather than the complete entity, is recognized, while entity overlapping (I-IL-EO) means that an entity belonging to multiple triples cannot be marked.
• Error propagation (I-EP). Error propagation comes from prediction processes with a strict order. For example, pipeline models (Zhang et al., 2017; Takanobu et al., 2019) recognize entities first and predict relations based on each specific entity pair, while generative models (Zeng et al., 2018, 2019) extract subject, object and relation in a predetermined order.

• Ignoring the interaction between entity and relation (I-II). Subjects (or objects) under different predefined relations should have different recognition patterns, which are not modelled when the interaction between entity and relation is ignored.
To explore the above issues intuitively and address them, we map the relational triples of a text to a three-dimensional (3-D) space from a stereoscopic perspective; the space is like a cube, as shown in Figure 1. The relational triples are actually small cubes inside the whole cube, and existing studies in effect model the cube from different perspectives and then extract the triples. Based on this representation of triples in 3-D space, three operations (i.e., slice, projection and shrinkage) are defined in Figure 2 to explain why existing methods suffer from the above issues. Furthermore, we propose a novel model for relational triple extraction, named StereoRel, which can handle the above issues simultaneously. More precisely, the cube is modelled from three perspectives, namely the (x, z)-plane projection, the (y, z)-plane projection and the z-slices, which indicate the subjects, the objects and their correspondences for each predefined relation. Correspondingly, the proposed method leverages three decoders to extract relational triples in a unified model. This work has the following main contributions:

• We provide a revealing insight into relational triple extraction from a stereoscopic perspective, which rationalizes the occurrence of several challenging issues and the shortcomings of existing methods.
• We propose a novel StereoRel model for relational triple extraction, which can simultaneously reduce information loss, avoid error propagation and capture the interaction between entity and relation.
• Extensive experiments are conducted on five public datasets, demonstrating that the proposed model outperforms the recent advanced baselines.

Relational Triple Extraction from 3-D Perspective
In the form of (subject, relation, object), triples can naturally be mapped to a three-dimensional (3-D) space, as elaborated in this section. Meanwhile, we define three operations (i.e., slice, projection and shrinkage) in the 3-D space to make it easier to understand the strengths and shortcomings of previous research.

Triple Representation in 3-D Space
Given a text L with length |L| and a predefined relation set R with |R| relations, L may have several triples, that is, p([s, r, o]|L), where [·] represents a collection. Each triple consists of a subject (s), an object (o) and one relation (r) belonging to R. A subject is one entity, that is, an n-gram in L, and so is an object. To model p([s]|L) or p([o]|L), the common strategy is sequence tagging on L, for which several schemes exist, such as BMES tagging (Zhang and Yang, 2018) and start-and-end binary tagging (Sui et al., 2020). Let |T| denote the number of tags per token. Considering all possible connections, the representation space is equivalent to a cube of size (|L| × |T|)² × |R| in 3-D space. As shown in Figure 1, the line segments of the cube mapped onto the x-axis, y-axis and z-axis are regarded as the representations of subjects, objects and relations, that is, p([s]|L), p([o]|L) and p([r]|L), respectively. Similarly, the rectangles of the cube mapped onto the (x, y)-plane, (x, z)-plane and (y, z)-plane are regarded as p([s, o]|L), p([s, r]|L) and p([o, r]|L), respectively. Further, each triple is mapped to a small cube in the space. Based on this stereoscopic representation of relational triples, we define the following operations.

Slice, denoted as sli(·). As shown in Figure 2(a), when some elements (i.e., subject, object or relation) are specified, the representation space is reduced; the operation is like slicing the cube. For instance, a specific relation corresponds to a z-slice of size (|L| × |T|)² × 1. Specifying both subject and object leads to an xy-slice of size (m × |T|) × (n × |T|) × |R|, where m and n are the subject and object lengths, which can be seen as the intersection of an x-slice and a y-slice. If subject, object and relation are all specified, there is an xyz-slice of size (m × |T|) × (n × |T|) × 1, that is, a triple.

Table 1: Correspondence between the operations and the issues ("+" indicates that the operation causes the issue).

Operation            I-IL-EI   I-IL-EO   I-EP   I-II
sli(·)                                    +
pro_x/y(·)                                       +
pro_xy(·)                        +               +
shr(·) (only one)       +

Projection, denoted as pro(·).
As depicted in Figure 2(b), two types of projection are defined: cube-to-plane and plane-to-axis. The former is like looking at the whole cube from a certain plane. For example, in the projection from the cube to the (x, y)-plane, two triples with the same subject and object are indistinguishable. Similarly, in the projection from the cube to the (x, z)-plane, there is only subject and relation information but no object information. The latter is like looking at a plane from a certain axis; for example, in the (x, z)-plane to x-axis projection, subjects in different z-slices may have the same representation on the x-axis. Hereafter, for easy reading, the (x, z)-plane to x-axis and (y, z)-plane to y-axis projections are denoted as pro_x(·) and pro_y(·), respectively; the projection from the cube to the (x, y)-plane is denoted as pro_xy(·), and the rest are named similarly.

Shrinkage, denoted as shr(·). In the cube representation, each token pair is represented by an xy-slice of size |T| × |T| × |R|. Such an xy-slice can reflect all possible entity-tagging combinations of a token pair. As described in Figure 2(c), a shrinkage over the cube only represents whether the token pair satisfies one specific entity-tagging combination, such as (start, start). Thus, the size of a shrinkage is |L| × |L| × |R|.
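For concreteness, the 3-D representation and its operations can be sketched with a boolean tensor. This is a minimal sketch, not the authors' code, simplified to one tag per token (so |T| = 1): axis 0 indexes subjects, axis 1 objects and axis 2 relations; the toy sizes and triples are illustrative assumptions.

```python
import numpy as np

L, R = 6, 2                      # toy text length and relation count
cube = np.zeros((L, L, R), dtype=bool)

# Two hypothetical triples sharing subject and object:
# (token 0, r0, token 3) and (token 0, r1, token 3).
cube[0, 3, 0] = True
cube[0, 3, 1] = True

# Slice: fixing the relation reduces the space to one z-slice.
z_slice = cube[:, :, 0]                       # shape (L, L)

# Projection cube -> (x, y)-plane: relation information is lost, so the
# two triples above collapse into a single indistinguishable point.
pro_xy = cube.any(axis=2)                     # shape (L, L)

# Projection cube -> (x, z)-plane: subjects and relations survive,
# but object information is gone.
pro_xz = cube.any(axis=1)                     # shape (L, R)
```

The sketch makes the information loss concrete: `cube` holds two triples, but its (x, y)-projection retains only one marked cell.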

Analysis of Previous Researches
As aforementioned, relational triple extraction faces three challenging issues: information loss (I-IL-EI or I-IL-EO), error propagation (I-EP) and ignoring the interaction between entity and relation (I-II). These can clearly be matched to the three operations in 3-D space, as shown in Table 1. sli(·) corresponds to a prediction process with strict order and thus leads to error propagation. Without considering nested entities in a text, pro_xz/yz/z(·) does not cause any problem, while pro_x/y/xy(·) does: both pro_x/y(·) and pro_xy(·) lead to ignoring the interaction between entity and relation, and pro_xy(·) additionally makes triples with overlapped entities indistinguishable. The cube can be disassembled into |T| × |T| shr(·) operations, and modeling only one shr(·) causes entity incompleteness. To gain deep insights into relational triple extraction, based on the correspondence between the operations and the issues, we analyze previous research in Table 2, where a check mark indicates that a model can handle a certain issue and × indicates the opposite.

[Table 2: Analysis of previous research, with columns: Models; Perspective in 3-D space (for modeling p([s, r, o]|L)); I-IL-EI; I-IL-EO; I-EP; I-II.]
Early research (Zelenko et al., 2002; Zhou et al., 2005; Chan and Roth, 2011) adopts pipeline approaches, where entities are recognized first and the relations for each entity pair are predicted afterwards. Arguing that such approaches neglect the inherent relevance between entity recognition and relation extraction, some solutions (Miwa and Bansal, 2016; Zhang et al., 2017; Takanobu et al., 2019) still extract entities and relations sequentially but make the two tasks share the same encoder. These methods model p([s, r, o]|L) with a strict prediction order; therefore, the pipeline paradigm suffers from the I-EP and I-II issues. MHS (Bekoulis et al., 2018) is another two-stage method: it recognizes entities first and then extracts relational triples with a multi-head selection strategy on each subject, where sli_x(·) and pro_x/y(·) lead to the I-EP and I-II issues, respectively.
In subsequent research on relational triple extraction, several methods with a joint decoding schema have been proposed. Specifically, NovelTagging (Zheng et al., 2017) and PA-Tagging (Dai et al., 2019) achieve joint decoding by designing a unified tagging scheme, converting relational triple extraction into an end-to-end sequence tagging problem. Such a tagging schema has to model p(pro_xy([s, r, o])|L) and thus suffers from the I-IL-EO and I-II issues. CopyRE (Zeng et al., 2018) and CopyRRL (Zeng et al., 2019) leverage sequence-to-sequence models with a copy mechanism, and GraphRel (Fu et al., 2019) introduces a graph convolutional network to jointly learn entities and relations. Despite their initial success, these three methods only model p(shr([s, r, o])|L) and thus suffer from the I-IL-EI issue. The sequence generation models, CopyRE and CopyRRL, predict triples one by one and model p(sli_xyz(s_i, r_i, o_i)) via p(sli_z(r_i)) × p(sli_xz(s_i, r_i)|sli_z(r_i)) × p(sli_xyz(s_i, r_i, o_i)|sli_xz(s_i, r_i)), which leads to the I-EP issue. GraphRel cannot avoid the I-IL-EO and I-II issues due to its use of pro_xy(·).
Recently, to address the I-IL issue, CopyMTL proposes a multi-task learning framework based on CopyRE, to simultaneously predict complete entities and capture relational triples. However, the model still does not solve the I-EP issue. Meanwhile, its entity recognition is implemented by modeling p([s] ∪ [o]|L) via a standalone module, which leads to the I-II issue. Following the sequence-to-sequence schema, WDec and PNDec (Nayak and Ng, 2020) design specific decoder blocks that can generate triples with complete entities. Such models ease the I-II issue but still suffer from the I-EP issue, since they model p(sli_xyz(s_i, r_i, o_i)) with a strict generation order. CasRel regards relations as functions that map subjects to objects in a text; it has to recognize subjects first and then objects, which leads to the I-EP issue. To recognize subjects, p([s]|L) is modelled via p(pro_x(pro_xz([s, r, o]))|L), where pro_x(·) leads to the I-II issue. Att-as-Rel extracts triples for each predefined relation, avoiding the I-IL and I-EP issues.

The Proposed StereoRel Model
To handle the above three issues simultaneously, we avoid the operations in Table 1 that cause them. As shown in Figure 3, the proposed StereoRel model first leverages a BERT encoder to extract the text representation of the original text. Then, for each predefined relation, the text representation is transformed into its subject and object spaces. Based on them, three decoders are built to separately model p(pro_xz([s, r, o])|L), p(pro_yz([s, r, o])|L) and p(shr([s, r, o])|L). The first two cause no issue and provide complete entities for the last shr(·) operation.

BERT Encoder
To sufficiently capture the textual information, the encoder is built on a pre-trained language model, BERT (Devlin et al., 2019). The BERT encoder tokenizes a text L using a predefined vocabulary and generates a corresponding sequence Ľ by concatenating a [CLS] token, the tokenized text and a [SEP] token; the detailed steps can be found in (Devlin et al., 2019). The BERT encoder embeds a text L into a matrix T ∈ R^((|L|+2)×d_b), where d_b is the hidden size of BERT, and T_j can be seen as the word embedding of the j-th token, j ∈ [0, |L| + 1]. After this, for each relation r_i, T is transformed into a new text representation T_i ∈ R^((|L|+2)×d_r) by:

T_i = φ(T W_i + b_i),

where W_i ∈ R^(d_b×d_r) and b_i ∈ R^(1×d_r) are trainable parameters and φ(·) is a predetermined activation function.
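The per-relation transformation can be sketched as follows. This is a shape-level numpy sketch with random weights, not the trained model; the parameter names `W` and `b` are assumptions, since the extracted equation is partly elided in the source.

```python
import numpy as np

rng = np.random.default_rng(0)
L_len, d_b, d_r, num_rel = 5, 768, 64, 3   # toy sizes; d_b matches BERT-base

T = rng.normal(size=(L_len + 2, d_b))      # BERT output incl. [CLS]/[SEP]
W = rng.normal(size=(num_rel, d_b, d_r))   # one projection per relation
b = rng.normal(size=(num_rel, 1, d_r))     # one bias per relation

relu = lambda x: np.maximum(x, 0.0)        # φ(·) is relu in the experiments

# T_i = φ(T W_i + b_i), computed for all relations at once.
T_per_rel = relu(np.einsum('ld,rde->rle', T, W) + b)   # (|R|, |L|+2, d_r)
```

Computing all |R| relation-specific representations in one einsum is an implementation convenience; each slice `T_per_rel[i]` corresponds to T_i in the text.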

Subject Decoder
The subject decoder models ∏_i^|R| p([s, r_i]|L), that is, the (x, z)-plane projection p(pro_xz([s, r, o])|L), which recognizes the subjects for each predefined relation. For one specific relation r_i, we transform its text representation into T^sub_i ∈ R^((|L|+2)×d_e) in r_i's subject space, with d_e being the hidden size. The transformation is implemented by

T^sub_i = φ(T_i W^sub + b^sub),

where W^sub ∈ R^(d_r×d_e) and b^sub ∈ R^(1×d_e) are trainable parameters and φ(·) is a predetermined activation function. Based on T^sub_i, all possible subjects in relation r_i's subject space are recognized by a sequential conditional random field (CRF) (Lafferty et al., 2001) layer with a [Begin, Inside, Outside] tagging schema, where the probability of the final label sequence y^sub_i = [y^sub_i1, y^sub_i2, ..., y^sub_i|L|] is modeled as follows:

p(y^sub_i|L) = exp(∑_j (w^crf_sub_{y^sub_i,j-1, y^sub_ij} T^sub_ij + b^crf_sub_{y^sub_i,j-1, y^sub_ij})) / ∑_{ŷ∈Y} exp(∑_j (w^crf_sub_{ŷ_j-1, ŷ_j} T^sub_ij + b^crf_sub_{ŷ_j-1, ŷ_j})),

where Y denotes all possible label sequences of L, and w^crf_sub_{y,ŷ} and b^crf_sub_{y,ŷ} are trainable parameters corresponding to the label pair (y, ŷ).
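Once the CRF emits a [Begin, Inside, Outside] tag sequence for a relation, entity spans can be read off with a small helper. `bio_spans` is an illustrative name, not from the paper; it assumes a stray Inside tag simply extends the current span.

```python
def bio_spans(tags):
    """Return (start, end) token spans, end inclusive, from B/I/O tags."""
    spans, start = [], None
    for j, t in enumerate(tags):
        if t == 'B':                       # a new entity begins
            if start is not None:
                spans.append((start, j - 1))
            start = j
        elif t == 'O':                     # the current entity (if any) ends
            if start is not None:
                spans.append((start, j - 1))
                start = None
        # 'I' extends the current span, so nothing to do
    if start is not None:                  # close an entity at text end
        spans.append((start, len(tags) - 1))
    return spans

# e.g. a toy tag sequence with a two-token subject and a one-token subject
print(bio_spans(['B', 'I', 'O', 'O', 'O', 'B']))  # [(0, 1), (5, 5)]
```

Because decoding runs once per relation, the same subject token can participate in different spans under different relations, which is how the decoder sidesteps entity overlapping.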

Object Decoder
The object decoder models ∏_i^|R| p([o, r_i]|L), that is, the (y, z)-plane projection p(pro_yz([s, r, o])|L), which recognizes the objects for each predefined relation. Similar to the subject decoder, the text representation in object space, T^obj_i ∈ R^((|L|+2)×d_e), is obtained in the object decoder as:

T^obj_i = φ(T_i W^obj + b^obj),

where W^obj ∈ R^(d_r×d_e) and b^obj ∈ R^(1×d_e) are trainable parameters. Likewise, objects of the i-th predefined relation are tagged as y^obj_i = [y^obj_i1, y^obj_i2, ..., y^obj_i|L|] via another CRF layer as:

p(y^obj_i|L) = exp(∑_j (w^crf_obj_{y^obj_i,j-1, y^obj_ij} T^obj_ij + b^crf_obj_{y^obj_i,j-1, y^obj_ij})) / ∑_{ŷ∈Y} exp(∑_j (w^crf_obj_{ŷ_j-1, ŷ_j} T^obj_ij + b^crf_obj_{ŷ_j-1, ŷ_j})),

where w^crf_obj_{y,ŷ} and b^crf_obj_{y,ŷ} are trainable parameters.

Shrinkage Decoder
To extract the correspondences between subjects and objects, the shrinkage decoder models ∏_i^|R| p(shr([s, r_i, o])|L), where each element of shr(·) denotes whether the corresponding token pair occupies one specific position of a (subject, object) pair, such as (start, start) or (end, end). To model this, a pair-wise classification function f is established as

p^shr_ijj' = f(T^sub_ij, T^obj_ij'),

indicating the probability that the j-th token and the j'-th token occupy the specific positions of a (subject, object) pair satisfying the i-th predefined relation. We design the function as follows:

p^shr_ijj' = ξ(p^sub→obj_ijj', p^obj→sub_ijj'),

where p^sub→obj_ijj' and p^obj→sub_ijj' are derived from ψ(T^sub_ij, T^obj_ij'); ψ(·) is implemented by a dot product or a neural network to provide an initial probability. p^sub→obj_ijj' and p^obj→sub_ijj' respectively indicate the probability distributions for a subject searching for its objects and for an object searching for its subjects, which are integrated via a predetermined function ξ(·), such as minimum, maximum or multiplication.
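Under the settings reported later in the experiments (ψ(·) as dot product, ξ(·) as multiplication), the pairwise score for one relation can be sketched as below. Turning the raw scores into the two directional distributions with row- and column-wise softmaxes is an assumed detail, not stated in the source.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
L_len, d_e = 4, 8
T_sub = rng.normal(size=(L_len, d_e))   # subject-space reps for one relation
T_obj = rng.normal(size=(L_len, d_e))   # object-space reps for one relation

scores = T_sub @ T_obj.T                 # ψ: dot product, shape (L, L)
p_sub_to_obj = softmax(scores, axis=1)   # each subject searches for objects
p_obj_to_sub = softmax(scores, axis=0)   # each object searches for subjects
p_shr = p_sub_to_obj * p_obj_to_sub      # ξ: multiplication
```

Multiplying the two directions means a token pair scores highly only when the subject picks the object and the object picks the subject back, which tightens the correspondence.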

Learning and Inference
The subject and object decoders are learned with a text-level log-likelihood loss, while the shrinkage decoder is learned with a token-level binary cross-entropy loss. Thus, the unified model is learned with the combined loss function L_total = L_sub + L_obj + L_shr, where L_sub and L_obj are the negative log-likelihoods of the label sequences y^sub_i and y^obj_i over all relations, and L_shr is the binary cross-entropy over p^shr_ijj'. The relational triples can then be inferred from the three decoders. Concretely, for each predefined relation r_i, the subjects and objects are obtained from y^sub_i and y^obj_i, respectively. For a subject s_ij and an object o_ij' whose (j-th token, j'-th token) pair satisfies the specific position, if p^shr_ijj' is greater than a predetermined threshold δ, (s_ij, r_i, o_ij') is extracted as a relational triple.
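The inference step for one relation can be sketched as follows. The helper name `infer_triples` and the choice of checking only the (start, start) position are illustrative assumptions; the toy spans and probabilities are fabricated for the example.

```python
import numpy as np

def infer_triples(subjects, objects, p_shr, delta=0.5):
    """subjects/objects: lists of (start, end) spans for one relation;
    p_shr: (|L|, |L|) shrinkage probabilities; delta: threshold δ."""
    triples = []
    for s_start, s_end in subjects:
        for o_start, o_end in objects:
            # keep the pair only if the shrinkage decoder is confident
            if p_shr[s_start, o_start] > delta:
                triples.append(((s_start, s_end), (o_start, o_end)))
    return triples

p = np.zeros((6, 6))
p[0, 5] = 0.9                                        # one confident pairing
print(infer_triples([(0, 1), (3, 3)], [(5, 5)], p))  # [((0, 1), (5, 5))]
```

Because the subject and object lists already carry complete spans from the CRF decoders, the shrinkage check only has to confirm the pairing, not reconstruct entity boundaries.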

Experiments
In this section, we evaluate the proposed StereoRel model through a performance comparison on five public datasets.

Experimental Settings
Evaluation Metrics and Datasets. Following common practice, performance on relational triple extraction is evaluated by precision (Pre.), recall (Rec.) and F1-score (F1), where a triple is regarded as correct only if subject, relation and object are all matched. Notably, previous works use two evaluation modes: Partial Match and Exact Match. The former holds that a subject (or object) is correct as long as its head or tail is correct, while the latter requires it to be recognized completely. To properly compare our model with various baselines, benchmark datasets are selected for the two modes separately. Concretely, we utilize the NYT (Riedel et al., 2010), WebNLG (Gardent et al., 2017), NYT10 (Takanobu et al., 2019) and NYT11 (Takanobu et al., 2019) datasets for Partial Match, and the NYT (Riedel et al., 2010) and Wiki-KBP (Dai et al., 2019) datasets for Exact Match. The details are shown in Table 3. The validation splits are the same as in previous research.

Implementation Details. For a fair comparison, we utilize the cased BERT-base model in our experiments, the same as CasRel and TPLinker (Wang et al., 2020b), and thus d_b = 768. The Adam optimizer (Kingma and Ba, 2015) is utilized to train the proposed method with an initial learning rate of 1e-5. The hidden sizes d_r and d_e are set to 64 and 32, respectively. The threshold δ is tuned for each relation and determined on the validation set. φ(·) is set to the relu activation function, ψ(·) to the dot product and ξ(·) to the multiplication function.
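The difference between the two evaluation modes can be sketched with a small helper; `entity_correct` is an illustrative name, not a standard evaluation script.

```python
def entity_correct(pred, gold, mode):
    """pred/gold: (start, end) token spans; mode: 'partial' or 'exact'.

    Partial Match accepts a span whose head or tail matches the gold span;
    Exact Match requires the full span to be identical."""
    if mode == 'exact':
        return pred == gold
    return pred[0] == gold[0] or pred[1] == gold[1]

# A prediction that truncates the gold span (2, 5) to (2, 4):
print(entity_correct((2, 4), (2, 5), 'partial'))  # True: heads match
print(entity_correct((2, 4), (2, 5), 'exact'))    # False
```

This is why a method with the I-IL-EI issue can still score well under Partial Match while degrading under Exact Match.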

Performance Comparison
We employ some recent advanced methods as baselines, mainly including the models analyzed in Table 2. Tables 4, 5, 6 and 7 report the results of our method against the baselines under the Partial Match evaluation mode, and Tables 8 and 9 report the results under Exact Match. The models before CasRel do not employ a BERT encoder, while the rest do.
As discussed in Table 2, existing models do not handle the three challenging issues simultaneously, while our proposed StereoRel model does. Among the baselines, Att-as-Rel is the first work to extract triples for each predefined relation with no I-IL and I-EP issues, and thus achieves a large performance improvement compared with previous methods. Based on the BERT encoder, performance on relational triple extraction is further improved by CasRel and TPLinker. Due to having no I-EP issue, TPLinker outperforms CasRel; however, TPLinker still suffers from the I-II issue. Our proposed StereoRel model further addresses it and achieves better performance. From the results, the improvement of the existing best baseline over the second best on the five datasets is 2.5%, 0.1%, 1.4%, 0.1% and 1.5% in terms of F1-score, respectively, while our model obtains gains of about 0.3%, 0.2%, 0.2%, 0.7% and 0.6% over the best baseline. The improvement is therefore satisfactory. There remain two directions worth discussing. The first one concerns the decoding schema: some recent work casts relational triple extraction as a set prediction problem learned with a bipartite matching loss, and such ideas may be introduced in the future.
The second one is to recognize nested entities in relational triples. Nested entities are entities among which there are substring relationships, such as "U.N." being a substring of "U.N. Ambassador". Such entities definitely affect the overall performance under the Exact Match mode. Taking the NYT dataset as an example, about 2.5% of sentences contain nested entities. Nested entity recognition has been widely studied (Wang et al., 2020a), but most studies on relational triple extraction have not considered it. TPLinker (Wang et al., 2020b) provides a solution for recognizing nested entities via token pair tagging, but it ignores the interaction between entity and relation. Although StereoRel does not focus on nested entities, it is not much affected by them, since it recognizes subjects and objects for each predefined relation separately; in this case, only 0.06% of nested entities in NYT cannot be marked. Nevertheless, modeling nested entities from the stereoscopic perspective is worth exploring in the future.

Conclusions
Relational triple extraction is critical to understanding massive text corpora. However, existing studies face some challenging issues, including information loss, error propagation and ignoring the interaction between entity and relation. In this paper, aiming at simultaneously handling the above issues, we provide a revealing insight into relational triple extraction from a stereoscopic perspective, which rationalizes the occurrence of these issues and exposes the shortcomings of existing methods. Further, we propose a novel model leveraging three decoders to respectively extract subjects, objects and their correspondences for each predefined relation. Extensive experiments are conducted on five public datasets, demonstrating that the proposed model outperforms the recent advanced baselines.