UniRel: Unified Representation and Interaction for Joint Relational Triple Extraction

Relational triple extraction is challenging because of the difficulty of capturing the rich correlations between entities and relations. Existing works suffer from 1) heterogeneous representations of entities and relations, and 2) heterogeneous modeling of entity-entity interactions and entity-relation interactions. Therefore, the rich correlations are not fully exploited by existing works. In this paper, we propose UniRel to address these challenges. Specifically, we unify the representations of entities and relations by jointly encoding them within a concatenated natural language sequence, and unify the modeling of interactions with a proposed Interaction Map, which is built upon the off-the-shelf self-attention mechanism within any Transformer block. With comprehensive experiments on two popular relational triple extraction datasets, we demonstrate that UniRel is more effective and computationally efficient. The source code is available at https://github.com/wtangdev/UniRel.


Introduction
Relational Triple Extraction (RTE) aims to identify entities and their semantic relations jointly. It extracts structured triples in the form of <subject-relation-object> from raw texts in an end-to-end manner, and is a crucial task towards automatically constructing large-scale knowledge bases (Nayak et al., 2021).
Early works solve RTE in a pipelined fashion involving two sub-tasks (Zelenko et al., 2003; Chan and Roth, 2011): entities are first recognized, and relations are then assigned to each extracted entity pair. Such methods fail to capture the implicit correlation between the two isolated sub-tasks and are thus prone to propagated errors (Li and Ji, 2014).

Recent studies focus on how to jointly extract <s-r-o> triples in an end-to-end manner. For example, Wei et al. (2020) propose a cascaded network that identifies subjects first and then recognizes the corresponding objects for each relation. Zheng et al. (2021a) decompose RTE into three subtasks. Wang et al. (2020) and Shang et al. (2022) extract relational triples in one stage to eliminate exposure bias. In general, these works accomplish end-to-end extraction through various ways of factorizing and re-assembling the label space of relational triples. However, there often exist rich informative correlations between entities and relations, and these correlations can hardly be captured by superficial constraints applied in the label space only. Specifically, existing works fall short in two aspects: 1) heterogeneous representations of entities and relations; 2) ignorance of the interactive dependencies between entity-level interactions and relation-level interactions.
For representations, existing works mainly focus on better capturing the contextual information of entities, while ignoring the equally important semantic meaning of relations. Generally, relations are simply represented as atomic label ids indicating a specific dimension of a newly initialized classifier (Wei et al., 2020; Wang et al., 2020; Zheng et al., 2021a), which makes them heterogeneous with the language-model-augmented entity representations. Such heterogeneity prevents models from capturing the intrinsic correlations between entities and relations in the semantic space. For instance, from the semantic meaning of the relation is_capital_of, we can infer that triples involving it are related to locations: the subject is supposed to be a city, while the object should be a country. We argue that it is important to build unified representations for both entities and relations.
For interactions, we exemplify the interdependence between entity-entity interactions and entity-relation interactions in Figure 1. We can easily determine that London and UK are correlated given the interactions (London-is_capital_of) and (UK-is_capital_of). However, existing works either enumerate all entity-relation-entity triples (Wang et al., 2020; Shang et al., 2022), which suffers from a huge prediction space, or model the two kinds of interactions with separate modules (Wei et al., 2020; Zheng et al., 2021a; Li et al., 2021) for named entity recognition (NER) and relation extraction (RE) respectively, which neglects their interdependencies. In general, the absence of unified modeling of interactions prevents these methods from fully utilizing the interdependencies for better extraction.
In this paper, we propose UniRel with Unified Representation and Interaction to resolve both the heterogeneity of representations and the absence of interaction dependencies. We first encode both relations and entities into meaningful sequence embeddings to construct unified representations: based on their semantic definitions, candidate relations are converted to natural language texts and form one consecutive sequence together with the input sentence, which we encode with a Transformer-based Pre-trained Language Model (PLM) that intrinsically captures their informative correlations. We then propose a novel solution for Unified Interaction, where we simultaneously model the entity-entity interactions and entity-relation interactions in one single Interaction Map by leveraging the off-the-shelf self-attention mechanism inside the Transformer. Besides, benefiting from the design of the Interaction Map, UniRel preserves the advantages of end-to-end extraction and is superior in computational efficiency.
We conduct comprehensive experiments on two popular datasets, NYT (Riedel et al., 2010) and WebNLG (Gardent et al., 2017), and achieve a new state-of-the-art. We summarize our contributions as follows:
• We propose unified representations of entities and relations by jointly encoding them within a concatenated natural language sequence, which fully exploits their contextualized correlations and leverages the semantic knowledge learned from LM pre-training.
• We propose unified interactions to capture the interdependencies between entity-entity interactions and entity-relation interactions. This is innovatively achieved by the proposed Interaction Map built upon the off-the-shelf self-attention mechanism within any Transformer block.
• We show that UniRel achieves the new state-of-the-art for RTE tasks, while preserving the superiority of being end-to-end and computationally efficient.

Related Work
Early works (Zelenko et al., 2003; Chan and Roth, 2011) apply pipeline approaches that divide relational triple extraction into two isolated sub-tasks: first perform named entity recognition to extract all entities, and then apply relation extraction to identify relations for each entity pair. However, these methods suffer from error propagation because they fail to capture the implicit correlation between the two isolated sub-tasks.
To tackle these issues, recent research focuses on jointly extracting entities and relations. Earlier feature-based joint models (Yu and Lam, 2010; Miwa and Sasaki, 2014; Li and Ji, 2014; Ren et al., 2017) require complex feature engineering and depend heavily on NLP tools. Researchers then propose neural-network-based joint models to eliminate hand-crafted features. Miwa and Bansal (2016) propose a model that jointly learns entities and relations through parameter sharing. Zheng et al. (2017) transform RTE into a sequence tagging problem, which unifies the annotation roles of entities and relations.
Despite their success, most models cannot deal with complex scenarios where one sentence contains multiple overlapping relational triples sharing a single entity (SingleEntityOverlap, SEO) or an entity pair (EntityPairOverlap, EPO). To handle this problem, some researchers (Zeng et al., 2018, 2019; Nayak and Ng, 2020; Ye et al., 2021) propose generative models that view a triple as a token sequence. Some works (Wang et al., 2020; Ren et al., 2021; Shang et al., 2022) introduce methods to extract triples in one stage but suffer from a huge prediction space. Other researchers (Wei et al., 2020; Yuan et al., 2020; Zheng et al., 2021b; Li et al., 2021; Wu and Shi, 2021) decompose RTE into different sub-tasks, but they learn the interaction between sub-tasks only through input sharing, or fall into cascade errors. PFN (Yan et al., 2021) proposes a partition filter network to fuse the task representations of NER and RE, but still models entity-entity interactions and entity-relation interactions with separate modules. In this work, UniRel unifies the modeling of the two kinds of interactions in one single Interaction Map to fully capture their interdependencies, and is superior in computational efficiency.
More recently, Xu et al. (2022) propose EmRel, which explicitly introduces relation representations to leverage the rich interactions across relations, entities, and context. However, it still suffers from the heterogeneity between entities and the newly initialized embeddings of relations. Some approaches (Han et al., 2021; Chen and Li, 2021; Chen et al., 2022) introduce prompt-tuning to extract relations with semantic information by transforming relation extraction into a masked language modeling problem. However, such methods focus on the simple scenario of sentence-level relation classification without capturing the correlations between entities and relations. In terms of technical design, SSAN (Xu et al., 2021) also delves into the self-attention layer within the Transformer to model structural interactions, but it makes extra adaptations to the standard self-attention mechanism and focuses on document-level RE tasks. Compared to these approaches, our work aims to extract relational triples in complex scenarios where rich intrinsic correlations exist between entities and relations. In this paper, we unify the representations and interactions to fully exploit these correlations and extract entities and relations jointly.

Methodology
In this section, we present our model in detail. We first introduce the problem formulation in Section 3.1. Then, we introduce the Unified Representation and the Unified Interaction in Section 3.2 and Section 3.3, respectively. Finally, we present the details of training and decoding in Section 3.4.

Problem Formulation
Given a sentence $X = \{x_1, x_2, \cdots, x_N\}$ with $N$ tokens, the goal of joint relational triple extraction is to identify all possible triples $T = \{(s_l, r_l, o_l)\}_{l=1}^{L}$ from $X$, where $s_l$, $r_l$, and $o_l$ represent the subject, the relation, and the object, respectively, and $L$ is the number of triples. Subjects and objects are entity mentions in $X$. Note that entities and relations might be shared among triples.
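For illustration, a minimal sketch of the task's input and output format follows (the sentence and triples are illustrative examples, not samples from the datasets):

```python
# A minimal illustration of the RTE task format (example data, not from NYT/WebNLG).
sentence = "Holmes lives in London, the capital of UK."

# Expected output: all triples expressible over the sentence. Entities and
# relations may be shared across triples (the SEO/EPO overlapping patterns).
triples = [
    ("Holmes", "lives_in", "London"),
    ("London", "is_capital_of", "UK"),
]
```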

Unified Representation
We first convert the relations in the schema to natural language texts so that they take the same form as the input sentence. For clarity, the conversion is performed through a verbalizer with human-picked words. The relation word is basically the most informative word within the label name that preserves its semantics, for example, "founders" for the relation "/business/company/founders". We then feed the concatenation of the input sentence and the natural language texts of relations to a Transformer-based PLM encoder; we use BERT (Devlin et al., 2019) as the PLM in this work. The inputs are first mapped to a sequence of input embeddings by looking up the corresponding ids in the embedding table:

$$H = E([T_s; T_p]), \quad (1)$$

where $H \in \mathbb{R}^{(N+M) \times d_h}$ is the input embedding matrix, $d_h$ is the embedding size, $T_s$ and $T_p$ are the input ids of the input sentence and of the $M$ relations, respectively, and $E$ is the embedding table of BERT.
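A minimal sketch of this input construction with a HuggingFace-style BERT interface (the checkpoint name and relation words are illustrative assumptions, and the tokenizer's special tokens add a few positions beyond $N + M$):

```python
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")

sentence = "Holmes lives in London, the capital of UK."
# One informative, human-picked word per candidate relation (illustrative).
relation_words = ["lives", "capital", "contains"]

# Concatenate the sentence with the verbalized relations so that both are
# encoded jointly and share one semantic space.
inputs = tokenizer(sentence, " ".join(relation_words), return_tensors="pt")
H = model(**inputs).last_hidden_state  # shape: (1, N + M + special tokens, d_h)
```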
After obtaining the input embeddings, the encoder captures the correlations between input words with the self-attention mechanism. Specifically, Transformer-based PLMs comprise stacked Transformer (Vaswani et al., 2017) layers, each consisting of multiple attention heads. Each head applies three separate linear transformations to map the input embeddings $H$ to query, key, and value vectors $Q$, $K$, $V$, computes the attention weights between all pairs of words as the Softmax-normalized dot product of $Q$ and $K$, and fuses the weights with $V$ as follows:

$$\text{Attention}(Q, K, V) = \text{Softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V,$$

where $d_k$ is the dimension of the key vectors. Each Transformer layer generates token embeddings from the previous layer's output with the self-attention mechanism. We denote by $H^i$ the output of the $i$-th Transformer layer. As we take both entities and relations into the input embeddings $H$, the representation $H^i$, encoded by such a deep Transformer network, fully captures the rich intrinsic correlations between entities and relations. The above steps unify the representations of entities and relations into one semantic embedding matrix $H^i$ with rich correlational information.
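For reference, one attention head can be sketched as follows (a simplified single-head version; real PLMs stack many heads plus output projections and residual connections):

```python
import torch
import torch.nn.functional as F

def attention_head(H, W_q, W_k, W_v):
    """One self-attention head over the unified embeddings H of shape (seq_len, d_h)."""
    Q, K, V = H @ W_q, H @ W_k, H @ W_v                    # three separate linear transformations
    scores = Q @ K.transpose(-2, -1) / Q.size(-1) ** 0.5   # scaled pairwise dot products
    A = F.softmax(scores, dim=-1)                          # attention weights over all word pairs
    return A @ V                                           # fuse the value vectors
```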

Unified Interaction
The interactions between the triple elements, Entity-Entity Interaction and Entity-Relation Interaction, can be directly used to extract relational triples. As shown in Figure 2, the triple (London, is_capital_of, UK) can be determined given the interactions between (London-UK), (London-is_capital_of), and (UK-is_capital_of). Motivated by this, we design an Interaction Map to model the two kinds of interactions simultaneously.

Entity-Entity Interaction
Entity-entity interaction identifies entity pairs that can form valid relational triples. Given two entities $e_a$ and $e_b$ from sentence $X$, we regard the entity pair $(e_a, e_b)$ as interacting only when there exists a relation $r$ that forms a valid triple together with them, where both $(e_a, r, e_b)$ and $(e_b, r, e_a)$ are allowed. For example, in Figure 2, (Holmes-London), as well as (London-Holmes), are regarded as interacting due to the triple (Holmes, lives in, London), while (Holmes-capital) is unrelated since no valid triple contains both. Formally, we define the entity-entity interaction indicator function $I_e(\cdot)$ as follows:

$$I_e(e_a, e_b) = \begin{cases} \text{True}, & \exists r \in R: (e_a, r, e_b) \in T \ \text{or} \ (e_b, r, e_a) \in T \\ \text{False}, & \text{otherwise} \end{cases} \quad (4)$$

Note that $I_e(e_b, e_a) = I_e(e_a, e_b)$, as the entity-entity interaction is symmetric.
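Read as code over a set of gold (subject, relation, object) tuples, this definition is simply the following (a sketch; the function name is ours):

```python
def entity_entity_interaction(e_a, e_b, triples):
    """I_e: True iff some relation forms a valid triple with the pair,
    in either direction (hence the symmetry of the interaction)."""
    return any((s, o) in {(e_a, e_b), (e_b, e_a)} for s, _, o in triples)
```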

Entity-Relation Interaction
Entity-relation interaction recognizes correlated entities for each relation. Given a relation $r$, we regard an entity $e$ as interacting with $r$ when there exists a triple with $e$ as either subject or object and $r$ as the relation. As relations are directional, we define the entity-relation interaction asymmetrically to distinguish subject entities from object entities, as shown in the upper-right part (Subject-Relation) and lower-left part (Relation-Object) of the map in Figure 2, respectively. For instance, the interaction value of (London-is_capital_of) is supposed to be True because of the valid triple (London, is_capital_of, UK), while that of (UK-is_capital_of) is False since UK cannot be the subject of the relation is_capital_of.
We formally define the indicator function $I_r(\cdot)$ of the entity-relation interaction as follows:

$$I_r(e, r) = \begin{cases} \text{True}, & \exists o: (e, r, o) \in T \\ \text{False}, & \text{otherwise} \end{cases}, \qquad I_r(r, e) = \begin{cases} \text{True}, & \exists s: (s, r, e) \in T \\ \text{False}, & \text{otherwise} \end{cases}$$

where $I_r(e, r)$ and $I_r(r, e)$ identify entity $e$ as the subject and the object of relation $r$, respectively.
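The corresponding checks in code (again a sketch with hypothetical function names):

```python
def entity_relation_subject(e, r, triples):
    """I_r(e, r): True iff e appears as the subject of relation r in some triple."""
    return any(s == e and rel == r for s, rel, _ in triples)

def entity_relation_object(e, r, triples):
    """I_r(r, e): True iff e appears as the object of relation r in some triple."""
    return any(o == e and rel == r for _, rel, o in triples)
```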

Interaction Discrimination
Transformer layers bring powerful deep correlation-capturing ability to BERT. Therefore, as shown in Figure 2, we combine the two kinds of interactions into one single Interaction Map, which takes the same form as the attention map computed by a Transformer layer, and directly take the last Transformer layer of BERT for Interaction Discrimination. As the Interaction Map is not restricted to a normalized matrix, after obtaining $Q$ and $K$ from $H^{11}$, the embeddings fed into the last (12th) layer of BERT, we average the dot products of $Q$ and $K$ over all heads and directly apply the sigmoid function to obtain the result:

$$I = \text{Sigmoid}\left(\frac{1}{T}\sum_{t=1}^{T}\left(H^{11} W^Q_t\right)\left(H^{11} W^K_t\right)^\top\right),$$

where $I \in \mathbb{R}^{(N+M)\times(N+M)}$ is the interaction matrix corresponding to the Interaction Map, $T$ is the number of heads, and $W^Q_t$ and $W^K_t$ are trainable weights. We consider an interaction valid when its value in $I$ exceeds a threshold $\sigma$.
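A sketch of this computation (variable names are ours; in practice the per-head projections come from the attention weights of BERT's last layer):

```python
import torch

def interaction_map(H11, W_q_heads, W_k_heads):
    """Sketch of Interaction Discrimination: average the unnormalized
    query-key dot products over all T heads, then apply a sigmoid.

    H11:       (seq_len, d_h) embeddings entering BERT's last layer
    W_*_heads: lists of T per-head projections, each of shape (d_h, d_head)
    """
    scores = torch.stack([
        (H11 @ W_q) @ (H11 @ W_k).transpose(-2, -1)   # unnormalized attention scores
        for W_q, W_k in zip(W_q_heads, W_k_heads)
    ])
    return torch.sigmoid(scores.mean(dim=0))          # (seq_len, seq_len) Interaction Map
```

Entries of the returned map above the threshold σ are then read off as valid interactions.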
UniRel captures the interactive dependencies as the entity-entity interactions and entity-relation interactions are simultaneously modeled in one single Interaction Map. Besides, with the Unified Interaction, the prediction space is narrowed down to $O((N+M)^2)$, much smaller than that of the most recent work OneRel, which predicts triples with a complexity of $O(N \times M \times N)$.

Training and Decoding
The binary cross-entropy loss is used for training:

$$\mathcal{L} = -\frac{1}{(N+M)^2}\sum_{i=1}^{N+M}\sum_{j=1}^{N+M}\Big[I^*_{ij}\log I_{ij} + \left(1-I^*_{ij}\right)\log\left(1-I_{ij}\right)\Big],$$

where $I^*$ is the ground-truth matrix of the Interaction Map.
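In PyTorch this reduces to the standard elementwise BCE over the map (a sketch):

```python
import torch.nn.functional as F

# I_pred: predicted Interaction Map (sigmoid outputs in [0, 1])
# I_gold: the ground-truth 0/1 matrix I*, as a float tensor of the same shape
loss = F.binary_cross_entropy(I_pred, I_gold)
```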
For decoding, we first recognize all valid subject entities and object entities for each relation from the entity-relation interactions $I_r$ (the green box and blue box in the lower-right part of Figure 2). Then we enumerate all candidate entity pairs for each relation, pruned by the entity-entity interactions $I_e$ (the red box in the lower-right part of Figure 2).
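A decoding sketch following this procedure (σ = 0.5 is an assumed default; the index layout, with sentence tokens at positions 0..N-1 and relations at N..N+M-1, is illustrative):

```python
def decode(I, N, M, sigma=0.5):
    """Decode triples from a thresholded Interaction Map (sketch, single-token
    entities). Rows/columns 0..N-1 are sentence tokens, N..N+M-1 are relations."""
    valid = I > sigma
    triples = []
    for r in range(M):
        rel = N + r
        subjects = [s for s in range(N) if valid[s, rel]]  # subject-relation interactions
        objects = [o for o in range(N) if valid[rel, o]]   # relation-object interactions
        for s in subjects:
            for o in objects:
                if valid[s, o]:                            # prune by entity-entity interaction
                    triples.append((s, r, o))
    return triples
```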
Such a decoding method addresses complex scenarios with overlapping patterns, including EntityPairOverlap (EPO) and SingleEntityOverlap (SEO), since all arrangements of the entities and relations are taken into account. As shown in Figure 2, the extracted triples contain the SEO triples (Holmes, lives_in, UK) and (Holmes, lives_in, London), and the EPO triples (UK, contains, London) and (London, is_capital_of, UK).
Table 1 shows the results of our model against the baseline methods on both datasets. Many previous baselines achieve F1-scores of over 90% on both datasets, especially on WebNLG, which already exceeds human-level performance. UniRel achieves +0.7% and +0.4% improvements in F1-score on NYT and WebNLG, respectively, and outperforms all the baselines in terms of all the evaluation metrics, which shows the superiority of our model.
To further study the ability to handle the overlapping problem and to extract multiple triples, following previous works (Wei et al., 2020; Wang et al., 2020; Zheng et al., 2021a; Shang et al., 2022), we conduct further experiments on subsets of NYT with different sentence types.
As shown in Table 3, the results indicate the effectiveness of our model in complex scenarios. Our model exceeds almost all the baselines in the Normal class and the three overlapping patterns. Especially in SEO and EPO, the most common overlapping situations, our model achieves the highest performance, and the results are robust, which demonstrates the advantage of UniRel in processing overlapping triples. Our model also makes improvements on almost all kinds of sentences regarding the number of triples.
From the simple situation (L = 1) to the complex case (L = 3), UniRel consistently brings improvements, which shows the robustness of our model. In general, this experiment shows the power of our model in complex scenarios. We attribute the effectiveness to the rich interactions between entities and relations captured by the introduced Interaction Map, which is essential for solving the complex overlapping-triple problem.

Ablation Study
In this section, we conduct ablation experiments on the NYT and WebNLG datasets to study the effectiveness of the proposed Unified Representation and Unified Interaction, as reported in Table 1.

Effect of the Unified Representation
To study the effectiveness of the Unified Representation, instead of assigning meaningful words to each relation, we use the [unused] placeholder tokens of BERT to represent relations, marked as UniRel_unused in Table 1. Like meaningless label ids, the embeddings of [unused] tokens are randomly initialized at the fine-tuning stage, without the meaningful semantic information learned from pre-training. We can see a performance decay in terms of all the evaluation metrics on both datasets without the Unified Representation, which indicates the importance of semantic information for relational triple extraction. We also notice the significant performance decrease of UniRel_unused on the WebNLG dataset. We attribute it to the fact that the WebNLG dataset has much less training data but defines far more relations than the NYT dataset. Such a contradiction leaves many relations with few training samples in the WebNLG dataset, and it is hard for a model to learn deep semantic information from scratch with few examples.
To validate our assumption, we further analyze the performance of UniRel and UniRel_unused on relations with different orders of magnitude of training samples on the WebNLG dataset. As shown in Figure 3, UniRel_unused performs well on relations with many samples (≥ 1000) but degrades as the number of samples decreases, which confirms our assumption. In contrast, UniRel maintains a good performance level across different numbers of samples. Even with extremely few samples (≤ 10), UniRel performs at the same level as on relations with many samples (≥ 1000), which further demonstrates the effectiveness of the Unified Representation.

Effect of the Unified Interaction
To study the influence of the Unified Interaction, we take the relation sequence out of the input sentence and model the two kinds of interactions in a separate manner, denoted as UniRel_separate in Table 1. Specifically, we first obtain the sequence embeddings of the input sentence and of the natural language texts of relations separately with the same BERT encoder. Then we apply two transformation layers to get the query and key of the concatenation of the two embeddings. As shown in Table 1, UniRel_separate has marked performance degradation on both datasets compared to UniRel, which demonstrates the effectiveness of the unified interaction. We further analyze the performance of UniRel and UniRel_separate on different interaction types. As shown in Figure 4, not only the entity-relation but also the entity-entity F1-score decreases without simultaneously modeling the interactions, which proves the interdependencies between the two kinds of interactions; UniRel benefits from modeling them in a unified way.

Computational Efficiency
UniRel shows conspicuous computational efficiency in both training and inference time. Specifically, compared to the SOTA model OneRel on the NYT dataset, UniRel is 3× and 1.7× faster in training and inference, respectively. We attribute this to UniRel's smaller prediction space of $O((N+M)^2)$, versus $O(N \times M \times N)$ for OneRel. Although CasRel performs similarly to ours regarding training time, UniRel obtains more than 2.6× speedup in inference time. We attribute the efficiency to the design of the Interaction Map, which allows UniRel to narrow down the prediction space and directly leverage the off-the-shelf self-attention mechanism within the Transformer block.

Visualization
We visualize the Interaction Map to see how it works for relational triple extraction. As shown in Figure 5, the red box marks the entity-entity interaction, while the blue box and green box mark the entity-relation interactions for the subject and the object, respectively. From the map, we can extract all six relational triples: (Yunnan, country, China), (China, administrative_divisions, Yunnan), (Thailand, contains, Chiang Mai), (Yunnan, contains, Jinghong), (China, contains, Jinghong), and (China, contains, Yunnan).

Conclusion
In this work, we propose UniRel to fully leverage the rich correlations between entities and relations by resolving their heterogeneity. The Unified Representation eliminates the representation heterogeneity by encoding both entities and relations into meaningful sequence embeddings. The Unified Interaction eliminates the interaction heterogeneity by simultaneously modeling entity-entity interactions and entity-relation interactions in one single Interaction Map. UniRel produces significant improvements over competitive baselines, and we give a comprehensive analysis to further justify our design.

Limitations
There are two limitations we want to discuss in this section:
• First, for clarity, we select the corresponding word for each relation manually, which would be laborious for schemas with many relations. We plan to design an auto-verbalizer for relations in future work.
• Second, the evaluation benchmarks (NYT and WebNLG) for joint relational triple extraction provide abundant annotated training data, which is expensive in real applications. The performance of our model in low-resource scenarios remains to be validated. However, existing low-resource benchmarks (Han et al., 2018; Gao et al., 2019) are limited to the simple scenario of sentence-level relation classification. We would like to explore the idea of unified representation and unified interaction for joint RTE in low-resource scenarios in the future.

A.2 Influence of Relation Numbers
To study how the number of relations M influences the performance of UniRel, we conduct ablation experiments with different M, as shown in Figure 6. We control the number of sentences per relation within the range [1500, 1700) so that each relation has a similar amount of training signal in each experiment.
From Figure 6, we can observe that UniRel yields relatively stable improvements as the number of relations increases. For example, UniRel achieves a +5.6% improvement both when M = 3 and when M = 9. The results show that the effectiveness of UniRel is relatively stable across different numbers of relations.

A.3 Relation-Word Mapping
For clarity, we convert relations to natural language texts with human-picked words. The picked word is basically the most informative word within the label name that preserves its semantics, for example, "founders" for "/business/company/founders". We do not exploit any special selection strategy and need no formal annotation. For exceptional cases where relation labels are very similar, we simply resort to alternative words ("part" and "section") or capitalization ("part" and "Part") to make a distinction. This provides a semantic-aware initialization, and the model continues to optimize the embeddings. As a result, the performance of UniRel is robust to the choice of words. As shown in Table 7, we conduct experiments with two different relation-word mappings on the NYT dataset, and the results show that UniRel is robust to the mapping words.
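For concreteness, a fragment of such a mapping might look like the following (the entries are illustrative, not the exact mapping used in our experiments):

```python
# Illustrative relation-word mapping for NYT-style labels (not the exact one used).
relation_words = {
    "/business/company/founders": "founders",
    "/location/country/capital": "capital",
    "/location/location/contains": "contains",
    "/people/person/place_lived": "lives",
}
```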

A.4 Extended to Multi-token Entity Setting
With minimal adaptation, we can extend UniRel to the multi-token entity setting. The adaptation is as follows: 1) repeat the Interaction Map twice, for the head and tail tokens of entity spans respectively, to identify (subject-head, relation, object-head) and (subject-tail, relation, object-tail); 2) add a third Interaction Map between head and tail tokens to identify (head, relation, tail); 3) decode triples by linking the head and tail tokens of entities w.r.t. each relation.
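One possible reading of the linking step as code (a rough sketch under our own assumptions about how the three maps are decoded, not the exact implementation):

```python
def link_spans(head_triples, tail_triples, ht_pairs):
    """Rough sketch of step 3: combine head- and tail-token predictions into
    span-level triples.

    head_triples: set of (s_head, r, o_head) decoded from the first map
    tail_triples: set of (s_tail, r, o_tail) decoded from the second map
    ht_pairs:     set of (head, tail) entity-span links decoded from the third map
    """
    results = []
    for s_h, r, o_h in head_triples:
        for s_t, r2, o_t in tail_triples:
            if r == r2 and (s_h, s_t) in ht_pairs and (o_h, o_t) in ht_pairs:
                results.append(((s_h, s_t), r, (o_h, o_t)))  # subject span, relation, object span
    return results
```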
As shown in Table 6, UniRel achieves SOTA performance under the multi-token entity setting, which further indicates the effectiveness of the proposed methods.

Figure 1: We leverage semantic information to unify the representations of entities and relations. Relational triples are extracted by modeling the entity-entity interactions (blue dashed line) and entity-relation interactions (red solid line) in a unified way.

Figure 2: We input the concatenation of the input sentence and the natural language texts of relations (in bold). The Interaction Map is learned from the attention map inside the 12th layer of the BERT encoder and consists of Entity-Entity Interaction (red rectangle) and Entity-Relation Interaction (green rectangle for subject and blue rectangle for object). Relational triples are extracted intuitively from the map.

Figure 3: F1-score on relations with different orders of magnitude of training samples for UniRel (in orange) and UniRel_unused (in blue). 10/100/1000 means relations with less than or equal to 10/100/1000 samples.

Figure 4: F1-score on Entity-Entity Interaction and Entity-Relation Interaction for UniRel (in orange) and UniRel_separate (in blue).

Figure 5: Visualization of the Interaction Map with an input sentence sampled from NYT. Relations are in bold.
, which has 171 predefined relations. The statistics of the datasets are shown in Table 2. We evaluate our method on standard data splitting and report the standard

Table 1: Main results. The highest scores are in bold.

Table 2: Statistics of the evaluation datasets. Overlapping patterns are counted on the test set.

Table 3: F1-score on sentences with different overlapping patterns and different triple numbers. L is the number of triples in one sentence. All the compared models are implemented with BERT. We report the average results of UniRel over five runs with different random seeds. The highest scores are in bold.
Table 4 shows the comparison of computational efficiency between UniRel and some recent high-performance models. We report training and inference time on both the NYT and WebNLG datasets. In this experiment, we follow previous works and set the batch size to 6/1 for training/inference. All the compared models are tested in the same hardware environment as declared in Section 4.1.

Table 4: Computational efficiency. Training time is the time (in seconds) needed to train one epoch. Inference time is the average time (in milliseconds) to predict one sample.

Table 5: F1-score of UniRel with different random seeds.

As shown in Table 5, we conduct experiments with 5 different random seeds, and the results and their improvements are robust. We obtain an averaged performance of 93.62±0.07 / 94.52±0.12 on NYT/WebNLG over the 5 runs in total, which robustly outperforms the previous SOTA methods.


Table 6: Results of the multi-token entity setting. The highest scores are in bold.