TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph

Multi-hop Question Answering (QA) is a challenging task because it requires precise reasoning with entity relations at every step towards the answer. The relations can be represented as labels in a knowledge graph (e.g., spouse) or as text in a text corpus (e.g., they have been married for 26 years). Existing models usually infer the answer by predicting the sequential relation path or aggregating the hidden graph features. The former is hard to optimize, and the latter lacks interpretability. In this paper, we propose TransferNet, an effective and transparent model for multi-hop QA, which supports both label and text relations in a unified framework. TransferNet jumps across entities at multiple steps. At each step, it attends to different parts of the question, computes activated scores for relations, and then transfers the previous entity scores along activated relations in a differentiable way. We carry out extensive experiments on three datasets and demonstrate that TransferNet surpasses the state-of-the-art models by a large margin. In particular, on MetaQA, it achieves 100% accuracy in 2-hop and 3-hop questions. By qualitative analysis, we show that TransferNet has transparent and interpretable intermediate results.


Introduction
Question answering (QA) plays a central role in artificial intelligence. It requires machines to understand free-form questions and infer the answers by analyzing information from a large corpus (Rajpurkar et al., 2016; Joshi et al., 2017; Chen et al., 2017) or a structured knowledge base (Bordes et al., 2015; Yih et al., 2015; Jiang et al., 2019). Along with the fast development of deep learning, especially pretraining technology (Devlin et al., 2018; Lan et al., 2019), state-of-the-art models have been shown to be comparable with human performance on simple questions that only need a single hop (Petrochuk and Zettlemoyer, 2018; Zhang et al., 2020), e.g., Who is the CEO of Microsoft Corporation. However, multi-hop QA, which requires reasoning with the entity relations at multiple steps, is far from resolved (Yang et al., 2018; Dua et al., 2019; Zhang et al., 2017; Talmor and Berant, 2018).

[Figure 1 omitted. Relation graph over the entities Bill Gates, Melinda Gates, Bill & Melinda Gates Foundation, and Microsoft Corporation.
Label form: Spouse, Founder, CEO.
Text form:
<sub> and <obj> have been married for 26 years.
In 2000, Melinda Gates co-founded the <obj> with her husband <sub>.
In 2000, <sub> co-founded the <obj> with her husband Bill Gates.
During his career at <sub>, <obj> held the positions of chairman, chief executive officer (CEO), president and chief software architect.
Question: What organization did the wife of Bill Gates found? Answer: Bill & Melinda Gates Foundation.]

Figure 1: Answering a multi-hop question over the relation graph. The relations are constrained predicates in the label form (i.e., knowledge graph) and free texts in the text form. The reasoning process has been marked in the graph, where the correspondence between relations and question words has been highlighted in the same color.
In this paper, we focus on multi-hop QA based on relation graphs, which consist of entities and their relations. As shown in Figure 1, the relations can be represented in two forms:
• Label form, also known as knowledge graph (e.g., Freebase (Bollacker et al., 2008), Wikidata (Vrandečić and Krötzsch, 2014)), whose relations are manually defined constrained predicates (e.g., Spouse, CEO).
• Text form, whose relations are free texts retrieved from a textual corpus. We can easily build the graph by extracting the sentences in which two entities co-occur. Since the label form is expensive and usually incomplete, the text form is more economical and practical.
arXiv:2104.07302v2 [cs.CL] 10 Oct 2021
In this paper, we aim to tackle multi-hop questions over these two different forms in a unified framework.
Existing methods for multi-hop QA have two main strands. The first is to predict the sequential relation path in a weakly supervised setting (Zhang et al., 2017; Qiu et al., 2020), that is, to learn the intermediate path based only on the final answer. These works suffer from convergence issues due to the huge search space, which heavily hinders their performance. Besides, they are mostly proposed for the label form, so it is not clear how to adapt them to the text form, whose search space is even larger. The second strand is to collect evidence by using graph neural networks (Sun et al., 2018, 2019). These methods can handle both relation forms and achieve state-of-the-art performance. Although they prevail over the path-based models in performance, they are weak in interpretability since their intermediate reasoning process consists of black-box neural network layers.
In this paper, we propose a novel model for multi-hop QA, dubbed TransferNet, which has the following advantages: 1) Generality. It can deal with the label form, the text form, and their combinations in a unified framework. 2) Effectiveness. TransferNet outperforms previous models significantly, achieving 100% accuracy on 2-hop and 3-hop questions in the MetaQA dataset. 3) Transparency. TransferNet is fully attention-based, so its intermediate steps can be easily visualized and understood by humans.
Specifically, TransferNet infers the answer by transferring entity scores along relation scores over multiple steps. It starts from the topic entity of the question and maintains an entity score vector, whose elements indicate the probability of an entity being activated. At each step, it attends to some question words (e.g., the wife of) and computes scores for the relations in the graph. Relations relevant to the question words will have high scores (e.g., Spouse). We formulate these relation scores as an adjacency matrix, where each entry indicates the transfer probability of an entity pair. By multiplying the entity score vector with the relation score matrix, we can "hop" along relations in a differentiable manner. After repeating this for multiple steps, we finally arrive at the target entity.
We conduct experiments for the two forms respectively.
For the label form, we use MetaQA (Zhang et al., 2017), WebQSP (Yih et al., 2016) and CompWebQ (Talmor and Berant, 2018). TransferNet achieves 100% accuracy in the 2-hop and 3-hop questions of MetaQA. On WebQSP and CompWebQ, we also achieve a significant improvement over state-of-the-art models. For the text form, following Sun et al. (2019), we construct the relation graph of MetaQA from the WikiMovies corpus (Miller et al., 2016). We demonstrate that TransferNet surpasses previous models by a large margin, especially for the 2-hop and 3-hop questions. When we mix the label form and the text form, TransferNet still keeps its superiority. Moreover, by visualizing the intermediate results, we show its strong interpretability.

Related Work
In this paper we focus on multi-hop question answering over a graph structure that is either a knowledge graph or built from a text corpus. In previous works, GraftNet (Sun et al., 2018) and PullNet (Sun et al., 2019) have a similar setting to ours, but they mostly aim at the mixed form, which includes both label relations and text relations. They first retrieve a question-specific subgraph and then use graph convolutional networks (Kipf and Welling, 2016) to implicitly infer the answer entity. These GCN-based methods are usually weak in interpretability because they cannot produce the intermediate reasoning path, which is necessary in our opinion for the task of multi-hop question answering. Besides, there are many works designed for only one graph form.
For the label form, which is also known as "KBQA" or "KGQA", existing methods fall into two categories: information retrieval (Miller et al., 2016; Xu et al., 2019; Zhao et al., 2019b; Saxena et al., 2020) and semantic parsing (Berant et al., 2013; Yih et al., 2015; Liang et al., 2017; Guo et al., 2018; Saha et al., 2019). The former retrieves the answer from the KG by learning representations of the question and the graph, while the latter queries the answer by parsing the question into a logical form. Among these methods, VRN (Zhang et al., 2017) and SRN (Qiu et al., 2020) have good interpretability as they learn an explicit reasoning path with reinforcement learning. However, they suffer from convergence issues due to the huge search space. IRN (Zhou et al., 2018) and ReifKB (Cohen et al., 2020) learn a soft distribution for intermediate relations and can be optimized using only the final answer. However, it is not clear how to extend them to the text form.
Question answering over a text corpus is also known as "reading comprehension". For simple questions, whose answer can be retrieved directly from the text, pretrained models (Devlin et al., 2018; Lan et al., 2019) have performed better than humans (Zhang et al., 2020). For multi-hop questions, which are much more challenging, existing works (Ding et al., 2019; Fang et al., 2019; Tu et al., 2020; Zhao et al., 2019a) usually convert the text into a rule-based or learning-based entity graph, and then use graph neural networks (Kipf and Welling, 2016) to perform implicit reasoning. Similar to PullNet, they are weak in interpretability. Besides, most of them build the graph by just connecting relevant entities, missing the important textual information on the edges.

Preliminary
We conduct multi-hop reasoning on a relation graph, which takes entities as nodes and relations between them as edges. The relations can be of different forms, specifically, constrained labels or free texts. The former is also known as a structured knowledge graph (e.g., Wikidata (Vrandečić and Krötzsch, 2014)), which predefines a set of predicates to represent the entity relations. The latter can be easily extracted from large-scale document corpora according to the co-occurrence of entity pairs. Figure 1 shows examples of these two forms. In this paper we call them label form and text form respectively, and use mixed form to denote a relation graph consisting of both labels and texts.
We denote a relation graph as $G$, its entities as $E$ and its edges as $R$. Let $n$ denote the number of entities; then $R$ is an $n \times n$ matrix whose element $r_{i,j}$ represents the relations between the head entity $e_i$ and the tail entity $e_j$. $r_{i,j}$ can be a set of labels (for the label form), texts (for the text form), or both (for the mixed form). A multi-hop question $q$ usually starts from a topic entity $e_x$ and needs to traverse across relations to reach the answer entities $Y = \{e_{y_1}, \cdots, e_{y_{|Y|}}\}$.
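As a rough illustration of this notation, the sketch below stores a relation graph as a sparse mapping from (head, tail) entity pairs to their set of relations, so that labels and texts can coexist in the mixed form. The function name `build_relation_graph` and the toy triples (loosely taken from Figure 1) are assumptions made for this example only.

```python
from collections import defaultdict


def build_relation_graph(triples):
    """Map each (head, tail) pair to the set of relations r_{i,j} between them."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[(head, tail)].add(relation)
    return graph


# Hypothetical toy triples: one label-form and one text-form relation
# between the same entity pair, which together give a mixed-form edge.
triples = [
    ("Bill Gates", "Spouse", "Melinda Gates"),
    ("Bill Gates", "<sub> and <obj> have been married for 26 years.", "Melinda Gates"),
    ("Melinda Gates", "Founder", "Bill & Melinda Gates Foundation"),
]
graph = build_relation_graph(triples)
```

A sparse mapping is used here because most of the $n \times n$ entries of $R$ are empty in practice.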

TransferNet
To infer the answer of a multi-hop question, TransferNet starts from the topic entity and jumps for $T$ steps. At each step, it attends to different parts of the question to determine the most proper relation. TransferNet maintains a score for each entity to denote its activated probability; the scores are initialized to 1 for the topic entity and 0 for the others. At each step, TransferNet computes a score for each relation to denote its activated probability in terms of the current query, and then transfers the entity scores across those activated relations. Figure 2 shows the framework.
Formally, we denote the entity scores of step $t$ as a row vector $a^t \in [0, 1]^n$, where $[0, 1]$ means a real number between 0 and 1. $a^0$ is the initial score vector, i.e., only the topic entity $e_x$ gets 1. At step $t$, we attend to part of the question to get the query vector $q^t \in \mathbb{R}^d$, where $d$ is the hidden dimension:
$$\mathbf{q}, (h_1, \cdots, h_{|q|}) = \mathrm{Encoder}(q), \quad qk^t = f^t(\mathbf{q}), \quad \alpha^t = \mathrm{Softmax}\big(qk^t\, [h_1; \cdots; h_{|q|}]^\top\big), \quad q^t = \sum_{i=1}^{|q|} \alpha^t_i h_i. \tag{1}$$
$\mathbf{q}$ denotes the question embedding. $f^t$ is a projecting function of step $t$, which maps $\mathbf{q}$ to a specific query key $qk^t$. $qk^t$ is the attention key used to compute a score $\alpha^t_i$ for each word based on its hidden vector $h_i$. $q^t$ is the weighted sum of the $h_i$.
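The attention step described above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: the word hidden vectors and the query key are hypothetical hand-picked numbers rather than encoder outputs, and the names `softmax` and `attend` are introduced for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query_key, word_vectors):
    """Score each word against the query key, then return the word weights
    and the weighted sum of the word vectors (the step-specific query vector)."""
    logits = [sum(k * h for k, h in zip(query_key, vec)) for vec in word_vectors]
    weights = softmax(logits)
    dim = len(query_key)
    query = [sum(w * vec[d] for w, vec in zip(weights, word_vectors)) for d in range(dim)]
    return weights, query


# Hypothetical 2-dimensional hidden vectors for a three-word question.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
query_key = [4.0, 0.0]  # assumed output of the step-t projecting function
weights, query = attend(query_key, word_vectors)
```

With this key, almost all attention mass falls on the first word, so the query vector is close to that word's hidden vector.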
In terms of $q^t$, TransferNet computes the relation scores $W^t \in [0, 1]^{n \times n}$:
$$W^t = g(q^t; \theta_g). \tag{2}$$
$\theta_g$ denotes the learnable parameters. We use different implementations of $g$ for the label form and the text form, which are introduced in Sec. 3.5. Then we can simulate "jumping across edges" with the following formulation:
$$a^t = a^{t-1} W^t. \tag{3}$$
Specifically, we have
$$a^t_j = \sum_{i=1}^{n} a^{t-1}_i \times W^t_{i,j}. \tag{4}$$
This means that the product of entity $e_i$'s previous score and edge $r_{i,j}$'s current score is collected into $e_j$'s current score.
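One transfer step, i.e., multiplying the row vector of entity scores by the relation score matrix, might look like the sketch below. The entity list and the matrix values are hypothetical numbers chosen for illustration; the helper name `transfer` is likewise an assumption.

```python
def transfer(entity_scores, relation_scores):
    """One hop: new_score[j] = sum_i entity_scores[i] * relation_scores[i][j]."""
    n = len(entity_scores)
    return [
        sum(entity_scores[i] * relation_scores[i][j] for i in range(n))
        for j in range(n)
    ]


entities = ["Bill Gates", "Melinda Gates", "Bill & Melinda Gates Foundation"]
a0 = [1.0, 0.0, 0.0]  # only the topic entity is activated
# Hypothetical step-1 relation scores: the edge Bill Gates -> Melinda Gates
# (e.g., Spouse) is strongly activated, the edge to the foundation is not.
W1 = [
    [0.0, 0.96, 0.02],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
a1 = transfer(a0, W1)
```

Because the step is a plain vector-matrix product, it stays differentiable when implemented in an autograd framework.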

[Figure 2 omitted. Top: the framework, showing the relation graphs (label form and text form) and the transfer of entity scores from step $t-1$ to step $t$ and on to step $T$, where the target entity is reached. Bottom: an example for the question "What organization did the wife of Bill Gates found?"]
After repeating for $T$ times, we get the entity scores of each step $a^1, a^2, \cdots, a^T$. Then we compute their weighted sum as the final output:
$$a^* = \sum_{t=1}^{T} c_t\, a^t,$$
where $c \in [0, 1]^T$ denotes the probability distribution of the question's hop, and $c_t$ is the probability value of hop $t$. We can answer all questions from 1-hop to $T$-hop by automatically determining the hop number. The entity with the maximum score in $a^*$ is output as the answer.
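The weighted combination over hops can be sketched as follows. The per-step scores and the hop distribution are hypothetical values for a 2-step run, and `combine_hops` is a name introduced only for this example.

```python
def combine_hops(step_scores, hop_probs):
    """Weighted sum of the per-step entity score vectors, weighted by hop probability."""
    n = len(step_scores[0])
    return [sum(c * a[j] for c, a in zip(hop_probs, step_scores)) for j in range(n)]


# Hypothetical scores after step 1 and step 2 over three entities.
step_scores = [
    [0.0, 0.9, 0.0],  # step 1 activates entity 1
    [0.0, 0.0, 0.8],  # step 2 activates entity 2
]
hop_probs = [0.1, 0.9]  # the question is judged to be 2-hop with probability 0.9

final = combine_hops(step_scores, hop_probs)
answer = max(range(len(final)), key=final.__getitem__)  # index of the top entity
```

Here the 2-hop result dominates the final scores, so the entity reached at step 2 is returned.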
TransferNet is a highly transparent model. As shown in the example of Figure 2, we can easily track the model behaviour by visualizing the activated words, relations, and entities at each step (see Sec. 5.4 for more examples).

Training
Given the golden answer set $Y = \{e_{y_1}, \cdots, e_{y_{|Y|}}\}$, we construct the target score vector $y \in \{0, 1\}^n$ by
$$y_i = \begin{cases} 1, & \text{if } e_i \in Y, \\ 0, & \text{otherwise.} \end{cases}$$
Then we take the L2 Euclidean distance between $a^*$ and $y$ as our training objective:
$$\mathcal{L} = \lVert a^* - y \rVert. \tag{7}$$
Note that TransferNet is fully differentiable; therefore, we can learn all of the intermediate scores (i.e., question attention, relation scores, and entity scores of each step) via this simple objective.
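A minimal sketch of the target construction and the distance-based objective is given below. The entity list and the predicted scores are hypothetical, and the helper names are assumptions for this example; in an actual model the loss would be computed on tensors so that gradients can flow back through every step.

```python
import math


def target_vector(entities, gold_answers):
    """Binary target: 1 for entities in the gold answer set, 0 otherwise."""
    gold = set(gold_answers)
    return [1.0 if e in gold else 0.0 for e in entities]


def l2_distance(pred, target):
    """Euclidean distance between predicted and target score vectors."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, target)))


entities = ["Bill Gates", "Melinda Gates", "Bill & Melinda Gates Foundation"]
y = target_vector(entities, ["Bill & Melinda Gates Foundation"])

perfect = l2_distance([0.0, 0.0, 1.0], y)  # prediction matches the target
wrong = l2_distance([0.0, 0.6, 0.2], y)    # hypothetical poor prediction
```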

Additional Modules
We propose two modules to facilitate the learning of TransferNet.
Score Truncation. According to Equation 4, $a^t_j$ may exceed 1 after a transfer step. An overly large score harms the gradient computation; especially when the number of hops increases, it may lead to gradient explosion. Besides, our loss function, Equation 7, will fail if the final score is unbounded. So we need to rectify the entity scores after each transfer step to ensure that their values stay in $[0, 1]$. At the same time, we need to maintain the differentiability of the operation. We propose the following truncation function:
$$\mathrm{Trunc}(a) = \frac{a}{z(a)}, \qquad z(a) = \begin{cases} a.\mathrm{detach}(), & \text{if } a > 1, \\ 1, & \text{otherwise.} \end{cases}$$
After each transfer step, we truncate $a^t$ by applying this function to each of its elements.
Language Mask. TransferNet does not consider the language bias of the question, which may include some hints for its answer. For example, in the text-formed relation graph we may have (Harry Potter, <sub> was published in <obj>, United Kingdom) and (Harry Potter, <sub> was published in <obj>, 1997). These two triples depict different aspects (i.e., the publication place and the publication time of Harry Potter) but with the same relation text. As a result, given the question Where was Harry Potter published, TransferNet will produce the same scores for United Kingdom and 1997, and may thus wrongly answer the Where-question with 1997.
To solve this issue, we propose a language mask to incorporate the question hints. We predict a mask score for each entity using the question embedding:
$$m = \mathrm{Sigmoid}(\mathrm{MLP}(\mathbf{q})),$$
where $m \in [0, 1]^n$, $m_i$ denotes the mask score of entity $e_i$, and MLP (short for multi-layer perceptron) projects the $d$-dimensional feature to $n$ dimensions. We multiply the mask with the final entity scores:
$$\hat{a}^* = m \odot a^*,$$
where $\odot$ means element-wise multiplication. The $a^*$ in the objective function, Equation 7, should be replaced with $\hat{a}^*$. Note that we need the language mask only in the text form, because the predicates of the label form have no ambiguity.
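The effect of the two modules on the score values can be sketched as below. Two caveats apply: plain Python has no autograd, so `truncate_forward` only reproduces the forward value of a truncation to $[0, 1]$ and says nothing about how gradients are kept; and the mask logits are hypothetical numbers standing in for the output of the MLP on the question embedding.

```python
import math


def truncate_forward(score):
    """Forward value of truncating a non-negative score to [0, 1]."""
    return 1.0 if score > 1.0 else score


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def apply_language_mask(final_scores, mask_logits):
    """Element-wise product of sigmoid mask scores and final entity scores."""
    return [sigmoid(z) * a for z, a in zip(mask_logits, final_scores)]


entities = ["United Kingdom", "1997"]
# Both entities receive the same (too large) score from the same relation text.
final_scores = [truncate_forward(s) for s in [1.3, 1.3]]
# Hypothetical mask logits for a Where-question: favour places over years.
mask_logits = [3.0, -3.0]
masked = apply_language_mask(final_scores, mask_logits)
best = entities[max(range(len(masked)), key=masked.__getitem__)]
```

In this toy case the mask breaks the tie in favour of United Kingdom, which mirrors the Harry Potter example.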

Relation Score Computation
Recall Equation 2, $W^t = g(q^t; \theta_g)$. We design different implementations of $g$ for the different relation forms.

Label Form
In the label form, relations are represented with a fixed predicate set $P$. We first compute probabilities for these predicates in terms of $q^t$, and then collect the corresponding probabilities of $r_{i,j}$ as $W^t_{i,j}$. Formally, the predicate distribution is computed by
$$p^t = \mathrm{Softmax}(\mathrm{MLP}(q^t)).$$
The Softmax function can be replaced with Sigmoid if predicates are not mutually exclusive, i.e., if multiple predicates may be activated at the same time. Let $b$ denote the maximum number of relations between a pair of entities; then we can denote the relation as
$$r_{i,j} = \{r_{i,j,1}, \cdots, r_{i,j,b}\}, \qquad r_{i,j,k} \in \{1, 2, \cdots, |P|\}.$$
The predicate probabilities are collected in terms of the relation labels:
$$W^t_{i,j} = \sum_{k=1}^{b} p^t_{r_{i,j,k}}.$$
We gather the probabilities by summing them up. Taking the maximum is another feasible option, but we find that summation is more efficient and more stable.
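The label-form computation can be sketched as follows: a distribution over predicates is computed once, and each edge gathers (sums) the probabilities of its own labels. The predicate logits stand in for the MLP output and are hypothetical, as are the entity indices and the function names.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_relation_scores(predicate_probs, edge_labels, n):
    """Build the n x n relation score matrix by summing, for each edge,
    the probabilities of the predicates attached to it."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j), labels in edge_labels.items():
        W[i][j] = sum(predicate_probs[p] for p in labels)
    return W


predicates = ["Spouse", "Founder", "CEO"]
logits = [5.0, 0.0, 0.0]  # hypothetical MLP output favouring Spouse
probs = softmax(logits)

# Entities: 0 Bill Gates, 1 Melinda Gates, 2 Foundation, 3 Microsoft Corporation.
edge_labels = {
    (0, 1): [0],     # Spouse
    (0, 2): [1],     # Founder
    (1, 2): [1],     # Founder
    (0, 3): [1, 2],  # Founder and CEO on the same entity pair
}
W = label_relation_scores(probs, edge_labels, n=4)
```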

Text Form
In the text form, relations are represented with natural language descriptions. The graph is built by extracting the sentences in which a pair of entities co-occurs and replacing the entities with special placeholders. For example, the sentence Bill Gates and Melinda Gates have been married for 26 years contributes an edge from Bill Gates to Melinda Gates, whose relation text is <sub> and <obj> have been married for 26 years, as shown in Figure 2. We can get the reverse relations by exchanging the placeholders of subject and object, but for simplicity, we do not show them in the figure.
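The placeholder substitution in this example can be sketched with simple string replacement. This assumes that entity mentions match their surface forms exactly and that neither name is a substring of the other; `to_relation_text` is a hypothetical helper, not a function from any released code.

```python
def to_relation_text(sentence, subject, obj):
    """Replace the subject and object mentions with <sub> and <obj>."""
    return sentence.replace(subject, "<sub>").replace(obj, "<obj>")


sentence = "Bill Gates and Melinda Gates have been married for 26 years"
relation = to_relation_text(sentence, "Bill Gates", "Melinda Gates")
edge = ("Bill Gates", relation, "Melinda Gates")
```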
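Let $r_{i,j} = \{r_{i,j,1}, \cdots, r_{i,j,b}\}$, where $r_{i,j,k}$ denotes the $k$-th relation sentence. We use a relation encoder to obtain the relation embeddings, and then compute the relation score by
$$\mathbf{r}_{i,j,k} = \mathrm{Encoder}(r_{i,j,k}), \qquad p^t_{r_{i,j,k}} = \mathrm{Sigmoid}\big(\mathrm{MLP}(\mathbf{r}_{i,j,k} \odot q^t)\big), \qquad W^t_{i,j} = \sum_{k=1}^{b} p^t_{r_{i,j,k}},$$
where $\odot$ means element-wise product and MLP maps the feature from $d$ dimensions to 1 dimension.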
Since a relation graph contains a huge number of relation texts (usually millions), it is infeasible to compute the embeddings and scores for all of them. So in practice, we select a subset of relations at each step. Specifically, at step $t$, we select entities whose previous score $a^{t-1}_i$ is larger than a predefined threshold $\tau$ and only consider relations that start from these entities. Besides, if too many relations meet this condition, we only keep the top $\omega$ of them, sorted by their subject entity score. By doing so, we need to consider at most $\omega$ relations at each step.
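The selection rule above can be sketched as a filter followed by a top-$\omega$ cut. The edge list, scores, and the function name `select_relations` are hypothetical; the threshold comparison is strict, following the phrase "larger than" in the text.

```python
def select_relations(prev_scores, edges, tau, omega):
    """Keep edges whose head entity score exceeds tau, then keep at most
    omega of them, ordered by head entity score (highest first)."""
    candidates = [e for e in edges if prev_scores[e[0]] > tau]
    candidates.sort(key=lambda e: prev_scores[e[0]], reverse=True)
    return candidates[:omega]


# Each edge is (head_index, tail_index, relation_text); texts are placeholders.
edges = [
    (0, 1, "a"),
    (1, 2, "b"),
    (2, 0, "c"),
    (0, 2, "d"),
]
prev_scores = [0.95, 0.8, 0.1]  # hypothetical entity scores from the previous step
selected = select_relations(prev_scores, edges, tau=0.7, omega=2)
```

Only the selected relation texts would then be passed to the relation encoder at this step.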
We use the same method to process the mixed form, by simply regarding the label predicates as one-word sentences.

Datasets
MetaQA (Zhang et al., 2017) is a large-scale dataset of multi-hop question answering over a knowledge graph, which extends WikiMovies (Miller et al., 2016) from single-hop to multi-hop. It contains more than 400k questions, which are generated using dozens of templates and have up to 3 hops. Its knowledge graph is from the movie domain, including 43k entities, 9 predicates, and 135k triples.
Besides the label form, we also constructed the text form of MetaQA from the text corpus of WikiMovies (Miller et al., 2016), which describes movies in free text. Following Sun et al. (2019), we used exact match of surface forms for entity recognition and linking. Given an article about a movie, we took the movie as the subject and the other relevant entities (e.g., a mentioned actor or year) as objects. Each sentence was processed with placeholders, that is, by replacing the movie with <sub> (if it occurs) and the object entity with <obj>, and then regarded as a relation text. An entity pair can have multiple textual relations.
WebQSP (Yih et al., 2016) has fewer questions but a larger knowledge graph. It contains thousands of natural language questions based on Freebase (Bollacker et al., 2008), which has millions of entities and triples. Its questions are either 1-hop or 2-hop. Following Saxena et al. (2020), we pruned the knowledge base to contain only mentioned predicates and triples within 2 hops of mentioned entities. As a result, the processed knowledge graph includes 1.8 million entities, 572 predicates, and 5.7 million triples. We only consider the label form of WebQSP due to its huge scale.
CompWebQ (Talmor and Berant, 2018) is an extended version of WebQSP with more hops and constraints. Following Sun et al. (2019), we retrieved a subgraph for each question using the PageRank algorithm. On average, there are 1948 entities in each subgraph and the recall is 64%.

Baselines
KVMemNN (Miller et al., 2016) uses a key-value memory to store knowledge and conducts multi-hop reasoning by iteratively reading the memory.
VRN (Zhang et al., 2017) learns the reasoning path via reinforcement learning. Its intermediate results have good interpretability. SRN (Qiu et al., 2020) improves VRN with beam search and a reward shaping strategy, boosting its speed and performance.
GraftNet (Sun et al., 2018) extracts a question-specific subgraph from the entire relation graph with heuristics, and then uses graph neural networks to infer the answer.
PullNet (Sun et al., 2019) improves GraftNet by learning to retrieve the subgraph with a graph CNN instead of heuristics.
ReifKB (Cohen et al., 2020) proposes a scalable implementation of probability transfer over a large-scale knowledge graph in the label form. It can be regarded as a degenerate case of TransferNet.

Implementations
We added reversed relations into the relation graph, which doubles the number of predicates and triples. For the text form, we exchanged the placeholders <sub> and <obj> to obtain the reversed relation, e.g., <sub> co-founded the <obj> is converted to <obj> co-founded the <sub>.
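Adding reversed relations can be sketched as below. For text relations the two placeholders are swapped; for label relations the sketch appends a `_rev` suffix, which matches the naming seen later in the interpretability examples but is otherwise an assumption, as are the function names and the toy triples.

```python
def reverse_text_relation(text):
    """Swap the <sub> and <obj> placeholders in a relation text."""
    marker = "\x00"  # temporary marker, assumed never to occur in relation texts
    return (
        text.replace("<sub>", marker)
        .replace("<obj>", "<sub>")
        .replace(marker, "<obj>")
    )


def add_reversed(triples):
    """Return the triples plus one reversed triple for each original one."""
    reversed_triples = []
    for head, relation, tail in triples:
        if "<sub>" in relation or "<obj>" in relation:
            rev = reverse_text_relation(relation)  # text form
        else:
            rev = relation + "_rev"  # label form, hypothetical naming
        reversed_triples.append((tail, rev, head))
    return triples + reversed_triples


triples = [
    ("Melinda Gates", "<sub> co-founded the <obj>", "Bill & Melinda Gates Foundation"),
    ("Some Movie", "directed_by", "Some Director"),  # placeholder entities
]
all_triples = add_reversed(triples)
```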
For the experiments on MetaQA, we set the step number $T = 3$. We used a bi-directional GRU (Chung et al., 2014) as the question encoder, and set the hidden dimension to 1024. The projecting function $f^t$ was a stack of a linear layer and a Tanh layer. The involved MLPs were implemented as simple linear layers. For the text form, we used another bi-directional GRU as the relation encoder. The threshold $\tau$ was set to 0.7 and $\omega$ was set to 400. Since the question hop is provided in MetaQA, we used the golden hop number as an auxiliary objective to help learn the hop distribution $c$. We computed the cross entropy loss and added it to Equation 7 after multiplying it by a factor of 0.01. The model was optimized using RAdam (Liu et al., 2020) with a learning rate of 0.001 for 20 epochs, which took several hours for the label form and about one day for the text form on a single NVIDIA 1080Ti GPU.
For the experiments on WebQSP and CompWebQ, we set the step number $T = 2$. We used a pretrained BERT (Devlin et al., 2018) as the question encoder and finetuned its parameters on our task. There are no hop annotations, so we did not use the auxiliary loss. Other settings are the same as for MetaQA.

Results on Text-Formed Graph
In Table 2

Besides the pure text form, we also compare the mixed form following Sun et al. (2018, 2019), that is, randomly selecting 50% of the label-formed triples and adding them into the text-formed relation graph. In this setting, we simply consider the predicates as sentences containing just one word, and use the relation encoder (see Sec. 3.5.2) to process them. These 50% labels slightly improve the performance of TransferNet over the pure text form (about 0.4%), because some relations are missing in the text corpus. Compared with PullNet, TransferNet is still in the lead by a large gap (85.2% vs. 94.7%).
[Figure 3, top panel: Step 0 (topic entity), Step 1, Step 2, Step 3 (answer entity) for the question "who acted in the movies directed by the director of Some Mother's Son".]

Table 4: Ablation study on MetaQA. We show the average hits@1 of different hops.

Ablation Study
Table 4 shows the results of the ablation study. We can see that the score truncation and the language mask are both important, especially for the text form. As stated in Sec. 3.4, the language mask is not needed in the label form. The auxiliary loss (see Sec. 4.3) slightly improves the performance because it helps the learning of hop attention.

Interpretability
We visualize the intermediate results of TransferNet for two 3-hop questions in Figure 3. The entities and relations whose score is larger than 0.8 are highlighted in red. The top question is aimed at the label-formed relation graph. The activated predicates for the three hops are directed_by, directed_by_rev, and starred_actors respectively, where the suffix _rev means reverse relation. The bottom question is aimed at the text form. At step 1, TransferNet tries to find the screenwriter of the topic movie, and activates the relation whose textual description is "based on the novel of the same name by <obj>". At step 2, the movie written by Harold Bell Wright is found. At step 3, we aim to find the movie's release year. But since the text descriptions of Western (which is the movie's genre) and 1926 are very similar, both of these entities are activated. Here the proposed language mask successfully filters out the wrong answer.
Figure 4 shows the average hits@1 on the label form of MetaQA when the models are trained with partial training examples (left) and at different epochs (right). We can see that TransferNet is very data-efficient and converges very fast. With only 10% of the training data, it still achieves the same performance as with the entire training set, and it needs only two epochs to reach the optimal results.

Conclusions
We proposed TransferNet, an effective and transparent framework for multi-hop QA over a knowledge graph or a text-formed relation graph. It achieved 100% accuracy on 2-hop and 3-hop questions of label-formed MetaQA, nearly solving the dataset. On the more challenging WebQSP, CompWebQ and text-formed MetaQA, it also outperformed other state-of-the-art models significantly. Qualitative analysis shows the good interpretability of TransferNet.

Figure 2: The framework of TransferNet (top) and an example of the reasoning process (bottom).

Table 1 lists the statistics of these datasets.

Table 1: Dataset statistics.

Table 2: Hits@1 results of the label-formed datasets. TransferNet achieves 100% accuracy in the 2-hop and 3-hop questions of MetaQA. On WebQSP and CompWebQ it also outperforms baseline models by a large margin.

[Figure 3, bottom panel: the question "the films that share screenwriters with The Shepherd of the Hills were released in which years", with the relation text "<sub> is a <obj> American Western silent film directed by Henry King" and the entity 1926.]

Figure 3: Reasoning process of 3-hop questions. The top is in label form, where the suffix "_rev" means reverse relation. The bottom is in text form, where "mask" in blue means the language mask. We show the relation scores in purple and highlight the activated entities and relations (score > 0.8) and words (score > 0.05) in red.