Relation Classification with Entity Type Restriction

Relation classification aims to predict a relation between two entities in a sentence. Existing methods regard all relations as candidate relations for the two entities in a sentence. These methods neglect the restrictions that entity types place on candidate relations, which leads to some inappropriate relations being treated as candidates. In this paper, we propose a novel paradigm, RElation Classification with ENtity Type restriction (RECENT), which exploits entity types to restrict candidate relations. Specifically, the mutual restrictions of relations and entity types are formalized and introduced into relation classification. Moreover, the proposed paradigm, RECENT, is model-agnostic. Based on two representative models, GCN and SpanBERT, RECENT_GCN and RECENT_SpanBERT are trained within RECENT. Experimental results on a standard dataset indicate that RECENT improves the performance of GCN and SpanBERT by 6.9 and 4.4 F1 points, respectively. Notably, RECENT_SpanBERT achieves a new state-of-the-art on TACRED.


Introduction
Relation classification, a supervised version of relation extraction, aims to predict a relation between two entities in a sentence. Relation classification is an important step to construct knowledge bases from a large number of unstructured texts (Trisedya et al., 2019), which benefits many natural language processing applications, such as natural language generation (Kang and Hashimoto, 2020) and question answering (Zhao et al., 2020).
Recently, the majority of methods make use of various neural network architectures to learn a fixed-size representation for a sentence and its entities with various language features, such as part of speech (POS), entity types, and dependency trees. Dependency trees that are parsed from sentences are exploited by GCN (Kipf and Welling, 2017) to model sentences (Zhang et al., 2018; Guo et al., 2019). As a sequence of words, a sentence is modeled by LSTM (Hochreiter and Schmidhuber, 1997), and its entity positions are involved with the attention mechanism (Zhang et al., 2017). More recently, pretrained language models (Devlin et al., 2019; Baldini Soares et al., 2019; Joshi et al., 2020) achieve good performance in relation classification since they are pretrained on massive corpora.

* Corresponding author.
1 Our code is available at https://github.com/Saintfe/RECENT.

Figure 2: Entity type restriction for relation classification. According to entity type restriction, the number of candidate relations reduces from 5 (left) to 2 (right).
To recap, these methods utilize an encoder architecture (Badrinarayanan et al., 2017) and treat relations as labels2 to be classified. However, in this process, these methods inevitably lose the semantics of relations. Take the mutual restrictions between a relation and entity types as an example.
In Figure 1, the relation who-is-born-when restricts its first entity to be a person and the second one to be a time. Conversely, entity types can also restrict candidate relations in relation classification. As illustrated in Figure 2, some inappropriate relations can be discarded from candidate relations by entity type restriction. However, current methods neglect the restriction of entity types on relations, so some inappropriate relations are regarded as candidate relations, which further hurts their performance. To solve the above problem, a novel paradigm, RElation Classification with ENtity Type restriction (RECENT), is proposed to exploit entity types to restrict candidate relations. As the basis of the paradigm, the mutual restrictions of relations and entity types are formalized. With the entity type restriction, some inappropriate relations are discarded from the candidate relations of a specific pair of entity types, as illustrated in Figure 2. A specific classifier with a specific set of candidate relations is individually learned for each pair of entity types (Figure 3). Therefore, the proposed paradigm, RECENT, can eliminate the interference from inappropriate candidate relations.
The contributions are summarized as follows:
• The mutual restrictions of relations and entity types are formalized.
• A novel paradigm, RECENT, is proposed to exploit entity types to restrict candidate relations in relation classification.
• A new state-of-the-art is achieved on TACRED.

2 Specifically, these meaningful relations are treated as meaningless numbers, such as 0, 1, 2.

Proposed Paradigm
Before introducing the proposed paradigm RECENT, the mutual restrictions between a relation and a pair of entities are formalized as the basis of RECENT.

Relation Function
When a binary relation is considered as a function, this relation takes two entities as its two arguments. Formally, this relation is formalized as r(s, o), where r denotes the relation and s, o denote the first (subject) entity and the second (object) entity, respectively. The range of this relation contains two discrete values {0, 1}:

r(s, o) = 1 if the relation r holds between s and o, and r(s, o) = 0 otherwise. (1)

In a broad sense, the domain of this relation can be any pair of entities. However, when a pair of entities with inappropriate types is fed into a specific relation, the relation can directly return 0, with no need to consider the compositional semantics of the relation and the pair of entities. For example, a specific relation who-is-born-when expects the first argument to be a person and the second one to be a time. Therefore, (apple, Steven Jobs) is a pair of inappropriate entities for this relation, so who-is-born-when(apple, Steven Jobs) returns 0 without considering the compositional semantics, since apple may refer to either a kind of fruit or a company (not a person) and Steven Jobs may refer to a famous person (not a time).
Only when a relation receives a pair of appropriate entities whose types match it may the combination of the relation and the entities make sense (i.e., the function defined in Eq. 1 may return 1). In this case, it is meaningful to further verify the correctness of the compositional semantics. From this perspective, in a narrow sense, the domain (denoted by D_r) of a relation (r) is defined as follows:

D_r = {(s, o) | ts ∈ S(r), to ∈ O(r)} (2)

where ts and to denote the types of the subject entity (s) and the object entity (o), respectively. S(r) and O(r) are the appropriate types of r on the subject entity (s) and the object entity (o), respectively.
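The narrow domain check above can be sketched in a few lines of Python. The schema dictionary and type names below are illustrative assumptions for a single toy relation, not part of the paradigm itself:

```python
# Hypothetical schema: for a relation r, its appropriate subject
# types S(r) and object types O(r).
SCHEMA = {
    "who-is-born-when": {"S": {"PERSON"}, "O": {"TIME"}},
}

def in_domain(relation, ts, to, schema=SCHEMA):
    """Check whether entity types (ts, to) fall in the narrow domain
    D_r of `relation`, i.e. ts in S(r) and to in O(r) (Eq. 2)."""
    spec = schema[relation]
    return ts in spec["S"] and to in spec["O"]

# who-is-born-when(apple, Steven Jobs): apple is not a PERSON, so the
# relation returns 0 without checking compositional semantics.
print(in_domain("who-is-born-when", "ORGANIZATION", "PERSON"))  # False
print(in_domain("who-is-born-when", "PERSON", "TIME"))          # True
```

Only when `in_domain` holds is it worth examining the compositional semantics of the sentence at all.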

Entity Type Restriction
In the previous subsection, the narrow domain of a relation restricts entities whose types need to match the relation. Conversely, given a pair of entities whose types are known, the candidate relations of the entities are also restricted, since the match between relations and entity types is mutual.
Formally, given a pair of entities (s, o) and their types (ts, to), its candidate relations (denoted by R_(ts,to)) are restricted into a limited set:

R_(ts,to) = {r ∈ R | ts ∈ S(r), to ∈ O(r)} (3)

where R denotes all possible relations. When the types (ts, to) of a pair of entities (s, o) are explicitly utilized to restrict its candidate relations, the candidate relations reduce from all possible relations R into a rather smaller set R_(ts,to).
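The restriction can be sketched as a set comprehension over a relation schema. The toy schema below is an illustrative assumption (it is not the TACRED relation set):

```python
def candidate_relations(ts, to, schema):
    """R_(ts,to): the relations whose appropriate subject/object types
    match the given pair of entity types (the mutual restriction)."""
    return {r for r, spec in schema.items()
            if ts in spec["S"] and to in spec["O"]}

# Toy schema for illustration only.
SCHEMA = {
    "who-is-born-when": {"S": {"PERSON"}, "O": {"TIME"}},
    "who-works-for":    {"S": {"PERSON"}, "O": {"ORGANIZATION"}},
    "where-is-based":   {"S": {"ORGANIZATION"}, "O": {"LOCATION"}},
}

print(candidate_relations("PERSON", "TIME", SCHEMA))  # {'who-is-born-when'}
```

For (PERSON, TIME), the candidate set shrinks from three relations to one, which is exactly the reduction Figure 2 depicts.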

Relation Classification
Unlike traditional methods that classify a sentence and its entities over all candidate relations R (the left part of Figure 3), the proposed paradigm, RECENT, learns a specific classifier with smaller and more precise candidate relations for each pair of entity types (the right part of Figure 3), based on the entity type restriction in the previous subsection.
The procedure of RECENT is summarized in Algorithm 1. In the learning phase, all sentences are first grouped by the types of their entities (line 1). For each group (marked as g) with a specific pair of entity types (ts, to), the candidate relations R_(ts,to) for the group g are obtained by aggregating the relations in the group g (line 3). Then, a specific classifier (marked as f_g) that maps sentences and their entities in g to R_(ts,to) is learned for the group g (line 4). In the prediction phase, given a new sample (se, s, o, ts, to), a group (marked as g′) is matched by the entity types (ts, to) (line 6). Then, the classifier f_g′ learned on the group g′ is utilized to predict a relation according to the input (se, s, o) (line 7).
From the 4th line of Algorithm 1, the proposed paradigm RECENT is model-agnostic, which means that RECENT is theoretically compatible with many relation classification models.
Algorithm 1 RECENT

Learning Phase:
Input: A set of training samples {(se_i, s_i, o_i, ts_i, to_i, r_i) | i = 1, ..., N}, where the subscript i indicates the i-th sample, se is the sentence, s is the subject entity, o is the object entity, ts is the type of the subject entity, to is the type of the object entity, and r is the relation.
Output: Multiple classifiers.
1: Group sentences by entity types.
2: for each group g (entity types (ts, to)) do
3:   aggregate relations in the group as candidate relations R_(ts,to) defined in Eq. 3.
4:   learn a classifier (marked as f_g) on the group that maps {(se_i, s_i, o_i) ∈ g} to R_(ts,to).
5: end for

Prediction Phase:
Input: A new sample (se, s, o, ts, to) and the specific classifier for each pair of entity types.
Output: A relation.
6: match the sample to a group (marked as g′) according to the entity types (ts, to).
7: use the classifier (f_g′) learned on the group to map (se, s, o) to a relation.
8: return the relation.
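Algorithm 1 can be sketched as follows. Since RECENT is model-agnostic, the base model is abstracted as a `train_classifier` callable; its interface and the `majority_baseline` stand-in are illustrative assumptions, not the GCN or SpanBERT training routines:

```python
from collections import Counter, defaultdict

def recent_learn(samples, train_classifier):
    """Learning phase of Algorithm 1 (sketch). `samples` holds tuples
    (se, s, o, ts, to, r); `train_classifier` stands in for any base
    model and is assumed to take a group of ((se, s, o), r) pairs plus
    its candidate relations, returning a predictor."""
    groups = defaultdict(list)
    for se, s, o, ts, to, r in samples:          # line 1: group by entity types
        groups[(ts, to)].append(((se, s, o), r))
    classifiers = {}
    for key, g in groups.items():
        candidates = {r for _, r in g}           # line 3: R_(ts,to) of Eq. 3
        classifiers[key] = train_classifier(g, candidates)   # line 4: learn f_g
    return classifiers

def recent_predict(sample, classifiers):
    """Prediction phase (sketch): route by entity types, then classify."""
    se, s, o, ts, to = sample
    f_g = classifiers[(ts, to)]                  # line 6: match the group g'
    return f_g((se, s, o))                       # line 7: predict a relation

# A trivial base "model" for illustration: always predict the group's
# majority relation, ignoring the sentence.
def majority_baseline(g, candidates):
    top = Counter(r for _, r in g).most_common(1)[0][0]
    return lambda x: top
```

Swapping `majority_baseline` for a GCN- or SpanBERT-based trainer yields RECENT_GCN or RECENT_SpanBERT, with one classifier per pair of entity types.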

Dataset
The proposed paradigm RECENT is evaluated on TACRED3 (Zhang et al., 2017). TACRED contains 41 semantic relations and a special no relation over 106,264 sentences. The subject entities in TACRED are classified into two types, PERSON and ORGANIZATION, while the object entities are categorized into 16 fine-grained types, such as LOCATION and TIME. Namely, entity types are known. By convention, the micro-averaged F1 score (abbreviated as F1) is reported on TACRED.

3 https://catalog.ldc.upenn.edu/LDC2018T24
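By that convention, no relation counts neither as a prediction nor as a gold positive when computing precision and recall. A minimal sketch of the metric (the official TACRED scorer differs in implementation details):

```python
def tacred_micro_f1(gold, pred, negative="no_relation"):
    """Micro-averaged F1 as conventionally reported on TACRED:
    the `negative` label is excluded from both predicted and gold
    positives before computing precision and recall."""
    pred_pos = sum(1 for p in pred if p != negative)
    gold_pos = sum(1 for g in gold if g != negative)
    correct = sum(1 for g, p in zip(gold, pred) if g == p and p != negative)
    prec = correct / pred_pos if pred_pos else 0.0
    rec = correct / gold_pos if gold_pos else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

A correct no relation prediction therefore neither helps nor hurts the score; only the 41 semantic relations contribute.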

Experimental Setup
Since no relation is a candidate relation for every pair of entity types in TACRED, a binary classifier is first learned to distinguish between the 41 semantic relations and no relation. In this way, each pair of entity types loses one candidate relation (i.e., no relation) in RECENT. If the binary classifier predicts no relation for a pair of entities, then the final relation for them is no relation. Otherwise, their specific semantic relation is further predicted in RECENT.
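The resulting two-stage inference can be sketched as below; the classifier interfaces are illustrative assumptions:

```python
def two_stage_predict(sample, binary_clf, relation_clfs):
    """Two-stage inference used in the experimental setup (sketch):
    a binary classifier first filters out `no_relation`; otherwise the
    entity-type-specific classifier predicts a semantic relation."""
    se, s, o, ts, to = sample
    if binary_clf((se, s, o)) == "no_relation":
        return "no_relation"
    return relation_clfs[(ts, to)]((se, s, o))
```

Both stages are trained with the same base model (GCN or SpanBERT), so the comparison against the plain base model stays fair.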

Base Models
The proposed paradigm RECENT is model-agnostic. Two representative models, GCN (Zhang et al., 2018) and SpanBERT (Joshi et al., 2020), are selected as base models (line 4 in Algorithm 1). For a fair comparison with a base model, all classifiers (including the binary classifier) in RECENT are trained with the base model. The corresponding models in the paper are denoted as RECENT_GCN and RECENT_SpanBERT.
Hyperparameters For RECENT_GCN, the path-centric pruning K is set to 1 as in GCN (Zhang et al., 2018). The learning rates for all classifiers in RECENT_GCN are set to 0.3. For RECENT_SpanBERT, the learning rates for all classifiers are chosen from {5e-6, 1e-5, 2e-5, 3e-5, 5e-5} as in SpanBERT.

Compared Models RECENT is compared with a series of models, including K-Adapter (Wang et al., 2020a) and LUKE (Yamada et al., 2020). To save space, please refer to the original papers of these models for details.

Experimental Results
The experimental results are presented in Table 1. RECENT_GCN achieves a significant performance increase in F1 score over its base model GCN. The absolute increase reaches 6.9, from 64.0 to 70.9. The main contribution to the F1 increase is the improved precision, which greatly increases from 69.8 to 88.3. This great increase in precision, which might result from the restriction on candidate relations by entity types, indicates the effectiveness of the proposed paradigm RECENT. Besides, RECENT_GCN surpasses the compared models that do not include pretrained language models. Similarly, RECENT_SpanBERT overtakes its base model SpanBERT by an absolute 4.4 points in F1. The great soar (an absolute 20.1 points) in precision contributes to the superior F1 of RECENT_SpanBERT. Unfortunately, the decline in recall limits the further improvement of F1. This might be due to the sample imbalance of candidate relations, which will be further studied in future work. On the whole (i.e., F1), RECENT_SpanBERT outperforms all the compared models. Especially, RECENT_SpanBERT exceeds the state-of-the-art LUKE model by 2.5 F1 points and achieves a new state-of-the-art. The GCN results are those reported in its original paper (Zhang et al., 2018).

Error Analysis of GCN
Observing the prediction results of the model, we find that 1)

Conclusion
In this paper, a novel paradigm, RECENT, is proposed based on entity type restriction. RECENT reduces the candidate relations for each pair of entity types by the mutual restrictions between relations and entity types. RECENT is model-agnostic. RECENT_GCN and RECENT_SpanBERT, which are based on two representative models, GCN and SpanBERT, respectively, outperform their counterparts on the standard dataset TACRED, which empirically indicates the effectiveness of the proposed paradigm RECENT. Especially, RECENT_SpanBERT achieves a new state-of-the-art on TACRED.