Rescue Implicit and Long-tail Cases: Nearest Neighbor Relation Extraction

Relation extraction (RE) has achieved remarkable progress with the help of pre-trained language models. However, existing RE models are usually incapable of handling two situations: implicit expressions and long-tail relation types, caused by language complexity and data sparsity. In this paper, we introduce a simple enhancement of RE using k nearest neighbors (kNN-RE). kNN-RE allows the model to consult training relations at test time through a nearest-neighbor search, and provides a simple yet effective means of tackling the two issues above. Additionally, we observe that kNN-RE serves as an effective way to leverage distant supervision (DS) data for RE. Experimental results show that the proposed kNN-RE achieves state-of-the-art performance on a variety of supervised RE datasets, i.e., ACE05, SciERC, and Wiki80, and outperforms the best model to date on the i2b2 and Wiki80 datasets in the setting that allows the use of DS data. Our code and models are available at: https://github.com/YukinoWan/kNN-RE.


Introduction
Relation extraction (RE) aims to identify the relationship between entities mentioned in a sentence, and is beneficial to a variety of downstream tasks such as question answering and knowledge base population. Recent studies (Zhang et al., 2020; Zeng et al., 2020; Lin et al., 2020; Wang and Lu, 2020; Cheng et al., 2020; Zhong and Chen, 2021) in supervised RE take advantage of pre-trained language models (PLMs) and achieve SOTA performance by fine-tuning PLMs with a relation classifier. However, we observe that existing RE models are usually incapable of handling two RE-specific situations: implicit expressions and long-tail relation types.
Figure 1: Left: the retrieved example has a similar structure, and with the phrase "younger brother" the relation becomes easier to infer (nearest neighbor: "He was the younger brother of Panagiotis and Athanasios Sekeris."; test example: "He is the youngest son of Liones, comparing with Samuel Liones and Henry Liones."). Right: referring to the gold labels of nearest neighbors can reduce the bias. Highlighted words may directly influence the relation prediction.

Implicit expression refers to the situation where a relation is expressed as an underlying message that is not explicitly stated or shown. For example, for the relation "sibling to", a common expression can be "He has a brother James", while an implicit expression could be "He is the youngest son of Liones, comparing with Samuel Liones and Henry Liones." In the latter case, the relation "sibling to" between "Samuel Liones" and "Henry Liones" is not directly expressed, but can be inferred from the fact that both are brothers of the same person. Such an underlying message can easily confuse the relation classifier. Inspired by recent studies (Khandelwal et al., 2020; Guu et al., 2020; Meng et al., 2021) using kNN to retrieve diverse expressions for language generation tasks, we introduce a simple but effective kNN-RE framework to address the two problems mentioned above. Specifically, we store the training examples as a memory using a vanilla RE model and consult the stored memory at test time through a nearest-neighbor search. As shown in Figure 1, for an implicit expression, the expression "son of" may mislead the model to an incorrect prediction, while its retrieved nearest neighbor contains the direct expression "brother of", which is a more explicit expression of the gold label "sibling to". The prediction of long-tail examples, also shown in Figure 1, is usually biased toward the majority class. Nearest-neighbor retrieval provides direct guidance to the prediction by referring to the labels of the nearest neighbors in the training set, and thus can significantly mitigate the imbalanced-classification problem.
Additionally, we observe that kNN-RE serves as an efficient way to leverage distant supervision (DS) data for RE. DS augments labeled RE datasets by matching knowledge base (KB) relation triplets with entity pairs in raw text in a weak-supervision fashion (Mintz et al., 2009; Lin et al., 2016; Vashishth et al., 2018; Chen et al., 2021). Recent studies (Baldini Soares et al., 2019; Ormándi et al., 2021; Peng et al., 2020; Wan et al., 2022) that apply PLMs to DS-labeled data to improve supervised RE require heavy computation, since they pre-train on DS data whose size is usually dozens of times that of supervised datasets. To address this issue, we propose a lightweight method that leverages DS data to benefit supervised RE by extending the construction of the stored memory for kNN-RE to DS-labeled data, outperforming the best recent pre-training method with no extra training.
In summary, we propose kNN-RE: a flexible kNN framework to solve the RE task. We conduct experiments for kNN-RE with three different memory settings.

Vanilla RE Model

Given an input sentence in which entity markers are provided, denote the n-th hidden representation of the BERT encoder as h_n. Assuming i and j are the indices of the two beginning entity markers [H_PER] and [T_PER], we define the relation representation as x = h_i ⊕ h_j, where ⊕ stands for concatenation. This representation is then fed into a linear layer to generate the probability distribution p_RE(y|x) for predicting the relation type.
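For concreteness, the following is a minimal sketch of such a relation representation built with a Hugging Face BERT encoder. This is an illustration under assumptions, not the authors' exact implementation: the marker inventory, the bert-base-uncased checkpoint, and the 80-way classifier (as in Wiki80) are placeholders.

```python
# Sketch of the vanilla RE representation: concatenate the hidden states at
# the two head/tail entity-marker positions, then classify with a linear layer.
import torch
from transformers import AutoModel, AutoTokenizer

MARKERS = ["[H_PER]", "[/H_PER]", "[T_PER]", "[/T_PER]"]  # assumed marker set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens({"additional_special_tokens": MARKERS})
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.resize_token_embeddings(len(tokenizer))  # account for the new marker tokens

def relation_representation(sentence: str) -> torch.Tensor:
    enc = tokenizer(sentence, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state[0]          # (seq_len, hidden_dim)
    ids = enc["input_ids"][0].tolist()
    i = ids.index(tokenizer.convert_tokens_to_ids("[H_PER]"))
    j = ids.index(tokenizer.convert_tokens_to_ids("[T_PER]"))
    return torch.cat([hidden[i], hidden[j]])              # x = h_i ⊕ h_j

num_relations = 80  # e.g., Wiki80 (assumed)
classifier = torch.nn.Linear(2 * encoder.config.hidden_size, num_relations)

x = relation_representation(
    "[H_PER] Samuel Liones [/H_PER] is a brother of [T_PER] Henry Liones [/T_PER] ."
)
p_re = torch.softmax(classifier(x), dim=-1)               # p_RE(y|x)
```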

Proposed Method: kNN-RE
Training Memory Construction For the i-th training example (x_i, r_i), we construct the key-value pair (x_i, r_i), where the key x_i is the relation representation obtained from the vanilla RE model and the value r_i denotes the labeled relation type. The memory (K, V) = {(x_i, r_i) | (x_i, r_i) ∈ D} is thus the set of all key-value pairs constructed from all labeled examples in the training set D.
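Continuing the sketch above, memory construction then reduces to encoding every labeled example once and storing its representation alongside its label id; the toy training set and label vocabulary below are placeholders.

```python
# Sketch: build the (key, value) memory from labeled examples, reusing
# relation_representation() from the previous snippet.
import numpy as np

label2id = {"sibling to": 0}   # toy label vocabulary (assumed)

train_set = [
    ("[H_PER] He [/H_PER] has a brother [T_PER] James [/T_PER] .", "sibling to"),
    # ... one entry per labeled training (or DS) example
]

keys = np.stack(
    [relation_representation(s).detach().numpy() for s, _ in train_set]
)                                                        # K: one row per example
values = np.array([label2id[r] for _, r in train_set])   # V: aligned label ids
```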

DS Memory Construction
In this paper, aware of the unique feature of RE that abundant labeled data can be generated by DS, we extend our method by leveraging DS examples for memory construction. Similar to training memory construction, we build key-value pairs for all the DS-labeled examples with the vanilla RE model.

Inference Given a test example, the RE model outputs its relation representation x and generates the relation distribution p_RE(y|x) between the two mentioned entities. We then query the memory with x to retrieve its k nearest neighbors N according to a distance function d(·, ·), namely the L2 distance with an RBF kernel. We weight the retrieved examples by a softmax over the negative distances and aggregate the labeled relation types to produce a relation distribution p_kNN(y|x):

p_kNN(y|x) ∝ Σ_{(x_i, r_i) ∈ N} 1[y = r_i] exp(−d(x, x_i) / T)    (1)

where T denotes a scaling temperature. Finally, we interpolate the RE model distribution p_RE(y|x) and the kNN distribution p_kNN(y|x) to produce the final overall distribution:

p(y|x) = λ p_kNN(y|x) + (1 − λ) p_RE(y|x)    (2)

where λ is a hyperparameter.
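A minimal NumPy sketch of this inference step, following Equations (1) and (2) as reconstructed above: keys and values are the memory arrays from the previous snippet, and p_re is assumed to be the vanilla model's probability vector as a NumPy array.

```python
# Sketch of Equations (1) and (2): retrieve k nearest neighbors by L2 distance,
# turn negative distances into weights with a temperature-scaled softmax,
# aggregate the neighbor labels, then interpolate with the RE model.
import numpy as np

def knn_re_distribution(x, keys, values, p_re, k=8, T=1.0, lam=0.5):
    d = np.linalg.norm(keys - x, axis=1)        # L2 distance to every stored key
    nn = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    w = np.exp(-d[nn] / T)                      # RBF-style weights, Equation (1)
    w /= w.sum()                                # softmax over negative distances
    p_knn = np.zeros_like(p_re)
    for weight, label in zip(w, values[nn]):
        p_knn[label] += weight                  # aggregate weights per relation type
    return lam * p_knn + (1.0 - lam) * p_re     # Equation (2)
```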

Experiment Settings
Supervised Datasets We evaluate our proposed method on five popular RE datasets. Table 1 shows the statistics. The ACE05 and TACRED datasets are built over an assortment of newswire and online text. Wiki80 (Han et al., 2019) is derived from Wikipedia and crosses various domains. The i2b2 2010VA dataset is built from clinical records, and SciERC (Luan et al., 2018) consists of scientific paper abstracts.

Besides, we also compare performance on the development sets of two datasets, as shown in Table 3; the experimental results emphasize the consistent improvement of our proposed methods.

Analysis
Case Study for Implicit Expressions We select two typical test examples to better illustrate the correction provided by kNN retrieval, as shown in Table 4. For the first example, the implicit relation between "maracanazo" and "1950 world cup" needs to be inferred from other contextual information, and the RE model makes an incorrect prediction, since a competition usually serves as the object of the relation "participant of", as in the second retrieved example. However, the nearest example contains a simpler expression and rectifies the prediction. Refer to Appendix A for a visualized analysis.
For the second example, the implicit expression leads to another confusing relation, "subsidiary", while the nearest example captures the same structure containing an "of" between the two entities, making the final prediction easier.

Figure 2: An illustration of kNN-RE. The memory is constructed from pairs of relation representations (Rep.) and relation labels from the training set or DS set. For inference, the blue line denotes the workflow for vanilla RE and the black line denotes the workflow for kNN.

Table 1: Statistics of datasets. Rel. denotes relation types.

Table 2: Main results of kNN-RE with different memory settings on five datasets. ♣ denotes methods using the DS set. †: the SOTA on i2b2 2010VA adopts a specific encoding. "kNN only" means only using p_kNN(y|x), i.e., setting λ = 1. "Combined" means the combination of both memories by α p_kNN-RE(Train) + (1 − α) p_kNN-RE(DS), where p_kNN-RE(Train) and p_kNN-RE(DS) are computed by Equation 2 with the "Train memory" and "DS memory" respectively, and k, λ are given by the best setting of each single memory.
Combined memory♣: 88.54 (α = 0.5), 78.25 (α = 0.6)
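As a sketch of this combination (the function and argument names below are ours, not from the paper; both inputs are assumed to be Equation-2 outputs computed with the train memory and the DS memory separately):

```python
def combine_memories(p_knn_re_train, p_knn_re_ds, alpha):
    # Second interpolation from Table 2 over the two single-memory
    # kNN-RE distributions, weighted by the hyperparameter alpha.
    return alpha * p_knn_re_train + (1.0 - alpha) * p_knn_re_ds
```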

Table 3: Results on development sets. ♣ denotes methods using the DS set.
Table 4 excerpt 1: "... 1950 World Cup after beating ... in the final round match known as the "Maracanazo"."
Table 4 excerpt 2: "Sidsel Ben Semmane with ..."

Table 4: Two implicit test examples from Wiki80.

Table 5: Results on long-tail relation types.
Performance on Long-tail Relation Types We check the performance of kNN-RE with training memory on several of the most long-tail relation types from the TACRED dataset and show the results in Table 5. Note that all long-tail relation types benefit from the effectiveness of kNN prediction except for "stateorprovince of birth", which contains only 8 test examples, making its measured performance unconvincing.

Ability in Low-Resource Scenario We also check the retrieval ability by varying the percentage of the training set used, which constrains the representation quality in the memory (Figure 3). We observe that as the number of training examples decreases, our kNN-RE (training) tends to achieve greater improvement, even though the training memory is also limited by the low-resource setting. Surprisingly, our kNN-RE (DS) achieves an F1 score of 74.31 (an improvement of 42.35 over PURE) with only 1% of the training examples provided, which indicates that the model can still retrieve accurate nearest neighbors from the DS memory. We believe this is because modern PLMs have learned robust representations during pre-training.

Conclusion

We propose kNN-RE: a flexible kNN framework with different memory settings that addresses the implicit-expression and long-tail relation issues in RE. The results show that our kNN-RE with training memory outperforms the vanilla RE model and achieves SOTA F1 scores on three datasets. In the DS setup, kNN-RE also significantly outperforms SOTA DS pre-training methods without extra training.

Limitations

In this paper, we use a kNN-based strategy at the inference stage to address the language complexity and data sparsity problems, under which it is more challenging for a model to learn the characteristics of the affected examples. While our approach is lightweight and flexible, it cannot directly help the model improve the classification of implicit-expression examples or long-tail relation types during the training stage; the representations of these examples remain coarse-grained. Incorporating kNN-style strategies into the training stage, by providing additional nearest-neighbor references to the model, could help it learn better representations of these examples, which we leave as future work.