Type-Aware Decomposed Framework for Few-Shot Named Entity Recognition

Despite the recent success achieved by several two-stage prototypical networks in the few-shot named entity recognition (NER) task, the over-detected false spans at the span detection stage and the inaccurate and unstable prototypes at the type classification stage remain challenging problems. In this paper, we propose a novel Type-Aware Decomposed framework, namely TadNER, to solve these problems. We first present a type-aware span filtering strategy that filters out false spans by removing those semantically far away from type names. We then present a type-aware contrastive learning strategy that constructs more accurate and stable prototypes by jointly exploiting support samples and type names as references. Extensive experiments on various benchmarks show that our proposed TadNER framework achieves new state-of-the-art performance. Our code and data will be available at https://github.com/NLPWM-WHU/TadNER.


Introduction
Named entity recognition (NER) aims to detect entity spans and classify them into pre-defined categories (entity types). When there are sufficient labeled data, deep learning-based methods (Huang et al., 2015; Ma and Hovy, 2016; Lample et al., 2016; Chiu and Nichols, 2016) achieve impressive performance. In real applications, it is desirable to recognize new categories that are unseen in the training/source domain. However, collecting extra labeled data for these new types is time-consuming and labor-expensive. Consequently, few-shot NER (Fritzler et al., 2019; Yang and Katiyar, 2020), which aims to identify unseen entity types based on a few labeled samples for each class (i.e., support samples) in the test domain, has attracted great research interest in recent years.
End-to-end metric learning based methods (Yang and Katiyar, 2020; Das et al., 2022) are the mainstream in few-shot NER. These methods need to simultaneously learn the complex structure consisting of entity boundaries and entity types. When the domain gap is large, their performance drops dramatically, because it is extremely hard to capture such complicated structural information with only a few support examples for domain adaptation. The resulting insufficient learning of boundary information means that these methods often misclassify entity boundaries and cannot obtain very satisfying performance.
Recently, there is an emerging trend of adopting two-stage prototypical networks (Wang et al., 2022; Ma et al., 2022c) for few-shot NER, which decompose NER into separate span extraction and type classification tasks and perform one task at each stage. Since decomposed methods only need to handle a single boundary detection task at the first stage, they can find more accurate boundaries and obtain better performance than end-to-end approaches.
While making good progress, these two-stage prototypical networks still face two challenging problems, i.e., over-detected false spans and inaccurate and unstable prototypes at the corresponding stages. (1) At the span extraction stage in the test phase, the decomposed approaches usually recall many over-detected false spans whose types only exist in the source domain. For example, "1976" in Figure 1 (a) belongs to the DATE type in the source domain, since there are many samples like "Obama was born in 1961" in training, and it is thus easily recognized as a span by the span detector. However, there is no such label in the test domain, and "1976" is thus assigned a false LOC type. (2) The prototypical networks in decomposed methods aim to learn a type-agnostic similarity metric to classify entities in test samples (i.e., query samples) via their distance to prototypes. Since the prototypes are constructed using very few support samples in the type-agnostic feature space, they might be inaccurate and unstable. For example, in Figure 1 (c), a prototype is just the support sample itself in one-shot NER and thus deviates far from the real class center.
Based on the above observations, we propose a Type-Aware Decomposed framework, namely TadNER, for few-shot NER. Our method follows the span detection and type classification learning scheme of the decomposed framework, but moves two steps further to overcome the aforementioned issues.
Firstly, we present a type-aware span filtering strategy to filter out false spans by removing those semantically far away from type names. By this means, over-detected spans like "1976", whose types do not exist in the test domain, can be removed due to their long semantic distance to the type names, as shown in Figure 1 (b).
Secondly, we present a type-aware contrastive learning strategy to construct more accurate and stable prototypes by jointly leveraging type names and support samples as references. In this way, the type names serve as guidance for the prototypes and keep them from deviating too far from the class centers, even in some extreme outlier cases, as shown in Figure 1 (d).
Extensive experimental results on 5 benchmark datasets demonstrate the superiority of our TadNER over state-of-the-art decomposed methods. In particular, in the hard intra Few-NERD and 1-shot Domain Transfer settings, TadNER achieves 8% and 9% absolute F1 increases, respectively.

Method
In this section, we formally present our proposed TadNER. The overall structure of TadNER is shown in Figure 2. Note that the type-aware contrastive learning and type-aware span filtering strategies take effect at the type classification stage in the training and test domains, respectively.
Task Formulation Given a sequence X = {x_1, x_2, ..., x_N} with N tokens, NER aims to assign each token x_i a corresponding label y_i ∈ T ∪ {O}, where T is the entity type set and O denotes the non-entity label. For few-shot NER, a model is first trained on a source domain dataset D_source with the entity type set T_source = {t_1, t_2, ..., t_m}.

Source Domain Training
The source domain training consists of the span detection and type classification stages. The procedure is shown in Figure 2 (a).

Span Detection
The span detection stage is formulated as a sequence labeling task, similar to an existing decomposed NER model (Ma et al., 2022c). We adopt BERT (Devlin et al., 2019) with parameters θ_1 as the PLM encoder f_{θ_1}. Given an input sentence X = {x_1, x_2, ..., x_N}, the encoder produces contextualized representations for the tokens:

H = f_{θ_1}(X),    (1)

where H ∈ R^{N×r} and r is the dimension of the representations. H is then fed into a classification layer consisting of a dropout layer (Srivastava et al., 2014) and a linear layer with softmax:

P = softmax(dropout(H) W^T + b),    (2)

where W ∈ R^{|C|×r} and b ∈ R^{|C|} are the weight matrix and bias.
After that, the training loss is formulated as the averaged cross-entropy between the predicted probability distribution and the ground-truth labels:

L_span = -(1/N) Σ_{i=1}^{N} log P_{i, y_i},    (3)

where y_i = 0 when the i-th token is an O-token and y_i = 1 otherwise. We denote the training loss of the span detection stage as L_span. During training, the parameters {θ_1, W, b} are updated to minimize L_span.
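The span-detection stage above can be sketched in NumPy as follows. The shapes and the binary label scheme follow the text (Eqs. 1-3); the toy dimensions, random inputs, and function names are illustrative assumptions, and dropout is omitted since it is only active during training.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def span_detection_head(H, W, b):
    """Token-level classification layer: P = softmax(H W^T + b).

    H: (N, r) contextualized token representations from the PLM encoder.
    W: (|C|, r) weight matrix; b: (|C|,) bias.
    Returns an (N, |C|) matrix of per-token class probabilities.
    """
    return softmax(H @ W.T + b)

def span_loss(P, y):
    """Averaged cross-entropy over tokens; y[i] = 0 for O-tokens, 1 otherwise."""
    N = len(y)
    return -np.mean(np.log(P[np.arange(N), y]))

# Toy example with hidden size r=4 and |C|=2 classes (O vs. entity).
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))      # stand-in for f_theta1(X)
W = rng.normal(size=(2, 4))
b = np.zeros(2)
P = span_detection_head(H, W, b)
loss = span_loss(P, np.array([0, 1, 1, 0, 0]))
```

In practice H would come from a BERT encoder rather than a random matrix; the head and loss are otherwise shape-for-shape the same.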

Type Classification
Representation Given an input sentence X, we only select the entity tokens E = {e_1, e_2, ..., e_M} (E ⊂ X) with ground-truth labels Y = {y_1, y_2, ..., y_M} for the training of this stage. For the entity type set T_source = {t_1, t_2, ..., t_m} of the source domain D_source, we manually convert the types into their corresponding natural-language type names. After that, to obtain token representations carrying type name information, which are further used for calculating the contrastive loss, we concatenate entity tokens with their corresponding type names in two orders, i.e., entity-label order and label-entity order. Here we use another encoder f_{θ_2} with parameters θ_2 to obtain the contextual representations:

h_i^{el} = f_{θ_2}(e_i ⊕ t_{y_i}),  h_i^{le} = f_{θ_2}(t_{y_i} ⊕ e_i),

where ⊕ is the concatenation operator, and h_i^{el} and h_i^{le} denote the two kinds of type-aware representations of the entity token e_i, obtained in entity-label order and label-entity order, respectively.
Type-Aware Contrastive Learning To learn a generalized and type-aware feature space, which can further be used for constructing more accurate and stable prototypes, we borrow the idea of contrastive learning (Khosla et al., 2020) and use the two kinds of type-aware token representations mentioned above to construct positive and negative pairs, as shown in Figure 2 (a): representations with the same label in different orders form positive pairs, and those with different labels form negative pairs. The type-aware contrastive loss is calculated as:

L_type = -(1/M) Σ_{i=1}^{M} (1/|Z_i|) Σ_{z ∈ Z_i} log [ exp(sim(h_i^{el}, h_z^{le}) / τ) / Σ_{j=1}^{M} exp(sim(h_i^{el}, h_j^{le}) / τ) ],    (4)

where M is the number of entity tokens in a batch and Z_i is the set of positive samples with the same label type y_i. Here we adopt the dot product with a normalization factor as the similarity function sim(·). We also add a temperature hyper-parameter τ to focus more on difficult pairs (Chen et al., 2020). During the source domain training, the parameters θ_2 are updated to minimize L_type.
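A minimal sketch of this type-aware contrastive loss, assuming a standard supervised-contrastive instantiation over the two views (the exact normalization in the paper may differ, and the random toy embeddings are illustrative only):

```python
import numpy as np

def sim(a, b):
    """Dot product with a normalization factor (cosine similarity)."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def type_aware_contrastive_loss(h_el, h_le, labels, tau=0.1):
    """Contrastive loss over two views of each entity token.

    h_el[i] / h_le[i] are the entity-label-order and label-entity-order
    representations of token i. Tokens with the same label form positive
    pairs across the two views; all remaining pairs act as negatives.
    """
    M = len(labels)
    loss = 0.0
    for i in range(M):
        logits = np.array([sim(h_el[i], h_le[j]) / tau for j in range(M)])
        log_prob = logits - np.log(np.exp(logits).sum())  # log-softmax over the batch
        pos = [j for j in range(M) if labels[j] == labels[i]]
        loss += -np.mean(log_prob[pos])
    return loss / M

# Toy batch: 6 entity tokens, 3 classes, the two views nearly identical.
rng = np.random.default_rng(1)
h_el = rng.normal(size=(6, 8))
h_le = h_el + 0.01 * rng.normal(size=(6, 8))
labels = [0, 0, 1, 1, 2, 2]
loss = type_aware_contrastive_loss(h_el, h_le, labels)
```

Lowering τ sharpens the softmax, which is what lets the loss concentrate on difficult (hard-negative) pairs.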

Target Domain Inference
As illustrated in Figure 2 (b), during the target domain inference we first extract candidate spans in the query sentences and then remove over-detected false spans via the type-aware span filtering strategy. Finally, we classify the remaining candidate spans into entity types to obtain the final results.

Span Detection
The span detector, with its parameters {θ_1, W, b} trained in the source domain, is further fine-tuned with samples from the support set S_target in the target domain to minimize L_span in Eq. (3). To alleviate the risk of over-fitting, we adopt a loss-based early stopping strategy, i.e., stopping the fine-tuning procedure once the loss rises β times in a row, where β is a hyper-parameter.
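The loss-based early stopping rule above is simple enough to state directly in code; `should_stop` is an illustrative name, not from the paper:

```python
def should_stop(losses, beta):
    """Loss-based early stopping: return True once the fine-tuning loss
    has risen beta times in a row (beta is a hyper-parameter).

    losses: the per-step fine-tuning losses observed so far, oldest first.
    """
    rises = 0
    for prev, cur in zip(losses, losses[1:]):
        # Count consecutive rises; any drop resets the counter.
        rises = rises + 1 if cur > prev else 0
    return rises >= beta
```

During fine-tuning one would call `should_stop` after each step and break out of the training loop as soon as it returns True.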
After fine-tuning the span detector, we use it to detect entity words in the query sentences of Q_target, and then treat contiguous entity words as a candidate span, e.g., "Barack Obama". Finally, we obtain the candidate span set C_span containing all candidate spans, which are assigned entity types at the type classification stage.
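Grouping contiguous entity-tagged words into candidate spans can be sketched as follows; the function name and the (start, end, text) output format are illustrative choices:

```python
def extract_candidate_spans(tokens, is_entity):
    """Group maximal runs of contiguous entity-tagged tokens into
    candidate spans, returned as (start, end, text) triples with an
    inclusive end index."""
    spans, start = [], None
    for i, flag in enumerate(is_entity):
        if flag and start is None:
            start = i                      # a new run of entity words begins
        elif not flag and start is not None:
            spans.append((start, i - 1, " ".join(tokens[start:i])))
            start = None                   # the run ends at token i-1
    if start is not None:                  # flush a run that reaches the end
        spans.append((start, len(tokens) - 1, " ".join(tokens[start:])))
    return spans

spans = extract_candidate_spans(
    ["Barack", "Obama", "was", "born", "in", "Hawaii"],
    [True, True, False, False, False, True],
)
```

Here "Barack Obama" is recovered as a single two-token candidate span, matching the example in the text.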

Type Classification

Domain Adaptation Benefiting from the generalized and type-aware feature space trained in the source domain, we can further obtain a domain-specific encoder f_{θ'_2} by fine-tuning with the type-aware contrastive loss of Eq. (4), computed on the entity tokens and type names of the support set S_target.

Type-Aware Span Filtering As illustrated in the introduction, the span detector may generate some over-detected false spans whose types only belong to the source domain, since the semantics of entity type names are not considered at the span detection stage. To solve this problem, we propose a type-aware span filtering strategy during the inference phase to remove these false spans. Intuitively, the semantic distance of these false spans to all the golden type names is large. Based on this assumption, we calculate a threshold γ_t with the fine-tuned encoder f_{θ'_2}, using the entity tokens and corresponding type names in the support set. This threshold γ_t is used to remove the over-detected false spans, and the remaining candidate spans are then assigned labels.
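The exact formula for the threshold γ_t is not reproduced here, so the sketch below uses one plausible instantiation: γ_t is the lowest similarity between a support entity token and its own type name, and a candidate span is kept only if its best similarity to any target-domain type name reaches γ_t. All function names and the choice of cosine similarity are assumptions of this sketch.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def compute_threshold(support_embs, support_types, type_name_embs):
    """One plausible gamma_t: the minimum similarity between a support
    entity token and the embedding of its own type name."""
    return min(cos(e, type_name_embs[t]) for e, t in zip(support_embs, support_types))

def filter_spans(span_embs, type_name_embs, gamma_t):
    """Keep the indices of candidate spans whose best similarity to any
    target-domain type name reaches the threshold; drop the rest."""
    kept = []
    for i, e in enumerate(span_embs):
        if max(cos(e, p) for p in type_name_embs.values()) >= gamma_t:
            kept.append(i)
    return kept

# Toy 2-d embeddings for two target types and two candidate spans.
type_name_embs = {"LOC": np.array([1.0, 0.0]), "PER": np.array([0.0, 1.0])}
support_embs = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
support_types = ["LOC", "PER"]
gamma_t = compute_threshold(support_embs, support_types, type_name_embs)

candidate_embs = [np.array([1.0, 0.05]),  # close to "LOC": kept
                  np.array([0.7, 0.7])]   # far from every type name: filtered out
kept = filter_spans(candidate_embs, type_name_embs, gamma_t)
```

The second candidate plays the role of an over-detected span like "1976": it sits far from all target type names, falls below γ_t, and is removed.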
Type-Aware Prototype Construction We construct a type-aware prototype p_j for each entity type t_j ∈ T_target, which is more accurate and stable owing to the generalized and type-aware feature space learned in the source domain. The prototype is obtained by averaging the concatenated two-order representations h_e^{el} ⊕ h_e^{le} of the support entity words e ∈ Z_j together with the self-concatenated representation of the type name of t_j, where ⊕ is the concatenation operator and Z_j denotes the set of entity words with the label type t_j in the support set.
Inference For each remaining candidate span s_i, we assign it the label type t_j ∈ T_target with the highest similarity:

y_pred = argmax_{t_j ∈ T_target} sim(h_i, p_j),    (5)

where p_j is the type-aware prototype representation corresponding to the label type t_j, and y_pred is the predicted label type of the candidate span s_i. Here h_i is the self-concatenated representation of s_i, for consistency with the dimension of the prototype p_j. The entire inference procedure in the target domain is presented in Appendix A.1.
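The prototype construction and nearest-prototype inference above can be sketched as follows. The exact weighting of the type-name representation against the support-token views is an assumption of this sketch, and `build_prototype`/`classify_span` are illustrative names:

```python
import numpy as np

def build_prototype(type_name_emb, support_el, support_le):
    """Type-aware prototype for one class: average the concatenated
    two-view representations of its support entity words together with
    the self-concatenated type-name representation (equal weighting is
    an assumption of this sketch)."""
    views = [np.concatenate([el, le]) for el, le in zip(support_el, support_le)]
    views.append(np.concatenate([type_name_emb, type_name_emb]))
    return np.mean(views, axis=0)

def classify_span(span_emb, prototypes):
    """Assign the label whose prototype is most similar to the
    self-concatenated span representation (cosine similarity)."""
    h = np.concatenate([span_emb, span_emb])  # match the prototype dimension
    def s(p):
        return float(h @ p) / (np.linalg.norm(h) * np.linalg.norm(p))
    return max(prototypes, key=lambda t: s(prototypes[t]))

# Toy 2-d setting: one support word per class, type-name embedding aligned with it.
prototypes = {
    "LOC": build_prototype(np.array([1.0, 0.0]),
                           [np.array([1.0, 0.0])], [np.array([1.0, 0.0])]),
    "PER": build_prototype(np.array([0.0, 1.0]),
                           [np.array([0.0, 1.0])], [np.array([0.0, 1.0])]),
}
pred = classify_span(np.array([0.9, 0.1]), prototypes)
```

In the one-shot case the type-name term keeps the prototype from collapsing onto a single, possibly atypical, support sample, which is the stabilizing effect the text describes.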
Baselines We compare our proposed TadNER with many strong baselines, including both one-stage and two-stage methods.

Main Results
Table 1 and Table 2 report the comparison results between our method and the baselines under the Few-NERD and Domain Transfer settings, respectively (please refer to Appendix A.2-A.5 for more descriptions of the datasets, evaluation methods, baselines and implementation details). We have the following important observations: 1) Our model demonstrates superiority under the Few-NERD settings. Notably, in the more challenging intra task, our TadNER achieves an average 8.2% increase in F1 score. Besides, our model outperforms the baselines by 10.5% and 9.2% under the 1-shot and 5-shot Domain Transfer settings, respectively. 2) When provided with very few samples (e.g., 1-shot), the improvements become even more significant, which is a very attractive property.
3) The performance of DecomposedMetaNER, a competing model, severely deteriorates under certain settings, such as I2B2. This is primarily due to the presence of numerous sentences without entities, leading to many falsely detected spans. In contrast, our TadNER effectively mitigates this issue through the type-aware span filtering strategy, successfully removing false spans and achieving promising results.

Ablation Study
To validate the effectiveness of the main components in TadNER, we introduce the following variant baselines for the ablation study: 1) TadNER w/o Type-Aware Span Filtering (TASF) removes the type-aware span filtering strategy and directly feeds all spans detected at the span detection stage to type classification. 2) TadNER w/o Type Names (TN) further replaces type names with random vectors when calculating the contrastive loss and constructs class prototypes using only the support samples. 3) TadNER w/o Span-Finetune skips the target domain adaptation of the span detection stage. 4) TadNER w/o Type-Finetune skips the target domain adaptation of the type classification stage.

From Table 3, we can observe that: 1) The removal of the type-aware span filtering strategy leads to a drop in performance in most cases, particularly on entity-sparse datasets like I2B2, where a large number of false positive spans are detected. Besides, on entity-dense datasets like GUM, the performance is not harmed by the span filtering strategy, which shows the robustness and effectiveness of our model in various real-world applications. 2) The omission of type names also results in a significant decrease in performance, indicating that our model indeed learns a type-aware feature space, which plays a crucial role in few-shot scenarios. 3) Eliminating fine-tuning at the span detection and type classification stages causes a substantial performance drop. This demonstrates that the training objective in the source domain training phase aligns well with the target domain fine-tuning phase via the task decomposition and contrastive learning strategies, despite the different entity classes. As a result, the model can effectively utilize the provided support samples from the target domain, enhancing its performance in few-shot scenarios.

Case Study
To examine how our model accurately constructs prototypes and filters out over-detected false spans with the help of type names, we randomly select one query sentence each from Few-NERD intra and CoNLL2003 for the case study. We compare TadNER with DecomposedMetaNER (Ma et al., 2022c), which also belongs to the two-stage methods.
As shown in Figure 3, in the first case, our model correctly predicts "turkish tff third league" as the "organization-sportsleague" type, while DecomposedMetaNER identifies it as the wrong "organization-sportsteam" type. Since the type name and the entity span share the word "league", incorporating the type name into the construction of the prototype makes the identification much easier. Conversely, without the type name, it would be hard to distinguish the two categories of entities, because both represent "sports-related organizations".
In the second case, DecomposedMetaNER incorrectly identifies "two" as an entity span and then assigns it the wrong entity type "LOC", since there are many samples like "The two sides had not met since Oct. 18" in the source domain OntoNotes, where "two" is an entity of the "CARDINAL" type. In contrast, our TadNER successfully removes this false span via the type-aware span filtering strategy.

Impact of Type Names
To further explore the impact of incorporating the semantics of type names, and whether model performance is sensitive to the converted type names, we perform experiments with the following variants of type names: 1) Original type names, which are used in our main comparison experiments. 2) Synonymous type names. We generate three synonyms for each original type name as variants using ChatGPT. These synonyms were automatically generated to explore the effect of different but related type names on model performance. 3) Meaningless type names, e.g., "label 1" and "label 2". 4) Misleading type names, e.g., "person" for "LOC" and "location" for "PER" in the CoNLL dataset. Please refer to Appendix A.7 for details.

As shown in Figure 4, we can make the following observations: 1) All three variants of synonymous type names achieve comparable performance, indicating that our method is robust to different ways of transforming type names. However, the best choice is still the direct transformation, such as "person" for "PER", which is how we obtain the original type names. 2) The irrelevant or incorrect information in meaningless and misleading type names leads to a significant degradation in model performance, indicating that the semantics associated with entity classes are more suitable as anchor points for contrastive learning.

Impact of Type-Aware Prototypes
To investigate the effectiveness of our proposed strategy for solving the problem of inaccurate and unstable prototypes at the type classification stage, we further analyze the stability and quality of the prototypes. We select three baselines as compared methods: 1) TadNER w/o Type Names (TN) (the second variant baseline in the ablation study). 2) DecomposedMetaNER (Ma et al., 2022c). 3) Vanilla Contrastive Learning (CL), which adopts a token-token contrastive loss and was proposed by Das et al. (2022). We use it to train the type classification module in a decomposed NER framework, in order to explore whether it can address the issue of unstable and inaccurate prototypes. Here we adopt the same 10 samplings used in the 1-shot Domain Transfer experiments.

As shown in Figure 5, our proposed TadNER achieves a significant improvement over DecomposedMetaNER on each dataset and is more stable across different samplings. Besides, removing type names causes a sharp performance drop in some cases for TadNER w/o TN, indicating that the incorporation of type names indeed helps construct more stable and accurate prototypes. Moreover, Vanilla CL performs extremely poorly due to the introduction of an additional projection layer, a component employed in various contrastive learning methods (Chen et al., 2020; Das et al., 2022). The inclusion of this layer hampers the model's capacity to acquire adequate semantics related to entity classification.

Error Analysis
We conduct an error analysis to examine the detailed types of errors made by different models. The error statistics are shown in Table 4.
We can observe that: 1) Our TadNER makes fewer errors than the baselines overall. Notably, it significantly reduces false negatives, indicating its ability to recall more correct entities. 2) Both TadNER and FSLS can effectively reduce "Type" errors by incorporating type names. However, though FSLS makes fewer "Type" errors than our TadNER, it produces a much larger number of unrecalled samples, i.e., false negatives. 3) Our TadNER still suffers from inaccurate span prediction, which inspires our future work.

Table 4: Error analysis for different methods under the Few-NERD Intra 5-way 1∼2-shot setting. We select the first 300 episodes for analysis. "False Positive" and "False Negative" denote incorrectly extracted entities and unrecalled entities, respectively. "Span" and "Type" denote errors due to an incorrect span/type.

Model Efficiency
Compared to one-stage approaches, e.g., CONTaiNER, two-stage models require more parameters and longer training and inference times. To take a closer look at the time cost induced by two-stage models, we perform a model efficiency analysis and show the results in Table 5.

Table 5: Model efficiency analysis for different methods under the Few-NERD Intra 5-way 1∼2-shot setting.
From Table 5, it can be seen that two-stage models indeed require longer training and inference times than one-stage models. However, two-stage models often achieve better performance. In particular, our TadNER is the most effective among both one-stage and two-stage models, achieving F1 improvements of 45% and 67% over CONTaiNER and ESD, respectively. It is also the most efficient of the three two-stage models in terms of inference time.

Zero-Shot Performance
Since there is no domain-specific support set under the zero-shot NER setting, it is extremely challenging and rarely explored. Nevertheless, we believe our proposed TadNER obtains a certain zero-shot ability after training in the source domain, for two reasons: 1) the model can extract entity spans at the span detection stage before fine-tuning with support samples; 2) since the feature space learned at the type classification stage is well generalized and type-aware, we can directly adopt the representations of type names as prototypes of novel entity types. To demonstrate the promising performance of our model under zero-shot settings, we select SpanNER (Wang et al., 2021) as a strong baseline, which is a decomposed method and good at solving the zero-shot NER problem. As shown in Table 6, our proposed TadNER performs better than SpanNER (Wang et al., 2021) in every case. The reason may be that the type classification of SpanNER is based on a traditional supervised classification model, which generalizes worse in cross-domain scenarios. Besides, compared with previous metric-based methods (Das et al., 2022; Ma et al., 2022c) for few-shot NER, which heavily rely on support sets and have no zero-shot capability, our method is more inspirational for future zero-shot NER work.

Related Work
Few-Shot NER Few-shot NER methods can be categorized into two types: prompt-based and metric-based. Prompt-based methods focus on leveraging pre-trained language model knowledge for NER through prompt learning (Cui et al., 2021; Ma et al., 2022b; Huang et al., 2022; Lee et al., 2022). They rely on templates, prompts, or good examples to utilize the pre-trained knowledge effectively. Metric-based methods aim to learn a feature space with good generalizability and classify test samples using nearest class prototypes (Snell et al., 2017; Fritzler et al., 2019; Ji et al., 2022; Ma et al., 2022c) or neighbor samples (Yang and Katiyar, 2020; Das et al., 2022).
There are also some efforts to improve few-shot NER by incorporating type name (label) semantics (Hou et al., 2020; Ma et al., 2022a). These methods usually treat labels as class representatives and align tokens with them, but neglect the joint training of entity words and label representations. Hence they can only use either support sets or labels as class references. Instead, our method exploits support samples and type names simultaneously, which helps construct more accurate and stable prototypes in the target domain.
Task Decomposition and Contrastive Learning Recently, decomposed methods have emerged as effective solutions for the NER problem (Shen et al., 2021; Wang et al., 2021; Zhang et al., 2022; Wang et al., 2022; Ma et al., 2022c). These methods can learn entity boundary information well in data-limited scenarios and often obtain better results. However, the prototypical networks widely used in these methods may produce inaccurate and unstable prototypes given limited support samples at the type classification stage. Besides, they may face the problem of over-detected false spans produced at the span detection stage. Our method addresses these two issues via the proposed type-aware contrastive learning and type-aware span filtering strategies.
Our method is also inspired by contrastive learning (Chen et al., 2020; Khosla et al., 2020). Due to its good generalization performance, two recent methods (Das et al., 2022; Huang et al., 2022) borrow this idea for few-shot NER, constructing a contrastive loss between tokens or between the token and the prompt. However, both are end-to-end approaches and thus have the inherent drawback of being unable to learn good entity boundary information. In contrast, our method is a decomposed one, and our contrastive loss is constructed between tokens with additional type name information, which helps find accurate boundaries and learn a type-aware feature space.

Conclusion
In this paper, we propose a novel TadNER framework for few-shot NER, which handles the span detection and type classification sub-tasks at two stages. For type classification, we present a type-aware contrastive learning strategy to learn a type-aware and generalized feature space, enabling the model to construct more accurate and stable prototypes with the help of type names. Based on it, we introduce a type-aware span filtering strategy for removing over-detected false spans produced at the span detection stage. Extensive experiments demonstrate that our method achieves superior performance over previous SOTA methods, especially in challenging scenarios. In the future, we will try to extend TadNER to other NLP tasks.

Limitations
Our proposed TadNER mainly focuses on the type classification stage of few-shot NER and simply adopts token classification for detecting entity spans. There might be better solutions, e.g., using a global boundary matrix. However, due to its high GPU memory requirements, we do not include it in our current framework. This drives us to find a more efficient and powerful span detector for better few-shot NER performance in the future.

Ethics Statement
Our work is entirely at the methodological level, and therefore we do not foresee negative social impacts. In addition, since the performance of the model is not yet at a practical level, it cannot be applied in certain high-risk scenarios (such as the I2B2 dataset used in our paper) yet, leaving room for further improvements in the future.

A Appendix
A.1 Target Domain Inference Algorithm Algorithm 1 describes the process of domain adaptation using the support set in the target domain and prediction on the query set. Lines 1-7 describe the target domain adaptation process for the span detection stage. Lines 8-14 describe the target domain adaptation process for the type classification stage. Lines 15-19 describe the extraction of candidate entity spans in the query set using the fine-tuned span detector. Lines 20-31 describe the candidate entity span filtering and entity type classification using type-aware prototypes.

Figure 1: (a) shows over-detected false spans; (b) shows the spans obtained by adopting our type-aware span filtering strategy; (c) shows inaccurate and unstable prototypes; (d) shows the prototypes obtained by adopting our type-aware contrastive learning strategy.

Figure 2: The overall structure of our proposed TadNER framework. (a) Training in the source domain. (b) Inference on the query set by utilizing the support samples in the target domain. Note that the source and target domains have different entity type sets.

Figure 3: Case study. C1 and C2 are from Few-NERD intra and CoNLL2003 in the Cross datasets, respectively; organization-sportsteam, organization-sportsleague, ORG and LOC are entity types.

Figure 4: F1 scores on Few-NERD Intra and CoNLL 2003 with different variants of type names.

Figure 5: Impacts of prototypes by different methods under the 1-shot Domain Transfer setting. The horizontal and vertical coordinates indicate the n-th sampling and the accuracy of type classification, respectively.
The model is then fine-tuned on a test/target domain dataset D_target with the entity type set T_target = {t_1, t_2, ..., t_n} using a given support set S_target. The entity token set and corresponding label set in S_target are denoted as E_s = {e_1^s, e_2^s, ..., e_M^s} and Y_s = {y_1^s, y_2^s, ..., y_M^s}, where y_i^s ∈ T_target is the label and M is the number of entity tokens. The model is supposed to recognize entities in the query set Q_target of the target domain. Besides, T_source and T_target have no or very little overlap, making few-shot NER very challenging. More specifically, in the n-way k-shot setting, there are n labels in T_target and k examples associated with each label in the support set S_target.

Table 1: F1 scores with standard deviations for Few-NERD. † denotes the results reported by Ma et al. (2022c). * denotes the results reported by our replication using data of the same version. The best results are in bold and the second best ones are underlined.
The train/dev/test sets are divided according to the coarse-grained and fine-grained types, respectively. Besides, following Das et al. (2022), we also conduct Domain Transfer experiments, where data are from different text domains.

Table 2: F1 scores with standard deviations for Domain Transfer. † denotes the results reported by Das et al. (2022). Since no previous two-stage methods have conducted experiments under this setting, we choose the strong DecomposedMetaNER for reproduction experiments, and * denotes the results reported by our replication. The best results are in bold and the second best ones are underlined.

Table 3: Results (F1 scores) for the ablation study under Domain Transfer settings. The best results are in bold.

Table 6: F1 scores under Domain Transfer zero-shot settings.

Table 7:

Episode Evaluation We follow the episode evaluation protocol (Yang and Katiyar, 2020) on the Few-NERD dataset. Each episode consists of a support set and a query set, both given in the n-way k-shot form. In each episode, the model trained in the source domain is tested on the query set by utilizing the support set. To make fair comparisons, we obtain the Micro F1 score with the episode data processed by Ding et al. (2021). We report the mean F1 score with standard deviation using 3 different seeds.

Dataset-level Evaluation Yang and Katiyar (2020) point out that sampling test episodes may not reflect real-world performance due to various data distributions, and they propose to sample support sets and then test the model on the original test set. Each support set consists of k examples corresponding to each label. The final Micro F1 scores and standard deviations are obtained using different sampled support sets. Thus, following Yang and Katiyar (2020) and Das et al. (2022), we also adopt this evaluation schema for the Domain Transfer settings. For fair comparisons, we use the support sets sampled by Das et al. (2022).

A.4 Baselines ProtoBERT (Fritzler et al., 2019) adopts a token-level prototypical network, where the prototype of each class is obtained by averaging token samples of the same label, and the label of each unlabeled token in the query set is determined by its nearest class prototype. NNShot (Yang and Katiyar, 2020) pre-trains BERT with traditional classification methods in the source domain training phase, and decides the class of each unlabeled token by the nearest neighbor at the token level in the target domain inference phase. StructShot (Yang and Katiyar, 2020) is based on NNShot and uses an abstract transition probability for Viterbi decoding during testing.

Table 11: Original labels and their corresponding natural-language-form type names of Few-NERD.

Table 12: Original labels and their corresponding natural-language-form type names of the datasets under Domain Transfer settings.