GDA: Generative Data Augmentation Techniques for Relation Extraction Tasks

Relation extraction (RE) models show promising performance in extracting the relation between two entities mentioned in a sentence, given sufficient annotations during training. Such annotations are labor-intensive to obtain in practice. Existing work adopts data augmentation techniques to generate pseudo-annotated sentences beyond the limited annotations. These techniques neither preserve the semantic consistency of the original sentences when rule-based augmentations are adopted, nor preserve the syntactic structure of the sentences when relations are expressed with seq2seq models, resulting in less diverse augmentations. In this work, we propose a dedicated augmentation technique for relational texts, named GDA, which uses two complementary modules to preserve both semantic consistency and syntactic structure. We adopt a generative formulation and design a multi-task solution to achieve synergy between the two. Furthermore, GDA adopts entity hints as prior knowledge for the generative model to augment diverse sentences. Experimental results on three datasets under a low-resource setting show that GDA brings a 2.0% F1 improvement compared with using no augmentation technique. Source code and data are available.


Introduction
Relation Extraction (RE) aims to extract the semantic relation between two entities mentioned in a sentence and to transform massive corpora into triplets of the form (subject, relation, object). Neural relation extraction models show promising performance when high-quality annotated data is available (Zeng et al., 2017; Zhang et al., 2017; Peng et al., 2020). In practice, however, human annotations are labor-intensive and time-consuming to obtain and hard to scale up to a large number of relations (Hu et al., 2020, 2021a,b; Liu et al., 2022b). This motivates us to solicit data augmentation techniques to generate pseudo annotations.
A classical line of work on data augmentation in NLP adopts rule-based techniques, such as synonym replacement (Zhang et al., 2015; Cai et al., 2020), random deletion (Kobayashi, 2018; Wei and Zou, 2019), random swap (Min et al., 2020), and dependency tree morphing (Şahin and Steedman, 2018). However, these methods generate synthetic sentences without considering their semantic consistency with the original sentence, and may twist semantics because they neglect syntactic structure. Other successful attempts at keeping the semantic consistency of sentences are model-based techniques. The popular back translation method (Dong et al., 2017; Yu et al., 2018) generates synthetic parallel sentences by using a translation model to translate monolingual sentences from the target language to the source language. However, it works exclusively on sentence-level tasks like text classification and translation, and is not designed to handle fine-grained semantics in entity-level tasks like relation extraction. Bayer et al. (2022) design a method specific to RE tasks by fine-tuning GPT-2 to generate sentences for specific relation types. However, it is impractical because the model generates less diverse sentences: similar entities and near-identical relational expressions appear under the same relation.
To keep the generated sentences diverse while semantically consistent with the original sentences, we propose a relational text augmentation technique named GDA. As illustrated in Figure 1, we adopt a multi-task learning framework with one shared encoder and two complementary decoders. One decoder predicts the original sentence from a version with restructured word order, which maintains the semantics of the original sentence and ensures the model can generate semantically consistent target sentences. However, restructuring the syntactic

structure of the original sentence inevitably breaks coherence. Therefore, another decoder preserves and approximates the syntax pattern of the original sentence by generating a target sentence with a similar syntactic structure drawn from the existing data. This decoder not only keeps the target sentences coherent but, more importantly, ensures that the model maintains the original syntax pattern when generating pseudo sentences. Different patterns under the same relation can therefore be preserved, instead of the model always predicting the same syntax pattern due to relational inductive biases (Sun et al., 2021), which increases the diversity of the augmented sentences. We further feed an entity from the target sentence as a hint to the input of that decoder, serving as prior knowledge to control the content of the generated sentences. During inference, we generate diverse sentences by taking a variety of entity hints and original sentences with various syntax patterns as input. To summarize, the main contributions of this work are as follows:

• We study the synergy between syntax and semantics preservation during data augmentation and propose a relational text augmentation technique, GDA.

Example Interpolation Techniques MixText (Chen et al., 2020b), Ex2 (Lee et al., 2021), and BackMix (Jin et al., 2022) aim to interpolate the embeddings and labels of two or more sentences. Guo et al. (2020) propose SeqMix for sequence translation tasks in two forms: the hard selection method picks one of the two sequences at each binary mask position, while the soft selection method softly interpolates candidate sequences with a coefficient. The soft selection method also connects to existing techniques such as SwitchOut (Wang et al., 2018) and word dropout (Sennrich et al., 2016).
Model-Based Techniques Model-based techniques include back translation (Sennrich et al., 2016), which can be used to train question answering models (Yu et al., 2018), transfer sequences from a high-resource language to a low-resource language (Xia et al., 2019), or improve dialogue language understanding. Kobayashi (2018) and Gao et al. (2019) propose contextualized word replacement methods to augment sentences. Anaby-Tavor et al. (2020), Li et al. (2022a), and Bayer et al. (2022) adopt language models conditioned on sentence-level tags to modify original sentences, exclusively for classification tasks. Some techniques combine simple data augmentation methods (Ratner et al., 2017; Ren et al., 2021) or add a human in the loop (Kaushik et al., 2019, 2020). Other paraphrasing techniques (Kumar et al., 2020; Huang and Chang, 2021; Gangal et al., 2021) and rationale thinking methods (Hu et al., 2023) also demonstrate the effectiveness of data augmentation.

Characteristics Comparison
We compare GDA with other data augmentation techniques on the characteristics of semantic consistency, coherence, and diversity in Table 1. Note that example interpolation techniques do not generate concrete sentences and only operate at the semantic embedding level; we therefore consider them to maintain semantic consistency only. Compared with other SOTA data augmentation techniques, GDA uses a multi-task learning framework that leverages two complementary seq2seq models to make the augmented sentences semantically consistent, coherent, and diverse simultaneously.

Proposed data augmentation technique
The proposed data augmentation technique GDA consists of two steps: 1) train a seq2seq generator; 2) generate pseudo sentences. As illustrated in Figure 1, the first step adopts T5 (Raffel et al., 2020), consisting of an encoder and a decoder, as the seq2seq generator (θ). The generator learns to convert between two sentences with the same relation label. Specifically, the encoder takes a sentence X = (x_1, x_2, ..., x_{T_x}) as input, where named entities are recognized and marked in advance, and produces the contextualized token embeddings H = (h_1, h_2, ..., h_{T_x}). The decoder takes H as input and generates the target sentence Y = (y_1, y_2, ..., y_{T_y}) word by word by maximizing the conditional probability p(y_i | y_{<i}, H, θ). The second step randomly selects an annotated sentence as input and leverages the trained generator to produce a pseudo sentence with entity markers and the same relation label. We now introduce the details of each step.
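The generator is trained on pairs of sentences that share a relation label. A minimal sketch of how such source-target pairs could be assembled (the sentences, labels, and the build_pairs helper below are illustrative, not taken from the released code):

```python
from collections import defaultdict
from itertools import permutations

def build_pairs(annotated):
    """Group annotated sentences by relation label and emit every
    ordered (source, target) pair within each group."""
    by_relation = defaultdict(list)
    for sentence, relation in annotated:
        by_relation[relation].append(sentence)
    pairs = []
    for relation, sentences in by_relation.items():
        for src, tgt in permutations(sentences, 2):
            pairs.append((src, tgt, relation))
    return pairs

data = [
    ("A [E_sub] surgeon [/E_sub] applies the [E_obj] splints [/E_obj].", "rel-A"),
    ("The [E_sub] winemaker [/E_sub] chose [E_obj] grapes [/E_obj] from lots.", "rel-A"),
    ("The [E_sub] train [/E_sub] has six sets of [E_obj] doors [/E_obj].", "rel-B"),
]
print(len(build_pairs(data)))  # 2: the two ordered pairs under rel-A
```

A relation with a single annotated sentence contributes no pairs, which is one reason the pattern approximation step below relaxes the match to similar patterns rather than exact sentence pairs.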

Train a seq2seq generator
Training the seq2seq generator aims to obtain a generator that can augment annotated sentences into diverse, semantically consistent, and coherent pseudo sentences. In addition, the entities in the augmented pseudo sentences need to be marked for the entity-level relation extraction task. To achieve this, the generator must convert between two sentences with the same relation label and emphasize contextualized relational signals at the entity level during generation. In practice, for each annotated sentence X = (x_1, x_2, ..., x_{T_x}), we adopt the labeling scheme introduced in Soares et al. (2019) and augment X with four reserved tokens [E_sub], [/E_sub], [E_obj], [/E_obj], which mark the start and end positions of the subject and object named entities, respectively. For example: "A [E_sub] surgeon [/E_sub] carefully applies the [E_obj] splints [/E_obj] to the forearm.". We then feed the updated X into the T5 encoder to obtain the contextualized token embeddings H = Encoder(X).
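The marker-injection step can be sketched as follows; the mark_entities helper and the half-open token-span convention are our own illustration of the labeling scheme, not the released implementation:

```python
def mark_entities(tokens, sub_span, obj_span):
    """Insert [E_sub]/[/E_sub] and [E_obj]/[/E_obj] around the subject
    and object token spans (half-open index ranges)."""
    inserts = {
        sub_span[0]: "[E_sub]", sub_span[1]: "[/E_sub]",
        obj_span[0]: "[E_obj]", obj_span[1]: "[/E_obj]",
    }
    out = []
    for i, tok in enumerate(tokens):
        if i in inserts:
            out.append(inserts[i])
        out.append(tok)
    if len(tokens) in inserts:  # close a marker ending at the sentence boundary
        out.append(inserts[len(tokens)])
    return " ".join(out)

tokens = "A surgeon carefully applies the splints to the forearm .".split()
print(mark_entities(tokens, (1, 2), (5, 6)))
# A [E_sub] surgeon [/E_sub] carefully applies the [E_obj] splints [/E_obj] to the forearm .
```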
A natural paradigm for the decoder to generate the target sentence is to select another sentence from the training set that has the same relation as the input sentence. Bayer et al. (2022) fine-tuned GPT-2 to generate sentences for specific relation types. However, training a separate GPT-2 for each relation type incurs too much computational cost, and we observed no promising results. We attribute this to two factors: 1) the diversity of the generated sentences is not emphasized, resulting in sentences with similar patterns; and 2) the entity-level relation extraction task is not considered, resulting in missing entity information.
In this paper, we propose a multi-task learning framework to address these shortcomings, which performs two tasks: original sentence restructuring and original sentence pattern approximation. In practice, our framework consists of two seq2seq models that share the same encoder but employ two separate decoders to complete the two tasks, respectively.
Original sentence restructuring. The original sentence restructuring task aims to improve the model's ability to generate semantically consistent sentences. As illustrated in Figure 1, the target sentence is the restructured original sentence X′ = (x′_1, x′_2, ..., x′_{T_x}), which has the same length and words as the original sentence. We adopt the pre-ordering rules proposed by Wang et al. (2007) for machine translation. These rules modify the syntactic parse tree of the original sentence and permute the words accordingly, so the target sentence changes only the word order, not the semantics, of the original sentence. Furthermore, since the entities are unchanged, it is easy to mark their positions. The decoder network is then employed to predict the restructured original sentence by maximizing p(X′ | H, θ_R):

L_R = Σ_{m=1}^{M} log p(X′_m | H_m, θ_R),

where θ_R denotes the parameters of the decoder for original sentence restructuring and M is the number of training examples.
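Wang et al. (2007) define their pre-ordering rules over parse trees; as a minimal stand-in (not the actual rules), a single hand-written permutation that swaps the direct object and the prepositional phrase already reproduces the restructured sentence shown in Figure 1:

```python
def restructure(tokens, dobj_span, pp_span):
    """Toy pre-ordering rule: move the prepositional phrase before the
    direct object, permuting words without adding or removing any.
    Spans are half-open token index ranges; assumes dobj precedes pp."""
    d0, d1 = dobj_span
    p0, p1 = pp_span
    return tokens[:d0] + tokens[p0:p1] + tokens[d1:p0] + tokens[d0:d1] + tokens[p1:]

tokens = "A surgeon carefully applies the splints to the forearm .".split()
# dobj = "the splints", pp = "to the forearm"
print(" ".join(restructure(tokens, (4, 6), (6, 9))))
# A surgeon carefully applies to the forearm the splints .
```

Because the output is a pure permutation of the input tokens, entity spans can be tracked through the shuffle and re-marked in the target.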
Original sentence pattern approximation. The restructured sentence inevitably breaks the coherence of the original sentence. Therefore, we employ another seq2seq model as an auxiliary to predict unmodified sentences. When a seq2seq model is directly adopted for sentence generation (Bayer et al., 2022), it usually generates stereotyped sentences with the same pattern (Battaglia et al., 2018). For example, for the relation Component-Whole, generative models such as T5 tend to produce the high-frequency pattern "consist of" rather than rare patterns such as "is opened by" and "has sets of", which limits the performance gain from the augmented sentences.
In this paper, we introduce the original sentence pattern approximation task.

This task forces the original and target sentences to have approximate patterns, so that the augmented sentences maintain the pattern of the original sentences and enhance sentence diversity. In practice, we define the pattern as the dependency parsing path between the two entities; we assume this path is sufficient to express the relational signals (Peng et al., 2020). As illustrated in Figure 2, we first parse the original sentence (Chen and Manning, 2014) and obtain the path "NSUBJ-applies-DOBJ" between the two entities "surgeon" and "splints" as the pattern. To force the generative model to learn this pattern, we employ the Levenshtein (Lev) distance (Yujian and Bo, 2007) to find the target pattern that is closest to the original pattern and shares the original sentence's relation; the corresponding sentence is then chosen as the output during training. The Lev distance measures the surface-form similarity between two sequences, defined as the minimum number of operations (insertion, deletion, and replacement) required to convert the original sequence into the target sequence. We give a pseudo-code implementation in Appendix A.
For example, in Figure 2, the Lev distance between the pattern "NSUBJ-applies-DOBJ" and the pattern "NSUBJ-chosen-DOBJ" is 1, so the corresponding sentence is chosen as the target output. In practice, we choose target sentences whose pattern is within Lev distance λ (e.g., λ = 3) of the original sentence's pattern, where λ is a hyperparameter.
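The target-selection step can be sketched at the pattern level, with patterns represented as lists of dependency-path elements (the helper names and the candidate sentences are illustrative):

```python
def lev(a, b):
    """Token-level Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def nearest_targets(src_pattern, candidates, lam=3):
    """Keep candidate sentences whose pattern is within Lev distance lam."""
    return [s for s, p in candidates if lev(src_pattern, p) < lam]

src = ["NSUBJ", "applies", "DOBJ"]
cands = [("The winemaker carefully chose grapes from lots.",
          ["NSUBJ", "chosen", "DOBJ"]),
         ("The kitchen has sets of stove.",
          ["NSUBJ", "has", "POBJ", "PREP"])]
print(lev(src, cands[0][1]))  # 1, matching the example in Figure 2
print(nearest_targets(src, cands))
```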
The decoder network is then employed to predict the pattern approximation target sentence Y = (y_1, y_2, ..., y_{T_y}) by maximizing p(Y | H, θ_P):

L_P = Σ_{n=1}^{N} log p(Y_n | H_n, θ_P),

where θ_P denotes the parameters of the decoder for the original sentence pattern approximation.
N is the number of target sentences whose pattern lies within Lev distance λ of the original pattern.
Entity-level sentence generation. Furthermore, to generate more controllable entity-level sentences and help the generator mark entities in the augmented sentences, we add a subject or object entity E from the target output sentence to the input embedding of the decoder as a hint. For example, in Figure 1, we add winemaker or grapes to the input of the decoder θ_P, which yields entity-oriented controllable sentences and increases the diversity of the generated sentences through different entity hints. We therefore finalize the loss function of Eq. 2 as:

L_P = Σ_{n=1}^{N} log p(Y_n | H_n, E_n, θ_P).

The overall loss function of multi-task learning is the sum of the log-probability objectives of the original sentence restructuring and pattern approximation tasks:

L = L_R + L_P,

where θ = (θ_E, θ_R, θ_P) and θ_E is the parameter of the encoder. In practice, we adopt an iterative strategy to train the two complementary tasks. In each iteration, we first optimize (θ_E, θ_R) on the restructuring task for five epochs. The optimized θ_E is then used as the initial θ_E for the pattern approximation task. Next, we optimize (θ_E, θ_P) for five epochs on the pattern approximation task, and the updated θ_E is used in the next iteration. Finally, θ_E and θ_P are used to generate augmented sentences.

Generate pseudo sentences
After training, we obtain the seq2seq generator T5 (θ_E, θ_P), which focuses on reconstructing diverse, semantically consistent, and coherent relational signals. We leverage this generator to produce entity-oriented pseudo sentences as augmented data. In practice, we randomly select an annotated sentence X and one marked subject or object entity E under the relation label to which X belongs. We then obtain the augmented sentence from (X, E, θ_E, θ_P), where the subject and object entities (one of them being E) are marked during the generation process.
The augmented sentences share the relation labels of the original sentences and gain diversity from the different sentences and entity hints randomly sampled from the annotated data.
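The inference procedure above can be sketched as follows, with a stub standing in for the trained T5 generator (the augment helper, the stub, and the example data are illustrative, not the released code):

```python
import random

def augment(annotated, generate, n_aug=3, seed=0):
    """Inference-time augmentation: sample an annotated sentence and one of
    its marked entities as the hint, then let the trained generator produce
    a pseudo sentence under the same relation label."""
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_aug * len(annotated)):
        sentence, relation, entities = rng.choice(annotated)
        hint = rng.choice(entities)        # subject or object entity hint
        pseudo = generate(sentence, hint)  # stands in for T5 (theta_E, theta_P)
        augmented.append((pseudo, relation))
    return augmented

# stub generator for illustration only
stub = lambda sent, hint: f"<generated sentence about {hint}>"
data = [("The train has six sets of doors.", "Component-Whole", ["train", "doors"])]
print(len(augment(data, stub)))  # 3, i.e. 3x the annotated data
```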

Experiments
We conduct extensive experiments on three public datasets under a low-resource RE setting to show the effectiveness of GDA, and give a detailed analysis of its advantages.

Base Models and Baseline Introduction
We adopt two SOTA base models: (1) SURE (Lu et al., 2022) converts sentences and relations into a summarization format, effectively bridging the gap between the formulations of summarization and RE tasks.
(2) RE-DMP (Tian et al., 2022) leverages syntactic information to improve relation extraction by training a syntax-induced encoder on auto-parsed data through dependency masking.
We adopt three types of baseline models.
(1) Rule-Based Techniques: EDA (Wei and Zou, 2019) adopts synonym replacement, random insertion, random swap, and random deletion to augment the original sentences. Paraphrase Graph (Chen et al., 2020a) constructs a graph from the annotated sentences and creates augmented sentences by inferring labels from the original sentences using a transitivity property.
(2) Example Interpolation Techniques: Inspired by Mixup (Zhang et al., 2018), MixText (Chen et al., 2020b) and Ex2 (Lee et al., 2021) aim to interpolate the embeddings and labels of two or more sentences. BackMix (Jin et al., 2022) proposes a back-translation-based method that softly mixes the multilingual augmented samples.

Datasets and Experimental Settings
Three classical datasets are used to evaluate our technique: SemEval 2010 Task 8 (SemEval) (Hendrickx et al., 2010), the TAC Relation Extraction Dataset (TACRED) (Zhang et al., 2017), and the revisited TAC Relation Extraction Dataset (TACREV) (Alt et al., 2020). We initialize T5-Base (Raffel et al., 2020) with its pre-trained parameters, train it on the annotated data, and use the default T5 tokenizer with a max length of 512 to preprocess the data. We use AdamW with a 5e-5 learning rate to optimize the cross-entropy loss. For a fair comparison, both GDA and all baseline augmentation methods augment the annotated data by 3x. For the low-resource RE setting, we randomly sample 10%, 25%, and 50% of the training data; all augmentation techniques are trained on, and augment, only these sampled data.

Main Results
Table 2 shows the average micro F1 results over three runs on the three RE datasets. All base models gain F1 improvements from the augmented data compared with models trained only on the original data, which demonstrates the effectiveness of data augmentation techniques for the RE task. Across the three RE datasets, Text Gen is considered the previous SOTA data augmentation technique. The proposed GDA technique consistently outperforms all baseline data augmentation techniques in F1 (Student's t-test, p < 0.05). More specifically, compared to the previous SOTA Text Gen, GDA on average achieves 0.5% higher F1 on SemEval, 0.5% higher F1 on TACRED, and 0.4% higher F1 on TACREV across various amounts of annotated data and base models.
In the low-resource relation extraction setting, when annotated data are limited to, e.g., 50% for SemEval, TACRED, and TACREV, GDA achieves an average boost of 0.5% F1 compared to Text Gen. With even less labeled data, 10% for SemEval, TACRED, and TACREV, the average F1 improvement is consistent and surprisingly increases to 0.8%. We attribute the consistent improvement of GDA to the diverse and semantically consistent generated sentences it exploits: we bootstrap the relational signals of the augmented data via multi-task learning, which helps generate entity-oriented sentences for relation extraction.
To demonstrate the impact of different pre-trained language models (PLMs) on the quality of the augmented data, we list the PLMs adopted by GDA and the baseline augmentation techniques, with their parameter counts, in Table 2. Notably, although we use a PLM with fewer parameters than Text Gen (220M vs. 345M), our augmentation still improves F1 by 0.6% over Text Gen and sets a new SOTA for the RE task. Even when we adopt T5-Small (60M) in GDA, which has fewer parameters than BERT-Base and GPT-2 (≈ 110M), the augmented data still brings competitive F1 improvement. More specifically, GDA (T5-Small) achieves F1 improvements of 0.9% and 1.1% on SURE and RE-DMP, respectively, which illustrates the effectiveness of GDA for data augmentation in the RE task.

Ablation Study
We conduct an ablation study to show the effectiveness of the different modules of GDA on the test set. GDA w/o Restructuring is the proposed technique without the decoder θ_R, using only the original sentence pattern approximation task to train T5. GDA w/o Approximation is the proposed technique without the decoder θ_P and the entity hint from the target sentence; θ_R is used for generation during both training and inference. GDA w/o Two Tasks directly fine-tunes T5 on the training data, only requiring the input sentence to share a relation with the target sentence.
A general conclusion from the ablation results in Table 2 is that all modules contribute positively to GDA. More specifically, without the multi-task learning framework, GDA w/o Two Tasks yields 1.3% lower F1 on average over the three datasets. The pattern approximation task brings a larger average F1 improvement than the restructuring task (0.8% vs. 0.6%), which suggests that the pattern approximation task deserves more attention when training T5.

Generative Model Ablation Study
We additionally study the effect of removing the generative model, that is, directly using the restructured original sentences and the pattern approximation target sentences as augmented sentences. From Table 3, we find that doing so results in a 1.3% drop in F1 compared to GDA, which indicates the necessity of training T5-Base to produce the augmented sentences. These two kinds of augmented sentences are effectively rule-based techniques; compared with the other rule-based data augmentation techniques (EDA and Paraphrase Graph), they bring an average F1 improvement of 0.4%, which further illustrates the effectiveness of our modifications of the original sentences for RE tasks.

Performance on Various Augmentation Multiples
We vary the multiple of augmented data from 2x to 10x of the 10% training set to study the influence of data augmentation techniques on the base models under low-resource scenarios. We choose the 10% SemEval and 10% TACREV training sets and the base models SURE and RE-DMP, and present the test-set results in Figure 3. We observe that both base models gain more performance with ever-increasing augmented data, and GDA achieves consistently better F1 than the baseline data augmentation techniques, with a clear margin, under various multiples of augmented data. Especially on 10% TACREV, GDA brings a 3% F1 improvement with only 4x augmented data, which is even 0.2% better than directly using 25% (2.5x) of the training data.

Diversity Evaluation
We measure the diversity of the augmented sentences with automatic and manual metrics. As the automatic metric, we use the Type-Token Ratio (TTR) (Tweedie and Baayen, 1998): for each relation type, the ratio of the number of distinct words to the total number of words on the dependency paths between the two entities. A higher TTR (%) indicates more diverse sentences. Besides that, we ask 5 annotators to score the diversity of 30 generated sentences for each relation type on a scale of 1 to 5. According to the annotation guideline in Appendix C, a higher score indicates that the method generates more diverse and grammatically correct sentences. We present the average scores over all relation types on the three datasets in Table 5. Furthermore, we give a detailed hyperparameter analysis in the Appendix.
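The TTR computation over the dependency-path words of one relation type can be sketched as follows (the example paths are illustrative):

```python
def ttr(paths):
    """Type-Token Ratio for one relation type: distinct words divided by
    total words over all dependency paths, as a percentage."""
    tokens = [w for path in paths for w in path]
    return 100.0 * len(set(tokens)) / len(tokens)

paths = [["NSUBJ", "applies", "DOBJ"],
         ["NSUBJ", "chose", "DOBJ"],
         ["NSUBJ", "has", "POBJ"]]
print(round(ttr(paths), 1))  # 66.7: six distinct tokens out of nine
```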

Case Study
We give two cases in Table 4. GDA adopts the entity hint "program" and the input sentence to generate a controllable target sentence while retaining the original pattern "was opened by" without changing the semantics. GDA w/o Pattern Approximation converts the rare pattern "was opened by" into the high-frequency pattern "consists of" due to the inductive bias, which hurts the diversity of the augmented sentences. GDA w/o Entity Hint generates uncontrollable entities, so the same relation yields the same sentence, which also hurts the diversity of the generated sentences.

Coherence Analysis
Compared to rule-based augmentation techniques, GDA conditionally generates pseudo sentences with entity hints, providing more coherent and reasonable sentences. We analyze the coherence of the augmented sentences through perplexity computed with GPT-2 (Radford et al., 2019). Note that the example interpolation techniques interpolate the embeddings and labels of two or more sentences without generating concrete sentences, so we do not compare against them.
From Table 6, GDA obtains the lowest average perplexity. Although Text Gen is also based on a generative model, its augmented sentences are still not coherent enough because entity-level relational signals (entity hints) are neglected during training. Therefore, Text Gen is less natural when generating augmented sentences with entity annotations.
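The paper scores coherence with GPT-2; the perplexity measure itself reduces to the exponential of the average negative log-likelihood of the tokens, sketched here with placeholder token probabilities rather than real model scores:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood the
    language model assigns to each token of a sentence."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# a sentence whose tokens the LM finds likely is less "surprising"
coherent = [0.9, 0.8, 0.85, 0.9]
awkward  = [0.9, 0.1, 0.85, 0.2]
print(perplexity(coherent) < perplexity(awkward))  # True
```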

Semantic Consistency Analysis
Unlike rule-based data augmentation techniques, which can change the semantics of the original sentence, GDA better exploits relational signals: the target sentence during training comes from the restructured original sentence with the same relation label, so GDA can generate semantically consistent augmented sentences.
To verify this, we first train SURE on the 100% training sets and then apply GDA to the test set to obtain augmented sentences. We feed 100 original sentences and 100 augmented sentences with the same relation labels into the trained SURE and take the output representations from the last dense layer. We apply t-SNE (Van Der Maaten, 2014) to these embeddings and plot the 2D latent space. From Figure 4, we observe that the latent-space representations of the augmented sentences closely surround those of the original sentences with the same relation labels, indicating that GDA augments sentences in a semantically consistent way. Conversely, sentences augmented with the rule-based method EDA appear as outliers, indicating semantic changes.

Conclusions and Future works
In this paper, we propose a relational text augmentation technique, GDA, for RE tasks. Unlike conventional data augmentation techniques, our technique adopts a multi-task learning framework to generate diverse, coherent, and semantically consistent augmented sentences. We further adopt entity hints as prior knowledge for diverse generation. Experiments on three public datasets and low-resource settings show the effectiveness of GDA. For future research, we plan to explore more efficient pre-ordering and parsing methods and to apply our data augmentation method to more NLP applications, such as semantic parsing (Liu et al., 2022a, 2023) and natural language inference (Li et al., 2022b, 2023).

Limitations
We describe our limitations from two perspectives: application-wise and technical-wise.
Application-wise: GDA needs annotations to fine-tune T5, which requires more computing resources and manual labeling cost than the rule-based techniques.
Technical-wise: our original sentence restructuring and original sentence pattern approximation tasks rely on the efficiency and accuracy of the pre-ordering rules (Wang et al., 2007) and the parsing method (Chen and Manning, 2014). Although GDA is effective, more efficient pre-ordering and parsing methods remain to be found.

Figure 1 :
Figure 1: Overview of the proposed relational text augmentation technique with pattern approximation: GDA. We highlight the entities and patterns in the sentences. We define the pattern as the dependency parsing path between two entities.

Figure 2 :
Figure 2: Overview of the original sentence pattern approximation task. We highlight the entities and patterns in the original and target sentences.

Figure 3 :
Figure 3: F1 results of the base models SURE and RE-DMP with various multiples of the augmented data.

Figure 4 :
Figure 4: Latent-space visualization of original and augmented sentences in SemEval (left) and TACRED (right). The same relation labels use the same color.

• We adopt GDA, which leverages a multi-task learning framework to generate semantically consistent, coherent, and diverse augmented sentences for the RE task. Furthermore, entity hints from target sentences guide the generation of diverse sentences.

• We validate the effectiveness of GDA on three public RE datasets and low-resource RE settings against other competitive baselines.
Rule-Based Techniques Rule-based techniques adopt simple transformation methods. Wei and Zou (2019) propose to manipulate some words in the original sentences through random swap, insertion, and deletion. Şahin and Steedman (2018) propose to swap or delete children of the same parent in the dependency tree, which benefits languages with case marking. Chen et al. (2020a) construct a graph from original sentence-pair labels and augment sentences by directly inferring labels with the transitivity property.

Table 2 :
Average micro F1 results over three runs on three RE datasets. † means we reran the open-source code with the given parameters; ‡ means we reproduced the code with the given parameters. We underline the best results among the baseline models. PLMs and Para. denote the pre-trained models used and their parameter counts.

Table 3 :
We adopt SURE as the base model and use 100% of the training data over three datasets. We report F1 results on the test sets. "Restructured" and "Pattern" mean directly using restructured original sentences and pattern approximation target sentences as augmentations.

Table 5. Since the example interpolation techniques do not generate concrete sentences, they are ignored. As a model-based augmentation technique, GDA obtains an 11.4% TTR and a 0.4 diversity score boost on average compared to Text Gen, and can even reach a diversity capability similar to that of the rule-based methods.

Original: The meeting was opened by the welcome speech of the Mayor of Komotini. Relation Label: Component-Whole. Entity Hint: program. GDA: The program was opened by the host, who was a former member of the Congressional Black Caucus. GDA w/o Pattern Approximation: The program consists of a seminar and a workshop. GDA w/o Entity Hint: The ricotta mixture was the best part of this dish.

Original: This train has as many as six sets of doors. GDA: The kitchen has sets of stove, and a microwave. GDA w/o Pattern Approximation: The kitchen contents include a refrigerator. GDA w/o Entity Hint: The ricotta mixture was the best part of this dish.

Table 4 :
Case study. We highlight the entities and patterns in the original and generated sentences.

Table 5 :
Diversity Evaluation on three datasets.

Table 6 :
Perplexity of the augmented sentences in three datasets. Original means the original sentences. Lower perplexity is better.
A runnable version of the Levenshtein distance pseudo code, rewritten here in Python for concision:

def levenshtein_distance(s, t):
    # d[i][j] holds the Levenshtein distance between the first i
    # characters of s and the first j characters of t
    m, n = len(s), len(t)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]