Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs

Paraphrase generation plays an essential role in natural language processing (NLP) and has many downstream applications. However, training supervised paraphrase models requires many annotated paraphrase pairs, which are usually costly to obtain. On the other hand, the paraphrases generated by existing unsupervised approaches are usually syntactically similar to the source sentences and limited in diversity. In this paper, we demonstrate that it is possible to generate syntactically diverse paraphrases without annotated paraphrase pairs. We propose Syntactically controlled Paraphrase Generator (SynPG), an encoder-decoder based model that learns to disentangle the semantics and the syntax of a sentence from a collection of unannotated texts. The disentanglement enables SynPG to control the syntax of output paraphrases by manipulating the embedding in the syntactic space. Extensive experiments using automatic metrics and human evaluation show that SynPG achieves better syntactic control than unsupervised baselines while keeping the quality of the generated paraphrases competitive. We also demonstrate that the performance of SynPG is competitive with or even better than that of supervised models when the unannotated data is large. Finally, we show that the syntactically controlled paraphrases generated by SynPG can be utilized for data augmentation to improve the robustness of NLP models.


Introduction
Paraphrase generation (McKeown, 1983) is a long-standing task in natural language processing (NLP) and has been greatly improved by recently developed machine learning approaches and large data collections. Paraphrase generation demonstrates the potential of machines in semantic abstraction and sentence reorganization and has already been applied to many NLP downstream applications, such as question answering (Yu et al., 2018), chatbot engines (Yan et al., 2016), and sentence simplification.

Figure 1: Paraphrase generation with syntactic control. Given a source sentence and a target syntactic specification (either a full parse tree or the top levels of a parse tree), the model is expected to generate a paraphrase whose syntax follows the given specification.
In recent years, various approaches have been proposed to train sequence-to-sequence (seq2seq) models on a large number of annotated paraphrase pairs (Prakash et al., 2016; Cao et al., 2017; Egonmwan and Chali, 2019). Some of them control the syntax of output sentences to improve the diversity of paraphrase generation (Iyyer et al., 2018; Goyal and Durrett, 2020; Kumar et al., 2020). However, collecting annotated pairs is expensive and poses challenges for some languages and domains. In contrast, unsupervised approaches build paraphrase models without using parallel corpora (Li et al., 2018; Roy and Grangier, 2019; Zhang et al., 2019). Most of them are based on variational autoencoders (Bowman et al., 2016) or back-translation (Hu et al., 2019). Nevertheless, without controlling syntax, their generated paraphrases are often similar to the source sentences and are not syntactically diverse.
This paper presents a pioneering study on syntactically controlled paraphrase generation based on disentangling semantics and syntax. We aim to disentangle one sentence into two parts: 1) the semantic part and 2) the syntactic part. The semantic aspect focuses on the meaning of the sentence, while the syntactic part represents the grammatical structure. When two sentences are paraphrased, their semantic aspects are supposed to be similar, while their syntactic parts should be different. To generate a syntactically different paraphrase of one sentence, we can keep its semantic part unchanged and modify its syntactic part.
Based on this idea, we propose the Syntactically Controlled Paraphrase Generator (SynPG), a Transformer-based model (Vaswani et al., 2017) that can generate syntactically different paraphrases of a source sentence based on target syntactic parses. SynPG consists of a semantic encoder, a syntactic encoder, and a decoder. The semantic encoder treats the source sentence as an unordered bag of words and learns a contextualized embedding containing only the semantic information. The syntactic encoder embeds the target parse into a contextualized embedding containing only the syntactic information. Then, the decoder combines the two representations and generates a paraphrase. This disentanglement of semantics and syntax enables SynPG to learn the association between words and parses and to be trained by reconstructing the source sentence from its unordered words and its parse. Therefore, we require only unannotated texts, not annotated paraphrase pairs, to train SynPG.
We verify SynPG on four paraphrase datasets: ParaNMT-50M, Quora (Iyer et al., 2017), PAN (Madnani et al., 2012), and MRPC (Dolan et al., 2004). The experimental results reveal that, when provided with the syntactic structures of the target sentences, SynPG generates paraphrases whose syntax is closer to the ground truth than those of the unsupervised baselines. The human evaluation results indicate that SynPG achieves paraphrase quality competitive with other baselines while its generated paraphrases follow the syntactic specifications more accurately. In addition, we show that when the training data is large enough, the performance of SynPG is competitive with or even better than that of supervised approaches. Finally, we demonstrate that the syntactically controlled paraphrases generated by SynPG can be used for data augmentation to defend against syntactic adversarial attacks (Iyyer et al., 2018) and improve the robustness of NLP models. Our code and the pretrained models are available at https://github.com/uclanlp/synpg.

Unsupervised Paraphrase Generation
We aim to train a paraphrase model without using annotated paraphrase pairs. Given a source sentence x = (x_1, x_2, ..., x_n), our goal is to generate a paraphrase sentence y = (y_1, y_2, ..., y_m) that preserves the meaning of x but has a different syntactic structure from x.

Syntactic control. Motivated by previous work (Iyyer et al., 2018; Zhang et al., 2019; Kumar et al., 2020), we allow our model to access an additional syntactic specification as the control signal to guide paraphrase generation. More specifically, in addition to the source sentence x, we give the model a target constituency parse p as another input. Given the input (x, p), the model is expected to generate a paraphrase y that is semantically similar to the source sentence x and syntactically follows the target parse p. In the following discussion, we assume the target parse p is a full constituency parse tree. Later, in Section 2.3, we relax the syntactic guidance to a template, defined as the top two levels of a full parse tree. We expect a successful model to control the syntax of output sentences and generate syntactically different paraphrases based on different target parses, as illustrated in Figure 1.
Similar to previous work (Iyyer et al., 2018; Zhang et al., 2019), we linearize the constituency parse tree into a sequence. For example, the linearized parse of the sentence "He eats apples." is (S(NP(PRP))(VP(VBZ)(NP(NNS)))(.)). Accordingly, a parse tree can be considered a sequence p = (p_1, p_2, ..., p_k), where the tokens in p are non-terminal symbols and parentheses.
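The linearization described above is straightforward to sketch. The snippet below is an illustrative approximation (not the authors' released code): it strips the terminal words from a bracketed CoreNLP-style parse string and removes whitespace, leaving only non-terminals and parentheses.

```python
import re

def linearize(parse_str):
    """Turn a bracketed constituency parse with words, e.g.
    "(S (NP (PRP He)) (VP (VBZ eats) (NP (NNS apples))) (. .))",
    into the word-free linearized form used as the syntactic input."""
    # Drop the terminal word that precedes each closing parenthesis.
    no_words = re.sub(r'\s+[^\s()]+\)', ')', parse_str)
    # Remove the remaining whitespace between constituents.
    return re.sub(r'\s+', '', no_words)
```

Running it on the example sentence yields the linearized parse shown above.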

Proposed Model
Our main idea is to disentangle a sentence into a semantic part and a syntactic part. Once the model learns the disentanglement, it can generate a syntactically different paraphrase of a given sentence by keeping its semantic part unchanged and modifying only the syntactic part. Figure 2 illustrates the proposed paraphrase model, SynPG, a seq2seq model consisting of a semantic encoder, a syntactic encoder, and a decoder. The semantic encoder captures only the semantic information of the source sentence x, while the syntactic encoder extracts only the syntactic information from the target parse p. The decoder then combines the encoded semantic and syntactic information and generates a paraphrase y. We discuss the details of SynPG in the following.

Figure 2: SynPG embeds the source sentence and the target parse into a semantic embedding and a syntactic embedding, respectively. Then, SynPG generates a paraphrase sentence based on the two embeddings.
Semantic encoder. The semantic embedding z_sem is supposed to contain only the semantic information of the source sentence x. To separate the semantic information from the syntactic information, we use a Transformer (Vaswani et al., 2017) without positional encoding as the semantic encoder. We posit that by removing position information from the source sentence x, the semantic embedding z_sem encodes less syntactic information.
We assume that words without ordering capture most of the semantics of a sentence. Admittedly, semantics also depends on word order; for example, exchanging the subject and the object of a sentence changes its meaning. However, a decoder trained on a large corpus also captures selectional preferences (Katz and Fodor, 1963; Wilks, 1975) during generation, which enables it to infer the proper order of words. In addition, we observe that when two sentences are paraphrases, they usually share similar words, especially words related to the semantics. For example, "What is the best way to improve writing skills?" and "How can I improve my writing skills?" are paraphrases, and the shared words (improve, writing, and skills) are strongly related to the semantics. In Section 4, we show that our semantic embedding captures enough semantic information to generate paraphrases.
Syntactic encoder. Since the target parse p contains only syntactic information and no semantics, we use a Transformer with positional encoding as the syntactic encoder.
Decoder. Finally, we design a decoder that takes the semantic embedding z_sem and the syntactic embedding z_syn as input and generates a paraphrase y. In other words, y = (y_1, y_2, ..., y_m) = Dec(z_sem, z_syn).
We choose a Transformer as the decoder to generate y autoregressively. Notice that the semantic embedding z_sem does not encode position information, and the syntactic embedding z_syn does not contain semantics. This forces the decoder to extract the semantics from z_sem and retrieve the syntactic structure from z_syn. The attention weights attending to z_sem and z_syn make the decoder learn the association between the semantics and the syntax, as well as the relation between word order and parse structures. Therefore, SynPG is able to reorganize the source sentence and use the given syntactic structure to rephrase it.

Unsupervised Training
Our design of the disentanglement makes it possible to train SynPG without using annotated pairs. We train SynPG with the objective of reconstructing the source sentences. More specifically, when training on a sentence x, we first separate x into two parts: 1) an unordered word list x̃ and 2) its linearized parse p_x (obtained by a pretrained parser). Then, SynPG is trained to reconstruct x from (x̃, p_x) with the reconstruction loss

L_recon = − Σ_{t=1}^{n} log P(x_t | x_1, ..., x_{t−1}, x̃, p_x).

Notice that if we do not disentangle the semantics and the syntax and directly use a seq2seq model to reconstruct x from (x, p_x), the seq2seq model is likely to learn only to copy x and ignore p_x, since x contains all the information necessary for the reconstruction. Consequently, at inference time, no matter what target parse p is given, the seq2seq model always copies the whole source sentence x as the output (more discussion in Section 4).
In contrast, SynPG learns the disentangled embeddings z_sem and z_syn. This makes SynPG capture the relation between the semantics and the syntax to reconstruct the source sentence x. Therefore, at test time, given the source sentence x and a new target parse p, SynPG can apply the learned relation to rephrase x according to p.
Word dropout. We observe that the ground-truth paraphrase may contain words that do not appear in the source sentence; however, due to the reconstruction training objective, the paraphrases generated by the vanilla SynPG tend to include only words from the source sentence. To encourage SynPG to diversify the word choices in the generated paraphrases, we randomly discard some words from the source sentence during training. More precisely, each word is dropped with some probability in each training iteration. Accordingly, SynPG has to predict the missing words during the reconstruction, which enables it to generate words different from those in the source sentence. More details are discussed in Section 4.5.
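As a rough sketch (the guard against dropping everything is our illustrative choice, not a detail from the released code), word dropout over the unordered input can be implemented as:

```python
import random

def word_dropout(words, p=0.4, rng=None):
    """Drop each input word independently with probability p.
    If everything would be dropped, keep one word so the input
    to the semantic encoder is never empty."""
    rng = rng or random.Random()
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]
```

During training, the dropped-out word list stands in for x̃, forcing the model to predict the missing words.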

Templates and Parse Generator
In the previous discussion, we assume that a full target constituency parse tree is provided as input to SynPG. However, the full parse tree of the target paraphrase sentence is unlikely to be available at inference time. Therefore, following the setting in Iyyer et al. (2018), we consider generating the paraphrase based on a template, defined as the top two levels of the full constituency parse tree. For example, the template of (S(NP(PRP))(VP(VBZ)(NP(NNS)))(.)) is (S(NP)(VP)(.)).
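Extracting such a template from a linearized parse only requires tracking bracket depth. The following sketch is our illustration (not the authors' implementation); it keeps constituents in the top two levels:

```python
def template(linearized, depth=2):
    """Keep only the constituents in the top `depth` levels of a
    linearized parse such as (S(NP(PRP))(VP(VBZ)(NP(NNS)))(.))."""
    out, level, i = [], 0, 0
    while i < len(linearized):
        ch = linearized[i]
        if ch == '(':
            level += 1
            if level <= depth:
                out.append('(')
                i += 1
                # Copy the non-terminal label that follows the parenthesis.
                while i < len(linearized) and linearized[i] not in '()':
                    out.append(linearized[i])
                    i += 1
                continue
        elif ch == ')':
            if level <= depth:
                out.append(')')
            level -= 1
        i += 1
    return ''.join(out)
```

Applied to the example parse above, this yields the template (S(NP)(VP)(.)).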
Motivated by Iyyer et al. (2018), we train a parse generator to generate full parses from templates.
The proposed parse generator has the same architecture as SynPG, but its input and output are different. The parse generator takes two inputs: a tag sequence tag_x and a target template t. The tag sequence tag_x contains all the POS tags of the source sentence x. For example, the tag sequence of the sentence "He eats apples." is "<PRP> <VBZ> <NNS> <.>". As with the source sentence in SynPG, we do not consider the order of the tag sequence during encoding. The expected output of the parse generator is a full parse p̃ whose syntactic structure follows the target template t.
We train the parse generator without any additional annotations as well. Let t_x be the template of p_x (the parse of x); we train the parse generator end-to-end with input (tag_x, t_x) and output p_x.
Generating paraphrases from templates. The parse generator allows us to generate paraphrases by providing target templates instead of target parses. The steps to generate a paraphrase given a source sentence x and a target template t are as follows:
1. Get the tag sequence tag_x of the source sentence x.
2. Use the parse generator to generate a full parse p̃ with input (tag_x, t).
3. Use SynPG to generate a paraphrase y with input (x, p̃).
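The steps above can be sketched as a small pipeline; pos_tagger, parse_generator, and synpg are hypothetical callables standing in for the trained components, not the released API:

```python
def paraphrase_from_template(x, t, pos_tagger, parse_generator, synpg):
    """Three-step template-based generation (a sketch under assumed interfaces).
    x: source sentence, t: target template string."""
    tag_x = pos_tagger(x)                   # 1. POS tag sequence of x
    full_parse = parse_generator(tag_x, t)  # 2. expand template t to a full parse
    return synpg(x, full_parse)             # 3. syntactically controlled paraphrase
```

Any POS tagger, trained parse generator, and trained SynPG with these signatures could be plugged in.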
Post-processing. We notice that certain templates are not suitable for some source sentences, and the resulting paraphrases are therefore nonsensical. We follow Iyyer et al. (2018) and use n-gram overlap and paraphrastic similarity computed by a pretrained model to remove nonsensical paraphrases.
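One of the filtering signals, n-gram overlap with the source, is easy to compute. The snippet below is a simplified illustration; the exact formula and threshold used in the paper's post-processing may differ:

```python
def ngram_overlap(src, gen, n=2):
    """Fraction of the generated sentence's n-grams that also occur in the
    source sentence; very low overlap can flag nonsensical outputs."""
    def grams(sentence):
        toks = sentence.split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    g = grams(gen)
    return len(g & grams(src)) / max(len(g), 1)
```

A generated paraphrase scoring below some threshold on this signal (and on paraphrastic similarity) would be discarded.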

Datasets
For the training data, we consider ParaNMT-50M, a paraphrase dataset containing over 50 million pairs of reference sentences and corresponding paraphrases, together with quality scores. We select about 21 million pairs with higher quality scores as our training examples. Notice that we use only the reference sentences to train SynPG and the unsupervised paraphrase models, since we do not require paraphrase pairs. We sample 6,400 pairs from ParaNMT-50M as the test data. To evaluate the transferability of SynPG, we also consider three other datasets: 1) Quora (Iyer et al., 2017), which contains over 400,000 paraphrase pairs, from which we sample 6,400 pairs; 2) PAN (Madnani et al., 2012), which contains 5,000 paraphrase pairs; and 3) MRPC (Dolan et al., 2004), which contains 2,753 paraphrase pairs.

Evaluation
We use paraphrase pairs to evaluate all the models. For each test paraphrase pair (x_1, x_2), we consider x_1 the source sentence and x_2 the target sentence (ground truth). Let p_2 be the parse of x_2. Given (x_1, p_2), the model is expected to generate a paraphrase y that is similar to the target sentence x_2.
We use BLEU score (Papineni et al., 2002) and human evaluation to measure the similarity between x_2 and y. Moreover, to evaluate how well the generated paraphrase y follows the target parse p_2, we define the template matching accuracy (TMA) as follows. For each ground-truth sentence x_2 and the corresponding generated paraphrase y, we obtain their parses (p_2 and p_y) and templates (t_2 and t_y). The template matching accuracy is then the percentage of pairs whose t_y exactly matches t_2.
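Given templates extracted for the generated and ground-truth sentences (e.g., as in Section 2.3), TMA reduces to an exact-match rate. A minimal sketch:

```python
def template_matching_accuracy(gen_templates, gold_templates):
    """Percentage of pairs whose generated template t_y exactly matches
    the ground-truth template t_2."""
    assert len(gen_templates) == len(gold_templates)
    hits = sum(ty == t2 for ty, t2 in zip(gen_templates, gold_templates))
    return 100.0 * hits / len(gold_templates)
```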

Models for Comparison
We consider the following unsupervised paraphrase models: 1) CopyInput: a naïve baseline that directly copies the source sentence as the output without paraphrasing. 2) BackTrans: back-translation has been proposed to generate paraphrases (Hu et al., 2019). In our experiments, we use the pretrained EN-DE and DE-EN translation models proposed by Ng et al. (2019). Notice that training translation models requires additional translation pairs; therefore, BackTrans needs more resources than our approach, and the translation data may not be available for some low-resource languages. 3) VAE: a vanilla variational autoencoder (Bowman et al., 2016) as a simple baseline. 4) SIVAE: the syntax-infused variational autoencoder (Zhang et al., 2019) utilizes additional syntactic information to improve the quality of sentence generation and paraphrase generation. Unlike SynPG, SIVAE does not disentangle the semantics and the syntax. 5) Seq2seq-Syn: a seq2seq model with the Transformer architecture trained to reconstruct x from (x, p_x) without the disentanglement. We use this model to study the influence of the disentanglement. 6) SynPG: our proposed model, which learns disentangled embeddings.
We also compare SynPG with supervised approaches: 1) Seq2seq-Sup: a seq2seq model with the Transformer architecture trained on all ParaNMT-50M pairs. 2) SCPN: the syntactically controlled paraphrase network (Iyyer et al., 2018), a supervised paraphrase model with syntactic control trained on ParaNMT-50M pairs. We use their pretrained model.

Implementation Details
We use byte pair encoding (Sennrich et al., 2016) for tokenization and the Stanford CoreNLP parser to obtain constituency parses. We set the maximum sentence length to 40 and the maximum linearized parse length to 160 for all the models. For the encoders and the decoder of SynPG, we use the standard Transformer (Vaswani et al., 2017) with default parameters. The word embeddings are initialized with GloVe (Pennington et al., 2014). We use the Adam optimizer with a learning rate of 10^-4 and a weight decay of 10^-5. We set the word dropout probability to 0.4 (more discussion in Section 4.5). The number of training epochs is set to 5.
Seq2seq-Syn and Seq2seq-Sup are trained with similar settings. We reimplement VAE and SIVAE, and all parameters are set to the default values from the original papers.

Syntactic Control
We first discuss whether the syntactic specification enables SynPG to control the output syntax better. Table 1 shows the template matching accuracy and BLEU score for SynPG and the unsupervised baselines. Notice that here we use full parse trees as the syntactic specifications. We discuss the influence of using templates as the syntactic specifications in Section 4.3.
Although we train SynPG only on the reference sentences of ParaNMT-50M, we observe that it also performs well on Quora, PAN, and MRPC. This validates that SynPG indeed learns syntactic rules and can transfer the learned knowledge to other datasets. CopyInput obtains high BLEU scores; however, due to the lack of paraphrasing, it obtains low template matching scores. Compared to the unsupervised baselines, SynPG achieves higher template matching accuracy and higher BLEU scores on all datasets. This verifies that the syntactic specification is indeed helpful for syntactic control.
Next, we compare SynPG with Seq2seq-Syn and SIVAE. All models are given syntactic specifications; however, without the disentanglement, Seq2seq-Syn and SIVAE tend to copy the source sentence as the output and therefore obtain low template matching scores. Table 2 lists some paraphrase examples generated by all models. Again, we observe that without syntactic specifications, the paraphrases generated by the unsupervised baselines are similar to the source sentences. Without the disentanglement, Seq2seq-Syn and SIVAE always copy the source sentences. SynPG is the only model that can generate paraphrases syntactically similar to the ground truths.

Human Evaluation
We perform human evaluation using Amazon Mechanical Turk to evaluate the quality of generated paraphrases. We follow the setting of previous work (Kok and Brockett, 2010;Iyyer et al., 2018;Goyal and Durrett, 2020). For each model, we randomly select 100 pairs of source sentence x and the corresponding generated paraphrase y from ParaNMT-50M test set (after being post-processed as mentioned in Section 2.3) and have three Turkers annotate each pair. The annotations are on a three-point scale: 0 means y is not a paraphrase of x; 1 means x is paraphrased into y but y contains some grammatical errors; 2 means x is paraphrased into y, which is grammatically correct.
The results of the human evaluation are reported in Table 3. If paraphrases rated 1 or 2 are considered meaningful, we notice that SynPG generates meaningful paraphrases at a similar frequency to SIVAE. However, SynPG tends to generate more ungrammatical paraphrases (those rated 1). We believe the reason is that most paraphrases generated by SIVAE are very similar to the source sentences, which are usually grammatically correct. In contrast, SynPG is encouraged to use syntactic structures different from those of the source sentences, which may lead to some grammatical errors. Furthermore, we calculate the hit rate, the percentage of generated paraphrases that are rated 2 and match the target parse at the same time. The hit rate measures how often the generated paraphrases follow the target parses and preserve the semantics (as verified by human evaluation) simultaneously. The results show that SynPG achieves a higher hit rate than the other models.

Table 3: Human evaluation on a three-point scale (0 = not a paraphrase, 1 = ungrammatical paraphrase, 2 = grammatical paraphrase). SynPG performs better on hit rate (defined as the percentage of generated paraphrases rated 2 that match the target parse at the same time) than other unsupervised models.
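The hit rate combines the human rating with the template match; as a sketch:

```python
def hit_rate(ratings, template_matches):
    """Percentage of paraphrases rated 2 (grammatical paraphrase) whose
    template also matches the target parse (a sketch of the metric)."""
    assert len(ratings) == len(template_matches)
    hits = sum(1 for r, m in zip(ratings, template_matches) if r == 2 and m)
    return 100.0 * hits / len(ratings)
```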

Target Parses vs. Target Templates
Next, we discuss the influence of generating paraphrases using templates instead of full parse trees. For each paraphrase pair (x_1, x_2) in the test data, we consider two ways to generate the paraphrase. 1) Generating with the target parse: we use SynPG to generate a paraphrase directly from (x_1, p_2). 2) Generating with the target template: we first use the parse generator to generate a parse p̃ from (tag_1, t_2), where tag_1 is the tag sequence of x_1 and t_2 is the template of p_2; then we use SynPG to generate a paraphrase from (x_1, p̃). We calculate the template matching accuracy to compare these two ways of generating paraphrases, as shown in Table 4. We also report the template matching accuracy of the generated parse p̃.
We find that most generated parses p̃ indeed follow the target templates, which means the parse generator usually produces good parses. Next, we observe that generating paraphrases with target parses usually performs better than with target templates. The results reveal a trade-off: using templates requires less effort at generation time but may weaken syntactic control, whereas using target parses requires providing more detailed specifications but allows the model to control the syntax better.
Another benefit of generating paraphrases with target templates is that we can easily generate many syntactically different paraphrases by feeding the model different templates. Table 5 lists some paraphrases generated by SynPG with different templates. We observe that most generated paraphrases are grammatically correct and have meanings similar to the original sentence.

Table 4: Influence of using templates. Using templates requires less effort at generation time but may weaken syntactic control.

Training SynPG on Larger Dataset
Finally, we demonstrate that the performance of SynPG can be further improved, and even be competitive with supervised models on some datasets, if we consider more training data. The advantage of unsupervised paraphrase models is that they do not require parallel pairs for training; therefore, we can easily boost the performance of SynPG by adding more unannotated text to the training set. We consider SynPG-Large, a SynPG model trained on the reference sentences of ParaNMT-50M as well as the One Billion Word Benchmark (Chelba et al., 2014), a large corpus for training language models. We sample about 24 million sentences from One Billion Word and add them to the training set. In addition, we fine-tune SynPG-Large on only the reference sentences of the test paraphrase pairs, and call the resulting model SynPG-FT.
From Table 6, we observe that enlarging the training set indeed improves the performance. Moreover, with fine-tuning, the performance of SynPG improves substantially and even surpasses that of supervised models on some datasets. The results demonstrate the potential of unsupervised paraphrase generation with syntactic control.

Word Dropout Rate
The word dropout rate plays an important role for SynPG since it controls SynPG's ability to generate new words in paraphrases. We test different word dropout rates and report the BLEU scores and the template matching accuracy in Figure 3. From Figure 3a, we observe that setting the word dropout rate to 0.4 achieves the best BLEU score on most datasets. The only exception is ParaNMT, the dataset used for training. On the other hand, Figure 3b shows that a higher word dropout rate leads to better template matching accuracy. The reason is that a higher word dropout rate gives SynPG more flexibility to generate paraphrases, so the generated paraphrases can match the target syntactic specifications better. However, a higher word dropout rate also reduces SynPG's ability to preserve the meaning of the source sentences. Considering all the factors above, we recommend setting the word dropout rate to 0.4 for SynPG.

Figure 3: Influence of the word dropout rate. Setting the word dropout rate to 0.4 achieves the best BLEU score, while a higher word dropout rate leads to better template matching accuracy.

Improving Robustness of Models
Recently, much work has shown that NLP models can be fooled by different types of adversarial attacks (Alzantot et al., 2018; Ebrahimi et al., 2018; Iyyer et al., 2018; Tan et al., 2020; Jin et al., 2020). These attacks generate adversarial examples by slightly modifying the original sentences without changing their meanings, causing NLP models to change their predictions on those examples, whereas a robust model is expected to output the same labels. Therefore, making NLP models invariant to adversarial examples has become an important task.
Since SynPG is able to generate syntactically different paraphrases, we can improve the robustness of NLP models by data augmentation. Models trained with data augmentation are more robust to syntactic adversarial examples (Iyyer et al., 2018), i.e., adversarial sentences that are paraphrases of the original sentences but differ in syntax.
We conduct experiments on three classification tasks from the GLUE benchmark (Wang et al., 2019): SST-2, MRPC, and RTE. For each training example, we use SynPG to generate four syntactically different paraphrases and add them to the training set. We generate syntactic adversarial examples with SCPN (Iyyer et al., 2018). For each test example, we generate five candidate adversarial examples; if the classifier gives at least one wrong prediction on the candidates, we consider the attack successful. We compare the model without data augmentation (Base) and with data augmentation (SynPG) in Table 7. We observe that with data augmentation, the accuracy before attacking is slightly worse than Base. However, after attacking, the percentage of examples with changed predictions is much lower than for Base, which implies that data augmentation indeed improves the robustness of models.
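The attack-success criterion above can be made precise with a short sketch; classify is a hypothetical classifier, and each test example comes paired with its adversarial candidates:

```python
def attack_success_rate(classify, examples):
    """examples: list of (candidates, gold_label) pairs, where `candidates`
    are adversarial paraphrases of one test sentence. The attack on an
    example succeeds if the classifier mispredicts at least one candidate."""
    successes = sum(
        any(classify(c) != gold for c in candidates)
        for candidates, gold in examples
    )
    return 100.0 * successes / len(examples)
```

A lower success rate for the augmented model than for Base indicates improved robustness.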

Related Work
Paraphrase generation. Traditional approaches usually require hand-crafted rules, such as rule-based methods (McKeown, 1983), thesaurus-based methods (Bolshakov and Gelbukh, 2004; Kauchak and Barzilay, 2006), and lattice matching methods (Barzilay and Lee, 2003). However, the diversity of their generated paraphrases is usually limited.
Recently, neural models have achieved success in paraphrase generation (Prakash et al., 2016; Cao et al., 2017; Egonmwan and Chali, 2019; Gupta et al., 2018). These approaches treat paraphrase generation as a translation task and design seq2seq models trained on a large amount of parallel data. To reduce the effort of collecting parallel data, unsupervised paraphrase generation has attracted attention in recent years. Wieting et al. (2017) use translation models to generate paraphrases via back-translation. Zhang et al. (2019) and Roy and Grangier (2019) generate paraphrases based on variational autoencoders. Reinforcement learning techniques have also been considered for paraphrase generation (Li et al., 2018).

Conclusion
We present the Syntactically Controlled Paraphrase Generator (SynPG), a paraphrase model that can control the syntax of generated paraphrases based on given syntactic specifications. SynPG is designed to disentangle the semantics and the syntax of sentences. The disentanglement enables SynPG to be trained without annotated paraphrase pairs. Extensive experiments show that SynPG achieves better syntactic control than unsupervised baselines, while the quality of the generated paraphrases is competitive with supervised approaches. Finally, we demonstrate that SynPG can improve the robustness of NLP models by generating additional training examples. SynPG is especially helpful for domains where annotated paraphrases are hard to obtain but a large amount of unannotated text is available. One limitation of SynPG is the need to manually provide target syntactic templates at inference time. We leave automatic template generation as future work.