Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations

Syntactically controlled paraphrase generation has become an emerging research direction in recent years. Most existing approaches require annotated paraphrase pairs for training and are thus costly to extend to new domains. Unsupervised approaches, on the other hand, do not need paraphrase pairs but suffer from relatively poor performance in terms of syntactic control and quality of generated paraphrases. In this paper, we demonstrate that leveraging Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation. Our proposed model, AMR-enhanced Paraphrase Generator (AMRPG), separately encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings. A decoder is then learned to reconstruct the input sentence from the semantic and syntactic embeddings. Our experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, compared to existing unsupervised approaches. We also demonstrate that the paraphrases generated by AMRPG can be used for data augmentation to improve the robustness of NLP models.


Introduction
Syntactically controlled paraphrase generation approaches aim to control the form of generated paraphrases by taking additional parse specifications as input, as illustrated by Figure 1. The task has attracted increasing attention in recent years since it can diversify the generated paraphrases and benefit a wide range of NLP applications (Iyyer et al., 2018; Huang and Chang, 2021; Sun et al., 2021), including task-oriented dialog generation (Gao et al., 2020), creative generation (Tian et al., 2021), and model robustness (Huang and Chang, 2021). Recent works have shown success in training syntactically controlled paraphrase generators (Iyyer et al., 2018; Chen et al., 2019; Kumar et al., 2020; Sun et al., 2021). Although these models can generate high-quality paraphrases and achieve good syntactic control, the training process requires a large amount of supervised data, e.g., parallel paraphrase pairs. Annotating paraphrase pairs is usually expensive because it requires intensive domain knowledge and high-level semantic understanding. Due to the difficulty of collecting parallel data, the applicability of supervised approaches is limited, especially when adapting to new domains.
To reduce the annotation demand, unsupervised approaches train syntactically controlled paraphrase generators without the need for parallel pairs (Zhang et al., 2019; Bao et al., 2019; Huang and Chang, 2021). Most of them achieve syntactic control by learning disentangled embeddings for semantics and syntax separately (Bao et al., 2019; Huang and Chang, 2021). However, without parallel data, it is challenging to learn a good disentanglement and to capture semantics well. As we will show later (Section 4.1), unsupervised approaches can generate bad paraphrases, for example by mistakenly swapping the object and subject of a sentence.
In this work, we propose to use Abstract Meaning Representations (AMR) (Banarescu et al., 2013) to learn better disentangled semantic embeddings for unsupervised syntactically controlled paraphrase generation. AMR is a semantic graph structure that covers the abstract meaning of a sentence. As shown in Figure 2, two sentences have the same (or similar) AMR graph as long as they carry the same abstract meaning, even if they are expressed with different syntactic structures. This property makes AMRs a good resource for capturing sentence semantics.
Based on this, we design an AMR-enhanced Paraphrase Generator (AMRPG), which separately learns (1) semantic embeddings from the AMR graph extracted from the input sentence and (2) syntactic embeddings from the constituency parse of the input sentence. AMRPG then trains a decoder to reconstruct the input sentence from the semantic and syntactic embeddings. The reconstruction objective, combined with the disentangled design of semantics and syntax, enables AMRPG to generate syntactically controlled paraphrases without using parallel pairs. Our experiments show that AMRPG achieves better syntactic control than existing unsupervised approaches. Additionally, we demonstrate that the paraphrases generated by AMRPG can be used for data augmentation to improve the robustness of NLP models.

Unsupervised Syntactically Controlled Paraphrase Generation

Problem Formulation
We follow previous works (Iyyer et al., 2018;Huang and Chang, 2021) and consider constituency parses (without terminals) as the control signals.
Given a source sentence s and a target parse p, the goal of the syntactically controlled paraphrase generator is to generate a target sentence t that has similar semantics to the source sentence s and follows the syntax of the parse p. In the unsupervised setting, the paraphrase generator cannot access any target sentences or target parses; it sees only the source sentences and source parses during training.

Proposed Method: AMRPG
Motivated by previous approaches (Bao et al., 2019; Huang and Chang, 2021), we design AMRPG to learn separate embeddings for semantics and syntax, as illustrated by Figure 3. Then, AMRPG learns a decoder with the objective of reconstructing the source sentence. The challenge here is how to learn embeddings such that the semantic embedding contains only semantic information while the syntactic embedding contains only syntactic information. We introduce the details as follows.
Semantic embedding. Given a source sentence, we first use a pre-trained AMR parser to get its AMR graph. Next, we use a semantic encoder to encode the AMR graph into the semantic embedding e_sem. Specifically, the semantic encoder consists of two parts: a fixed pre-trained AMR encoder (Ribeiro et al., 2021) followed by a learnable Transformer encoder. We additionally perform node masking when training the semantic encoder: every node in the AMR graph has a probability of being masked out during training. This improves the robustness of AMRPG.
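As a concrete sketch, node masking over a linearized AMR graph can be implemented as follows. The mask token and masking rate here are illustrative assumptions, not the paper's exact values:

```python
import random

MASK = "<mask>"  # hypothetical mask token; the actual token is model-specific


def mask_amr_nodes(nodes, p=0.15, rng=None):
    """Replace each AMR node label with a mask token with probability p.

    nodes: node labels from a linearized AMR graph,
           e.g. ["describe-01", "he", "she", "genius"].
    """
    rng = rng or random.Random(0)
    return [MASK if rng.random() < p else node for node in nodes]


# With p=1.0 every node is masked; with p=0.0 the graph is unchanged.
print(mask_amr_nodes(["describe-01", "he", "genius"], p=1.0))
```

In practice the masking rate is a tunable hyperparameter; the sketch only illustrates the per-node Bernoulli decision described above.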
As mentioned above, two semantically similar sentences would have similar AMR graphs regardless of their syntax. This property encourages AMRPG to capture only semantic information in the semantic embeddings. Compared with previous work (Huang and Chang, 2021), which uses bag-of-words to learn the semantic embeddings, using AMR can capture semantics better and lead to better performance, as shown in Section 4.
Syntactic embedding. Given a source sentence, we use the Stanford CoreNLP toolkit (Manning et al., 2014) to get its constituency parse. Then, we remove all the terminals in the parse and learn a Transformer encoder to encode the parse into the syntactic embedding e_syn. Since we remove the terminals, the syntactic embedding contains only the syntactic information of the source sentence.
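Removing terminals from a bracketed constituency parse can be sketched in a few lines of plain Python. This is a minimal illustration on parser output strings; actual pipelines typically operate on parser-specific tree objects:

```python
def _parse(toks, i):
    """Parse one '(LABEL ...)' subtree from a token list, skipping terminal words."""
    label = toks[i + 1]          # the token right after '(' is the nonterminal label
    i += 2
    children = []
    while toks[i] != ")":
        if toks[i] == "(":
            child, i = _parse(toks, i)
            children.append(child)
        else:
            i += 1               # terminal word: drop it
    return (label, children), i + 1


def strip_terminals(parse):
    """Keep only the nonterminal template of a bracketed parse."""
    toks = parse.replace("(", " ( ").replace(")", " ) ").split()
    tree, _ = _parse(toks, 0)

    def render(node):
        label, children = node
        if not children:
            return "(" + label + ")"
        return "(" + label + " " + " ".join(render(c) for c in children) + ")"

    return render(tree)


print(strip_terminals("(S (NP (DT the) (NN dog)) (VP (VBD ran)))"))
# (S (NP (DT) (NN)) (VP (VBD)))
```

The resulting terminal-free template is what the syntactic encoder consumes, so no lexical content leaks into the syntactic embedding.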
Decoder. We train a Transformer decoder that takes the semantic embedding e_sem and the syntactic embedding e_syn as input and reconstructs the source sentence with a cross-entropy loss. With this reconstruction objective, AMRPG does not require parallel paraphrase pairs for training.
Inference. Given a source sentence s and a target parse p, we use the semantic encoder to encode the AMR graph of s into the semantic embedding, use the syntactic encoder to encode p into the syntactic embedding, and use the decoder to generate the target sentence t.

Syntactically Controlled Paraphrase Generation
Datasets. We consider ParaNMT (Wieting and Gimpel, 2018) for training and testing. We use only the source sentences in ParaNMT to train AMRPG and the other unsupervised baselines, and use both the source sentences and target sentences to train supervised baselines. To further test the models' ability to generalize to new domains, we directly use the models trained on ParaNMT to test on Quora (Iyer et al., 2017), MRPC (Dolan et al., 2004), and PAN (Madnani et al., 2012).
Evaluation metrics. Following previous work (Huang and Chang, 2021), we consider the BLEU score to measure the similarity between the gold target sentences and the predicted target sentences, and the template matching accuracy (TMA) to evaluate the quality of syntactic control. More details about the evaluation can be found in Appendix B.2.
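For reference, the clipped n-gram precision at the core of BLEU can be computed as below. This is a simplified sketch of the counting step only; the reported scores use a standard BLEU implementation with brevity penalty and geometric averaging over n-gram orders:

```python
from collections import Counter


def clipped_ngram_precision(reference, hypothesis, n):
    """Modified n-gram precision as used in BLEU: each hypothesis n-gram's
    count is clipped by its count in the reference."""
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    hyp = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    if not hyp:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return clipped / sum(hyp.values())


ref = "when christmas comes john will send a gift to tom".split()
hyp = "when christmas comes john sends a gift to tom".split()
print(clipped_ngram_precision(ref, hyp, 1))  # 8 of 9 unigrams match
```

Clipping prevents a hypothesis from inflating its score by repeating a reference word more often than it appears in the reference.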

Input: The dog chased the cat on the street.
SynPG: The dog was chased by the cat on the street.
AMRPG: The cat was chased by a dog in the street.

Input: John will send a gift to Tom when Christmas comes.
SynPG: When Tom comes, John will send a gift to Christmas.
AMRPG: When Christmas comes, John will send a gift to Tom.

Table 2: Paraphrase examples generated by SynPG and AMRPG. AMRPG captures semantics better and generates higher-quality paraphrases than SynPG.
Table 1 shows the results of syntactically controlled paraphrase generation. AMRPG performs the best among the unsupervised approaches. In particular, AMRPG outperforms SynPG, the state-of-the-art unsupervised model, by a large margin in terms of BLEU score. This justifies that using AMR leads to better disentangled embeddings and better capture of semantics.
We observe that there is indeed a performance gap between AMRPG and SCPN (supervised baseline). However, since AMRPG is an unsupervised model, it is possible to use the source sentences from the target domains to further fine-tune AMRPG without additional annotation cost. As shown in the table, AMRPG with further fine-tuning can achieve even better performance than SCPN when considering domain adaptation (Quora, MRPC, and PAN). This demonstrates the flexibility and the potential of unsupervised paraphrase models.
Qualitative examples. Table 2 lists some paraphrases generated by SynPG and AMRPG. As mentioned in Section 3, SynPG uses bag-of-words to learn semantic embeddings and therefore easily confuses the relations between entities or mistakes the subject for the object. In contrast, AMRPG preserves more of the semantics.

Improving Robustness of NLP Models
We demonstrate that the paraphrases generated by AMRPG can improve the robustness of NLP models through data augmentation. Following the setting of previous work (Huang and Chang, 2021), we consider three classification tasks in GLUE (Wang et al., 2019): MRPC, RTE, and SST-2. We compare three baselines: (1) the classifier trained with the original training data, (2) the classifier trained with the original training data and augmented data generated by SynPG, and (3) the classifier trained with the original training data and augmented data generated by AMRPG. For every instance in the original training data, we generate four paraphrases as augmented examples by considering four common syntactic templates. More details can be found in Appendix C.1. Table 3 shows the clean accuracy and the broken rate (the percentage of examples being successfully attacked) under the syntactically adversarial examples generated with SCPN (Iyyer et al., 2018). Although the classifiers trained with data augmentation have slightly worse clean accuracy, they have significantly lower broken rates, which implies that data augmentation improves model robustness. Also, data augmentation with AMRPG yields a lower broken rate than data augmentation with SynPG. We attribute this to the higher quality of the paraphrases generated by AMRPG.

Conclusion
We propose AMRPG, which utilizes AMR to learn a better disentanglement of semantics and syntax without using any parallel data. This enables AMRPG to capture semantics better and generate more accurate syntactically controlled paraphrases than existing unsupervised approaches. We also demonstrate how to apply AMRPG to improve the robustness of NLP models.

Limitations
Our goal is to demonstrate the potential of AMR for syntactically controlled paraphrase generation. The current experimental setting follows previous works (Iyyer et al., 2018; Huang and Chang, 2021), which consider full constituency parses as the control signals. In real applications, obtaining full constituency parses before paraphrase generation may require additional effort. One potential solution is to consider relatively noisy or simplified parse specifications (Sun et al., 2021).
In addition, some parse specifications can be inappropriate for certain source sentences (e.g., the source sentence is long but the target parse is short).
How to score and reject some of the given parse specifications is still an open research question. Finally, although training AMRPG does not require any parallel paraphrase pairs, it does require a pre-trained AMR parser, which can be a potential cost for training AMRPG.

Broader Impacts
Our proposed method focuses on improving syntactically controlled paraphrase generation. It is intended to be used to improve the robustness of models and to facilitate language generation for applications with positive social impact. All the experiments are conducted on open benchmark datasets. However, it is known that models trained on a large text corpus can capture biases reflected in the training data, so our model could potentially generate offensive or biased content learned from the data. We suggest carefully examining the potential bias before deploying models in any real-world applications.

B.2 Evaluation
Following previous work (Huang and Chang, 2021), we consider paraphrase pairs to evaluate the performance. Given a paraphrase pair (s_1, s_2), we use the Stanford CoreNLP constituency parser (Manning et al., 2014) to get their parses (p_1, p_2).
The input of all baselines would be (s_1, p_2) and the ground truth would be s_2.
Assuming the generated paraphrase is g, we use the BLEU score to measure the similarity between the generated paraphrase g and the ground truth s_2. We also calculate the template matching accuracy (TMA) by computing the exact matching accuracy of the top-2 levels of p_g and p_2, where p_g is the constituency parse of g.
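Truncating a parse to its top levels and checking for exact match can be sketched as follows, assuming terminal-stripped bracketed parse strings; the helper names are illustrative:

```python
def truncate_parse(parse, depth=2):
    """Keep only the top `depth` levels of a bracketed parse without terminals,
    e.g. depth=2: '(S (NP (DT) (NN)) (VP (VBD)))' -> '(S (NP) (VP))'."""
    toks = parse.replace("(", " ( ").replace(")", " ) ").split()
    out, level = [], 0
    for tok in toks:
        if tok == "(":
            level += 1
            if level <= depth:
                out.append(tok)
        elif tok == ")":
            if level <= depth:
                out.append(tok)
            level -= 1
        elif level <= depth:      # nonterminal label within the kept levels
            out.append(tok)
    # re-join with conventional bracketing (no space after '(' or before ')')
    return " ".join(out).replace("( ", "(").replace(" )", ")")


def template_match(parse_g, parse_2, depth=2):
    """TMA-style check: do the top `depth` levels match exactly?"""
    return truncate_parse(parse_g, depth) == truncate_parse(parse_2, depth)


print(truncate_parse("(S (NP (DT) (NN)) (VP (VBD)))"))  # (S (NP) (VP))
```

TMA is then the fraction of test pairs for which `template_match(p_g, p_2)` holds.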
C.1 Data Augmentation
We use the generated full parses as the parse specifications to generate paraphrases for data augmentation. When training classifiers with data augmentation, the original instances are given four times the weight of the augmented instances when computing the loss. We use the scripts from Huggingface with default values to train the classifiers.
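The 4:1 weighting can be realized as a weighted average of per-example losses, as in the minimal sketch below; in practice this is applied inside the training loop of the framework used:

```python
def weighted_mean_loss(losses, is_original, orig_weight=4.0, aug_weight=1.0):
    """Weighted average of per-example losses: original training instances
    count orig_weight times as much as augmented ones."""
    weights = [orig_weight if orig else aug_weight for orig in is_original]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# One original example (loss 1.0) and one augmented example (loss 0.0):
print(weighted_mean_loss([1.0, 0.0], [True, False]))  # 4.0 / 5.0 = 0.8
```

Down-weighting augmented examples keeps the classifier anchored to the original label distribution while still benefiting from the paraphrased inputs.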

C.2 Generating Adversarial Examples
We use the official script of SCPN (Iyyer et al., 2018) to generate syntactically adversarial examples. Specifically, we consider the first five parse templates for RTE and SST-2 and the first three parse templates for MRPC to generate the adversarial examples. As long as one of the adversarial examples makes the classifier change its prediction, we count it as a successful attack on that instance.
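The broken rate under this any-flip criterion can be computed as below. This is an illustrative helper, assuming lists of clean predictions and per-instance adversarial predictions:

```python
def broken_rate(clean_preds, adv_preds_per_instance):
    """Fraction of instances where at least one adversarial paraphrase
    changes the classifier's prediction (a successful attack)."""
    broken = sum(
        any(adv != clean for adv in advs)
        for clean, advs in zip(clean_preds, adv_preds_per_instance)
    )
    return broken / len(clean_preds)


# Instance 1 is flipped by its second paraphrase; instance 2 never flips.
print(broken_rate([1, 0], [[1, 0, 1], [0, 0]]))  # 0.5
```

A lower broken rate therefore directly reflects higher robustness to the syntactic attack.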

Figure 1: An illustration of syntactically controlled paraphrase generation. Given a source sentence and different parse specifications, the model generates different paraphrases following the parse specifications.

Figure 2: The same AMR graph for a pair of paraphrased sentences, "He described her as a genius." and "She was a genius, according to his description."

Figure 3: AMRPG's framework. It separately encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings. A decoder is then learned to reconstruct the input sentence from the semantic and syntactic embeddings.

Table 1: Results of syntactically controlled paraphrase generation. AMRPG performs the best among all unsupervised approaches and can outperform supervised approaches when considering the target-domain source sentences.

Table 3: Augmenting paraphrases generated by AMRPG improves the robustness of NLP models. Acc denotes the clean accuracy (higher is better). Brok denotes the percentage of examples being successfully attacked (lower is better).