AMR-TST: Abstract Meaning Representation-based Text Style Transfer

Abstract Meaning Representation (AMR) is a semantic representation that can enhance natural language generation (NLG) by providing a logical semantic input. In this paper, we propose AMR-TST, an AMR-based text style transfer (TST) technique. AMR-TST converts the source text to an AMR graph and generates the transferred text based on the AMR graph modified by a TST policy named style rewriting. Our method combines the explainability of explicit TST methods with the diversity of implicit ones. The experiments show that the proposed method achieves state-of-the-art results compared with other baseline models in automatic and human evaluations. The transferred texts examined in the qualitative evaluation show that AMR-TST has significant advantages in preserving semantic features and reducing hallucinations. To the best of our knowledge, this work is the first to apply an AMR method focusing on node-level features to the TST task.


Introduction
Text style transfer (TST) is an attractive task in natural language processing that aims to change the specific style of a text through editing while preserving the core content of the source. TST has been widely applied in tasks such as sentiment transfer, formality transfer, and political transfer (Jin et al., 2022; Shi et al., 2021). The lack of parallel corpora is the main challenge of current TST tasks, which has made methods based on unsupervised generative structures that distinguish content and style features the dominant technology. However, the entanglement of content and style features makes it difficult for these methods to balance the diversity and semantic reliability of the transferred text (Ramesh Kashyap et al., 2022).
Abstract Meaning Representation (AMR, Banarescu et al. 2013) is a semantic representation language that abstracts a whole sentence into a rooted, labelled, directed, acyclic graph. AMR graphs can be written in PENMAN (Goodman, 2020) notation, and texts with the same semantic meaning can be abstracted into the same AMR graph; an example is shown in Figure 1. This characteristic allows a model to generate various texts that maintain the same semantic logic from a constant AMR graph. Compared with other meaning representation methods, AMR better maintains sentence backbones to describe phenomena such as parameter sharing, and allows adding implicit or omitted constituents to recover full sentence semantics (Socher et al., 2013). More importantly, recent research has demonstrated that robust and diverse text generation can be achieved by modifying the nodes of an AMR graph without a complex decoder retraining process (Shou et al., 2022).

Figure 1: Sentences with the same semantics but different surface syntax can be parsed into the same AMR graph.
This paper proposes AMR-TST, a novel AMR-based generative text style transfer method. AMR-TST takes AMR graphs as intermediate representations and generates the transferred text via a style rewriting algorithm that modifies the detected stylistic nodes of the AMR graph from the source to the target style. This design overcomes the difficulty previous TST methods face in generating diverse transferred texts that incorporate target words while maintaining factual consistency of the non-stylistic content. By jointly considering sentence-level features representing the semantic logic and node-level features representing the stylistic entities, AMR-TST can adaptively embed target style words into the semantic structure of the source text to generate reasonable and readable transferred text. Meanwhile, the AMR parsing process screens the core content entities carrying the semantic features of the source text, avoiding hallucinations caused by semantically irrelevant content during transferred text generation.
The structure of AMR-TST is shown in Figure 2, which consists of three components: (1) Text to AMR, (2) AMR Style Transfer, and (3) Transferred AMR to Text. The source text is first transduced to an AMR graph by the AMR parser; the AMR style transfer component modifies the graph by rewriting the nodes consisting of style words detected by the style detector; the diverse transferred texts with the target style are then generated by the AMR decoder from the modified graph. To demonstrate the effectiveness of the proposed AMR-TST, we evaluate it on two public datasets, Yelp and Amazon, which are commonly employed for sentiment transfer, one of the typical application scenarios of TST. All the evaluation results demonstrate that AMR-TST achieves state-of-the-art results compared with other baseline models. To the best of our knowledge, AMR-TST is the first work to apply AMR to the TST task by rewriting node-level stylistic features.

Related Work
Text style transfer aims to revise specific styles or attributes of source texts while preserving the non-stylistic content (Hu et al., 2022). Due to the lack of parallel corpora, implicit and explicit unsupervised methods are the mainstream techniques for this task (Jin et al., 2022). Implicit methods let the model map the text to a latent space through an encoder to obtain a disentangled representation, separating content and attributes before performing attribute transfer. Hu et al. (2017) combined a VAE with an attribute discriminator to control the attributes of the target sentence through structured encoding, and used the discriminator's feedback to optimize the generated sentence. Luo et al. (2019) regarded the mapping between source and target text as a dual learning task and achieved style transfer by setting reward mechanisms for style accuracy and content retention in reinforcement learning.
Considering that the style features of a sentence are usually reflected in unique phrases, explicit methods achieve explainable text style transfer by changing only the stylistic words or phrases while retaining the style-independent parts. Li et al. (2018) first proposed the DRG framework, which achieves style transfer by deleting style words from texts, retrieving target texts similar to the source content, and generating target texts by combining target style features. Since sentiment words receive higher attention weights in sentiment classification, Xu et al. (2018) used an attention-based classifier to separate content and sentiment words for text style transfer.
Most research on AMR focuses on AMR parsing and generation, such as using graph neural networks (Bai et al., 2022) and pre-trained language models (Xu et al., 2021a) to improve performance. It is gratifying to note that more recent research integrates AMR with downstream NLG tasks. T-STAR (Jangra et al., 2022) is contemporaneous work to ours, which transfers a text's style by training style-specific AMR encoders and decoders. In comparison, AMR-TST achieves text style transfer with a simple and reliable style rewriting algorithm, avoiding potential semantic bias during a complex retraining process. Kapanipathi et al. (2021) introduced AMR into knowledge base question answering (KBQA), delegating the complexity of understanding natural language questions to AMR parsers and relieving the pressure of labelling large amounts of data in KBQA. All this research demonstrates the potential of AMR to power various NLP tasks.

Text to AMR
Text to AMR is the first component of AMR-TST. Let $x^{s_{src}} = \{x_1, ..., x_n\}$ be the source text with style $s_{src}$; this component aims to parse $x^{s_{src}}$ into the corresponding AMR graph $G^{s_{src}}$. Previous text-to-AMR semantic parsing methods are fine-grained, content-specific heuristics that require complex pre- and post-processing, making them difficult to apply directly to cross-domain and genre-specific tasks. Pre-trained Transformer-based sequence-to-sequence (seq2seq) models have powered AMR parsing tasks through their robust performance in transfer learning (Xu et al., 2020). In this paper, we apply SPRING (Bevilacqua et al., 2021) as the AMR parser, a BART-based (Lewis et al., 2020) model that achieves competitive performance in AMR semantic parsing.
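As a concrete illustration, text-to-AMR parsing with a SPRING-style BART parser can be run through the amrlib library; the following is a minimal sketch, assuming a parse model has already been downloaded into amrlib's data directory (the loader and output format follow amrlib, not SPRING's own release):

```python
import amrlib

# Load a sentence-to-graph parse model; amrlib expects a pretrained
# model to be installed in its data directory beforehand.
stog = amrlib.load_stog_model()

# Parse the source text x^{s_src} into a PENMAN-notation AMR graph G^{s_src}.
graphs = stog.parse_sents(["I love this place and the service is always great."])
print(graphs[0])
```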
Specifically, SPRING first applies a graph-isomorphic linearization technique to encode an AMR graph as a sequence of symbols via a DFS-based PENMAN annotation without losing adjacency information. A lack of clear distinction between constants and variables may confuse seq2seq models. Since variable names carry no semantics, SPRING introduces a series of special tokens <R0>, <R1>, ..., <Rn> to represent the variables in the linearized graph and to handle co-referring nodes. This representation also disposes of the redundant slash token "/". Through this setting, the AMR graph in Figure 1 can be represented as (<R0> and :op1 (<R1> love-01 :ARG0 (<R2> i) :ARG1 (<R3> place :mod (<R4> this))) :op2 (<R5> great :domain (<R6> serve-01) :time (<R7> always))).

SPRING extends its tokenization vocabulary by adding the frequently occurring relations, frames, and constituents of AMR tokens to make BART applicable to processing AMR graphs. The embedding of each new symbol is initialized with the average of its sub-word embeddings. The produced sequence can then be converted back to PENMAN notation after restoring parenthesis parity and removing discontinuity tokens. The SPRING used here is pre-trained on AMR 3.0 (LDC2020T02).
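The variable-replacement step can be illustrated with a short sketch. The following function is not SPRING's actual preprocessing code but a simplified reconstruction of the described linearization: it replaces each PENMAN variable with an <Rn> token, keeps concepts and co-referring mentions, and drops the slash markers:

```python
def linearize(penman_str):
    """Simplified SPRING-style linearization of a single-line PENMAN
    string: variables become <Rn> tokens, '/' markers are dropped, and
    re-entrant variable mentions reuse the same <Rn> token."""
    tokens = penman_str.replace("(", " ( ").replace(")", " ) ").split()
    var_map, out, i = {}, [], 0
    while i < len(tokens):
        if tokens[i] == "(" and i + 1 < len(tokens):
            var = tokens[i + 1]                  # token after '(' is a variable
            var_map.setdefault(var, f"<R{len(var_map)}>")
            out += ["(", var_map[var]]
            i += 2
            if i + 1 < len(tokens) and tokens[i] == "/":
                out.append(tokens[i + 1])        # keep the concept verbatim
                i += 2
        else:
            # Roles, constants, and ')' pass through; a bare variable
            # token is a co-reference and reuses its <Rn> token.
            out.append(var_map.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)

amr = "(a / and :op1 (l / love-01 :ARG0 (i / i) :ARG1 (p / place :mod (t / this))))"
print(linearize(amr))
# ( <R0> and :op1 ( <R1> love-01 :ARG0 ( <R2> i ) :ARG1 ( <R3> place :mod ( <R4> this ) ) ) )
```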

Style Detector
Let $a^{s_{src}} = \{a_1, ..., a_m\} \subseteq x^{s_{src}}$ be the stylistic words in $x^{s_{src}}$; the style detector aims to detect the $a^{s_{src}}$ that contribute significantly to $s_{src}$. Specifically, we apply RoBERTa (Ott et al., 2019), a Transformer-based model that achieves state-of-the-art results in several text classification tasks, as our style classifier. The RoBERTa-based style classification process can be expressed as Eq. 1:

$$p(s \mid x) = \operatorname{softmax}\Big(W \sum_{i=1}^{n} \alpha_i v_i\Big) \quad (1)$$
where $v$ is a tensor such that $v_i$ is the encoding of $x_i$, and $\alpha_i$ is the corresponding attention weight in determining the probability of each style label $s$ over the whole style label set $S = \{s_{src}, s_{tgt}\}$, which can be understood as an importance score for detecting style words. However, the style detector has multiple attention heads and layers that encode different semantic and linguistic structures. Inspired by Sudhakar et al. (2019), we identify the specific attention head and layer that most significantly encodes the style features, i.e., whose attention weights best represent the importance scores contributing to the style classification result.
Let $\langle h, l \rangle$ be a candidate head-layer pair, iterated over to extract the attention score of the $i$-th word $x_i \in x$. The calculation is as Eq. 2:

$$\alpha_i^{h,l} = \operatorname{softmax}\Big(\frac{Q_{[\mathrm{CLS}]} K^{\top}}{\sqrt{d_k}}\Big)_i \quad (2)$$
where "[CLS]" is a special token added to each sentence's beginning.This symbol, without obvious semantic information, will more fairly integrate the semantic information of each word, thus better representing the semantics of the whole sentence."Q" and "K" are the query and key vectors defined in Vaswani et al. 2017.Then we define a proportion parameter γ to select the top γ • n words representing the style words a s src from x based on the importance score calculated in Eq. 2, where n represents the number of words in x.After that, we select the potential head-layer pair that can better fit the style word features, and the score z(a h,l ) can be calculated in Eq. 3: where s is the style label with the maximum probability assigned by the softmax distribution over S, and s ′ = S − {s}; λ represents a smoothing parameter.The final head-layer pair < h s , l s > can be selected as Eq.4: where H and L represents the whole head set and layer set separately; D is the validation set.

Style Rewriting
Style rewriting aims to transfer the AMR graph with the source style ($G^{s_{src}}$) into an AMR graph with the target style ($G^{s_{tgt}}$), which provides the intermediate representation from which the decoder generates sentences in the target style. Shou et al. (2022) proposed AMR-DA, a novel AMR-based method that shows remarkable performance in NLP data augmentation. Inspired by the synonym replacement operation in their research, we transfer the style of $G^{s_{src}}$ by replacing its style-word nodes with antonyms. WordNet (Miller, 1998), the mainstream English lexical database, is the standard tool for antonym identification; however, WordNet can only identify antonyms for a limited number of words. To improve the coverage of style words, we propose a style rewriting algorithm based on the idea of query rewriting in information retrieval, shown in Algorithm 1 (and reconstructed in the sketch below).

For each style word $a_i^{s_{src}} \in a^{s_{src}}$ that matches a node in $G^{s_{src}}$, we transfer the style of the AMR graph through the style rewriting algorithm. In this algorithm, we introduce FastText (Bojanowski et al., 2017) as a style words expander and the previously fine-tuned RoBERTa (Ott et al., 2019) as a style gating unit, focusing on the stylistic features of words and sentences, respectively. FastText is a word vector-based model that solves the out-of-vocabulary (OOV) problem by mining character-level n-gram features. We train a FastText model on each dataset's training set and expand $a^{s_{src}}$ with the top ten synonyms computed by this model. However, the $a^{s_{tgt}}$ generated from the expanded $a^{s_{src}}$ sometimes suffers from a "style backtracking" problem, i.e., $a^{s_{tgt}}$ and $a^{s_{src}}$ express the same style feature. We therefore introduce RoBERTa to filter out expanded style words that share the style features of $a^{s_{src}}$: if the style feature of a candidate $a^{s_{tgt}}$ is opposite to that of $a^{s_{src}}$, it passes through the gate; otherwise, it is blocked. To match RoBERTa's input format, we embed each $a_i^{s_{tgt}} \in a^{s_{tgt}}$ into a sentence $x_{tmp} = c^{s_{src}} + a_i^{s_{tgt}}$, where $c^{s_{src}}$ denotes the non-stylistic content, i.e., the words remaining after removing $a^{s_{src}}$ from $x^{s_{src}}$. If $x_{tmp}$ passes the gating unit, the algorithm returns the corresponding $a_i^{s_{tgt}}$; otherwise, it executes recursively.

This algorithm transfers the style of the AMR graph by rewriting the stylistic nodes, replacing $a^{s_{src}}$ with an $a^{s_{tgt}}$ that is maximally opposed to the source style, while constraining the non-stylistic nodes to remain unchanged in order to maintain factual consistency. Moreover, the algorithm overcomes the dependence on parallel corpora, enabling the model to exploit the style opposites that are ubiquitous in natural language, as captured by general or fine-tuned language models, to rewrite the stylistic nodes of the AMR graphs. The computational complexity of the proposed style rewriting algorithm is reported in Appendix A.
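Only the header of Algorithm 1 survives above; the following Python sketch is a reconstruction of its body from the description, under stated assumptions: the FastText model path is a placeholder, FastText nearest neighbours stand in for the expanded synonym list, and `gate` is a toy stand-in for the fine-tuned RoBERTa gating unit:

```python
import fasttext

# Placeholder path; the paper trains FastText on each dataset's training set.
ft = fasttext.load_model("yelp_train.bin")

NEG = {"bad", "terrible", "awful", "worst", "horrible"}
POS = {"good", "great", "amazing", "best", "delicious"}

def gate(sentence):
    """Toy stand-in for the style gating unit: a lexicon vote; in
    AMR-TST this is the fine-tuned RoBERTa classifier's label."""
    words = set(sentence.lower().split())
    return "neg" if len(words & NEG) > len(words & POS) else "pos"

def style_rewriting(a_src, c_src, s_src, depth=0, max_depth=3):
    """Rewrite one source style word a_src into a target-style word.
    Expands a_src with its top-10 FastText neighbours, embeds each
    candidate into the non-stylistic content c_src, and returns the
    first candidate whose gated style is opposite to s_src; otherwise
    it recurses on the candidates (the paper reports 2.78 recursions
    on average; bounded here for safety)."""
    if depth >= max_depth:
        return a_src                            # fall back to the source word
    candidates = [w for _, w in ft.get_nearest_neighbors(a_src, k=10)]
    for cand in candidates:
        x_tmp = f"{c_src} {cand}"               # x_tmp = c^{s_src} + a^{s_tgt}_i
        if gate(x_tmp) != s_src:                # opposite style passes the gate
            return cand
    for cand in candidates:                     # no candidate passed: recurse
        result = style_rewriting(cand, c_src, s_src, depth + 1, max_depth)
        if gate(f"{c_src} {result}") != s_src:
            return result
    return a_src
```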

Transferred AMR to Text
This component aims to generate text in the target style from the modified AMR graph. Pre-trained Transformer-based models have become the mainstream for this task (Mager et al., 2020; Ribeiro et al., 2021); these transfer learning-based models adapt to generation tasks without complex retraining processes. We use SPRING (Bevilacqua et al., 2021) as the generator, performing the inverse task of AMR parsing. Compared with other generative TST methods, the AMR-based method allows the model to generate diverse but semantically reasonable texts that follow the same semantic logical structure in a simple way (Figure 2). More importantly, for text style transfer tasks such as sentiment transfer on review data with colloquial and non-normalized features, the AMR parsing and generation process automatically corrects the semantic normality of the source text, making the generated content more understandable. We discuss this advantage in Appendix B.
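Continuing the amrlib-based sketch from the parsing section (again an assumption about tooling, not SPRING's released interface), generation from a modified graph is symmetrical:

```python
import amrlib

# The PENMAN string below plays the role of a modified graph G^{s_tgt}
# whose stylistic node was rewritten (love-01 -> hate-01); a
# graph-to-sequence model is assumed to be installed in amrlib's data
# directory.
graphs = ["(h / hate-01 :ARG0 (i / i) :ARG1 (p / place :mod (t / this)))"]

gtos = amrlib.load_gtos_model()
sents, _ = gtos.generate(graphs)   # returns (sentences, clipped-graph flags)
print(sents[0])                    # e.g. "I hate this place."
```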

Datasets
We conduct experiments on two datasets that provide human gold-standard references, Yelp and Amazon, both commonly used in text style transfer tasks. The Yelp dataset includes users' positive and negative reviews of specific businesses, while the Amazon dataset contains reviews with sentiment polarity for products sold on Amazon. We use the same train-dev-test split as Li et al. 2018; the statistics of the datasets are shown in Table 1.

Evaluation Metrics
The widely agreed goals of text style transfer are that the transferred text conforms to the target style, preserves content consistent with the non-stylistic part of the source text, and exhibits natural human writing characteristics (Mir et al., 2019). In accordance with these goals, we apply automatic evaluation metrics commonly used in text style transfer tasks, covering the following aspects:
• Style Transfer Intensity (Sty.): We train FastText (Joulin et al., 2017) as a style classifier on the training set following the train-dev-test split shown in Table 1. In addition, we use the previously fine-tuned RoBERTa model as a second style evaluation classifier, which is more sensitive to style features based on the detected words. We use these classifiers to measure the accuracy ($AC_f$ and $AC_b$) with which the style of the generated texts is successfully transferred to the target style.
• Content Preservation (Cont.): We use the BLEU (Papineni et al., 2002) score to measure the overlap between the transferred text and the source text or the human-written reference, denoted $BLEU_s$ and $BLEU_r$. Narasimhan et al. 2022 note that BLEU scores alone are insufficient to measure relevance to the target content. Following their conclusions, we borrow ROUGE-L (Lin, 2004), a metric widely used in machine translation and text summarization that correlates better with human judgment; $RL_s$ and $RL_r$ denote ROUGE-L calculated against the source and reference text, respectively.
• Naturalness (Nat.): We fine-tune OpenAI GPT-2 (Radford et al., 2019), a large pre-trained language model, on the training set following the same train-dev-test split in Table 1. We calculate the perplexity (PPL) of the transferred texts under this fine-tuned language model to evaluate how natural and fluent the generated text is.
• Geometric Mean (GM): Following Yi et al. 2020, we report the geometric mean of $AC_f$, $AC_b$, $BLEU_s$, $BLEU_r$, $RL_s$, $RL_r$, and $\frac{1}{\ln PPL}$ as an overall evaluation metric (computed as in the sketch below).
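For clarity, the GM aggregation can be written out directly; a minimal sketch with illustrative (not reported) metric values:

```python
import math

def geometric_mean(ac_f, ac_b, bleu_s, bleu_r, rl_s, rl_r, ppl):
    """Overall score: geometric mean of the six quality metrics and
    1/ln(PPL), so that lower perplexity raises the score."""
    vals = [ac_f, ac_b, bleu_s, bleu_r, rl_s, rl_r, 1.0 / math.log(ppl)]
    return math.prod(vals) ** (1.0 / len(vals))

# Illustrative values only, not results from the paper.
print(geometric_mean(0.90, 0.85, 35.0, 20.0, 60.0, 40.0, 45.0))
```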

Baseline Methods
We compare the proposed AMR-TST with state-of-the-art TST methods based on various mainstream techniques: B-GST (Sudhakar et al., 2019) explicitly separates content and style features to generate transferred text from the non-stylistic content and the target style; G-GST (Sudhakar et al., 2019) retrieves style words from the target corpus and generates the transferred text based on the retrieved target style words and non-stylistic content; DAAE (Shen et al., 2020) augments adversarial auto-encoders with denoising objectives to enable zero-shot text style transfer; VT-STOWER (Xu et al., 2021b) builds on the VAE structure with pivot-word enhancement learning, which learns the decisive words for a specific style; EPAAE (Narasimhan et al., 2022) controls the strength of style transfer by clustering stylistically similar sentences in a latent space produced by a finely adjustable noise component; RLPrompt (Deng et al., 2022) is a discrete prompt optimization method with reinforcement learning that generates the desired discrete prompts via a parameter-efficient policy network.

Automatic Evaluation
The automatic evaluation results of the proposed AMR-TST and the baselines are shown in Table 2. Most baselines have difficulty balancing style transfer strength and content preservation, because rewriting style words necessarily affects the word overlap between texts, creating a tension between these two goals. Compared to the other baselines, DAAE (Shen et al., 2020) achieves better results in balancing these goals. VT-STOWER (Xu et al., 2021b) shows good target-style accuracy because it excels at learning the keywords that determine the target style. RLPrompt (Deng et al., 2022) performs better in perplexity, since the introduced prompt constrains the randomness of the generation process and thereby enhances the readability of the generated text.
In comparison, AMR-TST achieves state-of-the-art results on the GM metric, which evaluates the model comprehensively across the three aforementioned goals. These results show that the text generated by AMR-TST maximizes the transfer of the source text to the target style while retaining the core content and constraining the semantic logic to conform to natural language usage. In particular, AMR-TST has a significant advantage on the perplexity metric, coming closest to the perplexity calculated from the source text (SRC). This result shows that AMR-TST can constrain the text generation process through the global logical information represented by the graph structure, even after the local information represented by the nodes has been modified. This allows AMR-TST to adapt the transferred local information to the remaining non-stylistic content nodes, keeping the generated text comprehensible by maintaining the semantic and factual features of the source text.

Human Evaluation
We invited ten volunteer annotators with extensive experience in English natural language understanding to evaluate AMR-TST and DAAE (Shen et al., 2020), which showed competitive results in the automatic evaluation and the subsequent qualitative evaluation. Each annotator anonymously rated ten randomly selected texts generated by these models from the perspectives of style transfer intensity, content preservation, and naturalness. For each item, the annotators chose which of the generated texts was better, or indicated that they could not decide.
Table 3 shows the human evaluation results as the percentage of items for which each model's generated text was preferred by the annotators. The AMR-TST-generated texts attract clearly more preference, indicating that AMR-TST's transferred texts better fit human linguistic habits.

Qualitative Evaluation
Qualitative examples of the transferred texts generated by AMR-TST and the baselines are shown in Table 4. Words with style features, which are the target words of the style detector in our method, are marked in different colours. AMR-TST successfully generates transferred text conforming to the target style, based on the rewritten graph nodes carrying the target style and the graph structure representing the semantic logic of the source text. In comparison, lacking semantic control, some baselines struggle to keep natural semantic and factual consistency while hitting the target words in the transferred text. More importantly, the transferred texts generated by AMR-TST are most in line with human expression style and semantic norms, since they are constrained by the semantic structure. Although we found the transferred target words sometimes slightly blunt when inspecting other generated samples, the texts transferred by AMR-TST show no obvious grammatical errors compared with the other baselines, which matters for applying TST models in real-world scenarios. Moreover, the text generated by AMR-TST requires no complex post-processing; abbreviations and punctuation marks are handled directly for easy reading.

Ablation Study
In this section, we conduct an ablation study to verify the positive impact of the proposed style rewriting algorithm. Specifically, we evaluate the performance of models without the style words expander ("w/o se") and the style gating unit ("w/o sg"), as well as a model with neither component ("w/o sr"). The results are shown in Table 5.
The results show that the proposed AMR-TST achieves the best GM compared to the models without the style rewriting components, demonstrating their positive impact on model performance. Specifically, the $AC_f$ and $AC_b$ metrics decrease gradually from w/o se to w/o sg to w/o sr, demonstrating that the style rewriting algorithm and its style words expander and style gating unit components actively promote the style transfer from source to target text. In contrast, on the content preservation metrics $BLEU_s$, $BLEU_r$, $RL_s$, and $RL_r$, the w/o sr model performs better, showing that the style rewriting algorithm pushes the model to rewrite the source text to the greatest extent according to the target style, at the cost of reduced content consistency between the transferred and source texts. In addition, the results of the w/o sr model on $BLEU_s$ and $RL_s$ also show that AMR can faithfully reconstruct the source text while standardizing its semantic representation. AMR-TST achieves competitive results on the PPL metric, indicating that the target style words generated by the style rewriting algorithm conform to natural human expression.

Conclusion
This paper proposes AMR-TST, an abstract meaning representation-based text style transfer method. AMR-TST parses the source text into an AMR graph, rewrites the detected stylistic nodes via the proposed style rewriting algorithm, and generates the transferred text from the modified graph, combining the explainability of explicit TST methods with the diversity of implicit ones. Automatic, human, and qualitative evaluations on the Yelp and Amazon datasets show that AMR-TST achieves state-of-the-art results while preserving semantic features and reducing hallucinations.

Limitations
The current AMR-TST relies on the style rewriting algorithm to rewrite the stylistic nodes of AMR graphs from the source style to the target style. This method depends on the style-opposite features contained in general natural language corpora. Its advantage is that it needs no complex decoder retraining for different datasets, maximizing the use of generic natural language knowledge and reducing training costs. However, this also limits the current AMR-TST to text style transfer tasks with pronounced style polarity, such as sentiment transfer. For other text style transfer tasks such as political and gender transfer, our current style rewriting algorithm cannot precisely rewrite the implicit style words. To address this limitation, our future work will improve the style rewriting algorithm by finely identifying implicit style words and exploring their correlations, so that the revised algorithm can be embedded in the current AMR-TST framework focusing on node-level stylistic features.

Ethical Statement
This paper honors the ethical code set out in the ACL Code of Ethics.

A Computational Complexity
The proposed style rewriting algorithm uses a recursive structure; on average, 2.78 recursions were observed in the experiments. The algorithm was run on a server equipped with an Intel Xeon E-2288G CPU and an NVIDIA RTX 6000 GPU, and the average time required to transfer one source text was 0.92 seconds.

B Semantic Normality Correction
This appendix demonstrates the advantages of AMR-TST in checking and correcting the semantic normality of source texts; examples are shown in Table 6. When the source text contains irregular symbol representations such as "...", the transferred text generated by AMR-TST automatically removes them (#1 and #2). Moreover, AMR-TST can understand and convert some symbols of the source text into words (#3: e.g. "&" → "and"). When the source texts contain colloquial superlatives, AMR-TST understands their semantics and paraphrases them in a normalized form (#4 and #5). All these advantages improve the transferred text's readability, making it more intuitive and easier to understand.
Figure 2: Overview of the proposed AMR-TST pipeline.

Table 2: Automatic evaluation results ($AC_f$, $AC_b$, $BLEU_s$, $BLEU_r$, $RL_s$, $RL_r$, PPL), where SRC and H represent the source text and human reference.

Table 3: Human evaluation results.

Table 4: Examples of transferred text generated by the AMR-TST and baselines.

Table 5: Automatic evaluation results of the ablation study verifying the effectiveness of the components in the style rewriting algorithm ($AC_f$, $AC_b$, $BLEU_s$, $BLEU_r$, $RL_s$, $RL_r$, PPL).

Table 6: Examples of the AMR-TST for normalizing semantic representation.