ATP: AMRize Then Parse! Enhancing AMR Parsing with PseudoAMRs

As Abstract Meaning Representation (AMR) implicitly involves compound semantic annotations, we hypothesize that auxiliary tasks which are semantically or formally related can better enhance AMR parsing. We find that: 1) Semantic role labeling (SRL) and dependency parsing (DP) bring more performance gain than other tasks, e.g., MT and summarization, in the text-to-AMR transition, even with much less data. 2) To make a better fit for AMR, data from auxiliary tasks should be properly "AMRized" to PseudoAMRs before training. Knowledge from shallow-level parsing tasks can be better transferred to AMR parsing with structure transformation. 3) Intermediate-task learning is a better paradigm for introducing auxiliary tasks to AMR parsing than multitask learning. From an empirical perspective, we propose a principled method to involve auxiliary tasks to boost AMR parsing. Extensive experiments show that our method achieves new state-of-the-art performance on different benchmarks, especially on topology-related scores.

* Equal Contribution. † Corresponding Author.

[Figure 1: AMR, SRL, and dependency structures for the sentence "The boy wants to leave."]

Recently, AMR parsing with the sequence-to-sequence framework has achieved the most promising results (Xu et al., 2020; Bevilacqua et al., 2021). Compared with transition-based or graph-based methods, sequence-to-sequence models do not require tedious data processing and are naturally compatible with auxiliary tasks (Xu et al., 2020) and powerful pretrained encoder-decoder models (Bevilacqua et al., 2021). Previous work (Xu et al., 2020; Wu et al., 2021) has shown that the performance of an AMR parser can be effectively boosted through co-training with certain auxiliary tasks, e.g., Machine Translation or Dependency Parsing.
However, when introducing auxiliary tasks to enhance AMR parsing, we argue that three important issues remain under-explored in previous work. 1) How to choose auxiliary tasks? Task selection is important since loosely related tasks may even impede AMR parsing, according to Damonte and Monti (2021). However, the literature offers no principles or consensus on how to choose proper auxiliary tasks for AMR parsing. Though previous work achieves noticeable performance gains through multitask learning, it does not provide explainable insights on why certain tasks outperform others or in which aspects the auxiliary tasks benefit the AMR parser. 2) How to bridge the gap between tasks? The gaps between AMR parsing and auxiliary tasks are non-negligible. For example, Machine Translation generates text sequences, while Dependency Parsing (DP) and Semantic Role Labeling (SRL) produce dependency trees and semantic role forests, respectively, as shown in Figure 1. Prior studies (Xu et al., 2020; Wu et al., 2021; Damonte and Monti, 2021) do not attach particular importance to this gap, which might turn the auxiliary tasks into outlier tasks (Cai et al., 2017) in multitask learning, deteriorating the performance of AMR parsing. 3) How to introduce auxiliary tasks more effectively? After investigating different training paradigms for combining auxiliary task training with the major objective (AMR parsing), we find that, although all baseline models (Xu et al., 2020; Wu et al., 2021; Damonte and Monti, 2021) jointly train the auxiliary tasks and AMR parsing with Multitask Learning (MTL), Intermediate-task Learning (ITL) is a more effective way to introduce auxiliary tasks for pretrained models. Our observation is also consistent with Pruksachatkun et al. (2020) and Poth et al. (2021), who improve other NLP tasks with enhanced pretrained models.
In response to the above three issues, we distill from extensive experiments a principled method to select, transform and train the auxiliary tasks (Figure 2) to enhance AMR parsing. 1) Auxiliary Task Selection. We choose auxiliary tasks by estimating their similarity to AMR from the semantic and formal perspectives. AMR is recognized as a deep semantic parsing task which encompasses multiple semantic annotations, e.g., semantic roles, named entities and co-references. As a direct semantic-level sub-task of AMR parsing, SRL is selected as one auxiliary task. Traditionally, formal semantics views syntactic parsing as a precursor to semantic parsing, leading to a mapping between syntactic and semantic relations. Hence we introduce dependency parsing, a syntactic parsing task, as another auxiliary task. 2) AMRization. Despite being highly related, the output formats of SRL, DP and AMR are distinct from each other. To this end, we introduce transformation rules to "AMRize" SRL and DP into PseudoAMR, imitating the features of AMR. Specifically, through Reentrancy Restoration we transform the structure of SRL into a graph and restore the reentrancy within arguments, which mimics AMR structure. Through Redundant Relation Removal we transform dependency trees by removing relations that are far from the semantic relations in the AMR graph.
3) Training Paradigm Selection. We find that ITL fits AMR parsing better than MTL, since it allows the model to transition progressively to the target task instead of learning all tasks simultaneously, which benefits knowledge transfer.
We summarize our contributions as follows: 1. Semantically or formally related tasks, e.g., SRL and DP, are better auxiliary tasks for AMR parsing compared with distantly related tasks, e.g. machine translation and machine reading comprehension.
2. We propose task-specific rules to AMRize the structured data to PseudoAMR. SRL and DP with properly transformed output format further improve AMR parsing.
3. ITL outperforms classic MTL methods when introducing auxiliary tasks to AMR Parsing. We show that ITL derives a steadier and better converging process during training.
Extensive experiments show that our method (PseudoAMR + ITL) achieves the new state-of-the-art for single models on in-distribution (85.2 Smatch score on AMR 2.0, 83.9 on AMR 3.0) and out-of-distribution benchmarks. Specifically, we observe that the AMR parser gains larger improvements on the SRL (+3.3), Reentrancy (+3.1) and NER (+2.0) metrics*, due to their higher resemblance to the selected auxiliary tasks.

Figure 3: Illustration of AMRization methods and Graph Linearization. The source sentence is "The boy wants to leave."

Methodology
As shown in Figure 2, in this paper, we propose a principled method to select auxiliary tasks (Section 2.1), AMRize them into PseudoAMR (Section 2.2) and train PseudoAMR and AMR effectively (Section 2.3) to boost AMR parsing. We formulate both PseudoAMR and AMR parsing as a sequence-to-sequence generation problem. Given a sentence x = (x_1, ..., x_N), the model aims to generate a linearized PseudoAMR or AMR graph y = (y_1, ..., y_M) (the right part of Figure 3) as a product of conditional probabilities: p(y|x) = ∏_{i=1}^{M} p(y_i | y_1, ..., y_{i−1}, x).
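This factorization is the standard autoregressive decomposition; a minimal sketch, with toy step probabilities standing in for a real decoder's softmax outputs:

```python
import math

def sequence_log_prob(step_probs):
    """log p(y|x) as the sum of per-step log-conditionals
    log p(y_i | y_1..y_{i-1}, x)."""
    return sum(math.log(p) for p in step_probs)

# hypothetical decoder confidences for a 3-token linearized graph
probs = [0.9, 0.8, 0.95]
total = math.exp(sequence_log_prob(probs))  # = 0.9 * 0.8 * 0.95
```

In practice these conditionals come from the decoder of a pretrained encoder-decoder model; summing log-probabilities avoids underflow for long linearizations.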

Auxiliary Task Selection
When introducing auxiliary tasks for AMR parsing, the selected tasks should be formally or semantically related to AMR, thus the knowledge contained in them can be transferred to AMR parsing. Based on this principle of relevance, we choose semantic role labeling (SRL) and dependency parsing (DP) as our auxiliary tasks. We involve Translation and Summarization tasks for comparison.
Semantic Role Labeling SRL aims to recover the predicate-argument structure of a sentence, which can enhance AMR parsing because: (1) Recovering the predicate-argument structure is also a sub-task of AMR parsing. As illustrated in Figure 3(a,b), both AMR and SRL locate the predicates ("want", "leave") of the sentence and conduct word-sense disambiguation. Then they both capture the multiple arguments of the central predicate.

* Computed on the AMR 2.0 and 3.0 datasets.
(2) SRL and AMR are known as shallow and deep semantic parsing, respectively. It is reasonable to think that the shallow level of semantic knowledge in SRL is useful for deep semantic parsing.
Dependency Parsing DP aims to parse a sentence into a tree structure that represents the dependency relations among tokens. The knowledge of DP is useful for AMR parsing since: (1) Linguistically, DP (a syntactic parsing task) can be a precursor task of AMR (semantic parsing).
(2) The dependency relations of DP are also related to the semantic relations of AMR; e.g., as illustrated in Figure 1(c), "NSUBJ" in DP usually corresponds to ":ARG0" in AMR. In fact, they both correspond to the agent-patient relations in the sentence. (3) DP is similar to AMR parsing from the perspective of edge prediction, because both of them need to capture the relations between nodes (tokens/concepts) in the sentence.

AMRization
Although SRL and DP are highly related to AMR parsing, there still exist gaps between them; e.g., SRL annotations may be disconnected, while AMR is always a connected graph. To bridge these gaps, we transform them into PseudoAMR, a process we call AMRization.

Transform SRL to PseudoAMR
We summarize typical gaps between SRL and AMR as: (1) Connectivity. AMR is a connected directed graph while the structure of SRL is a forest.
(2) Span-Concept Gap. Nodes in the AMR graph represent concepts (e.g., "boy") while nodes in SRL are token spans (e.g., "the boy", "that boy"). In fact, all the mentioned token spans correspond to the same concept.

(3) Reentrancy. Reentrancy is an important feature of AMR: as shown in Figure 3(a), the instance "boy" is referenced twice as ARG0. This feature can be applied to conduct coreference resolution. However, there is no reentrancy in SRL. To bridge these gaps, we propose Connectivity Formation, Argument Reduction and Reentrancy Restoration to transform SRL into PseudoAMR.

Connectivity Formation
To address the connectivity gap, we need to merge all SRL trees into one connected graph. Note that the merging does not guarantee correctness at the semantic level. As shown in Figure 3(b-1), we first add a virtual root node and then generate a directed edge from the virtual root to the root of each SRL tree, so that the SRL annotation becomes a connected graph.
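The virtual-root merge can be sketched as follows; the dict-of-lists graph layout and the `multi-sentence`/`:snt` edge names (taken from the Trivial SRL setting described later in the paper) are illustrative:

```python
def form_connectivity(srl_trees):
    """Merge an SRL forest into one connected graph by adding a virtual
    root ('multi-sentence') with a :sntK edge to each tree's predicate."""
    graph = {"multi-sentence": []}
    for i, tree in enumerate(srl_trees, start=1):
        graph["multi-sentence"].append((f":snt{i}", tree["predicate"]))
        graph[tree["predicate"]] = list(tree["args"])
    return graph

# SRL forest for "The boy wants to leave."
trees = [
    {"predicate": "want-01",
     "args": [(":ARG0", "the boy"), (":ARG1", "to leave")]},
    {"predicate": "leave-01", "args": [(":ARG0", "the boy")]},
]
g = form_connectivity(trees)
```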

Argument Reduction
To address the Span-Concept Gap, as shown in Figure 3(b-2), if the argument of the current predicate is a span with more than one token, we replace this span with its head token in the dependency structure. Thus the token spans "the boy" and "that boy" are transformed to "boy", which is more similar to the corresponding concept. A similar method has been applied in prior work to find the heads of argument token spans.
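A sketch of the head-token reduction. The criterion used here (the head is the span token whose dependency head falls outside the span) is a simplification; the paper uses a full dependency parser for this step:

```python
def reduce_argument(span_tokens, dep_head):
    """Replace a multi-token argument span by its head token.
    `dep_head` maps each token to its syntactic head."""
    span = set(span_tokens)
    for tok in span_tokens:
        # the head of the span is the token governed from outside it
        if dep_head.get(tok) not in span:
            return tok
    return span_tokens[0]  # fallback for degenerate inputs

# in "The boy wants to leave.", "the" -> "boy" -> "wants"
print(reduce_argument(["the", "boy"], {"the": "boy", "boy": "wants"}))  # boy
```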
Reentrancy Restoration For the reentrancy gap, we design a heuristic algorithm based on DFS to restore reentrancy in SRL. As shown in Figure 3(b-3), the core idea of the restoration is that we create a variable when the algorithm first sees a node. If the DFS procedure later meets a node with the same name, the destination of the current edge is redirected to the variable we created first. Please refer to Appendix A for the pseudocode of the reentrancy restoration.
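A minimal sketch of the DFS restoration (the paper's exact pseudocode is in Appendix A; the nested-dict node layout here is an assumption):

```python
def restore_reentrancy(root):
    """The first time a node name is seen it gets a fresh <Rk> variable;
    later occurrences are redirected to that variable, creating
    reentrant edges."""
    var_of, edges, counter = {}, [], [0]

    def var(name):
        if name not in var_of:
            var_of[name] = f"<R{counter[0]}>"
            counter[0] += 1
        return var_of[name]

    def dfs(node):
        head = var(node["name"])
        for rel, child in node.get("children", []):
            first_visit = child["name"] not in var_of
            edges.append((head, rel, var(child["name"])))
            if first_visit:  # expand only the first occurrence
                dfs(child)

    dfs(root)
    return edges

# SRL-derived tree for "The boy wants to leave." (duplicate 'boy' node)
srl = {"name": "want-01", "children": [
    (":ARG0", {"name": "boy"}),
    (":ARG1", {"name": "leave-01",
               "children": [(":ARG0", {"name": "boy"})]}),
]}
```

Running `restore_reentrancy(srl)` redirects the second "boy" edge to the variable created for the first, mimicking AMR's reentrancy.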

Dependency Guided Restoration
The previous restoration algorithm cannot guarantee that the merging of nodes agrees with the meaning of reentrancy in AMR, since it merges concepts according to their order of appearance in the SRL structure, and it does not handle the merging of predicates. As shown in Figure 3(b-3), the nodes "leave" and "leave-01" should be merged, but we cannot obtain this information directly from the SRL annotations. We therefore propose another restoration method based on the dependency structure of the sentence corresponding to the SRL, as illustrated in Figure 4. This restoration algorithm takes the result of the previous Connectivity Formation as input. It first merges the leaf nodes corresponding to the same token. This step is accurate, since merging leaf nodes does not introduce divergence. The second step is to merge predicate nodes. For all sub-trees of the root node, it checks whether one predicate appears in another's argument span and whether that predicate directly depends on the span's predicate. If both conditions are satisfied, the algorithm merges the predicate and the span into one node. Last, it removes the root node and root edges if the graph remains connected after the removal.

Figure 4: Illustration of Dependency Guided Restoration. In step 2, the leaf nodes "The boy" are merged. In step 3, the non-leaf node "leave-01" is merged with the leaf node "to leave" since "leave-01" appears in the word span "to leave" and the word "leave" depends on the word "want".

Transform Dependency Structure to PseudoAMR
We summarize the gaps between Dependency Tree and AMR as: (1) Redundant Relation. Some relations in dependency parsing focus on syntax, e.g., ":PUNCT" and ":DET", which are far from semantic relations in AMR.
(2) Token-Concept Gap. The basic element of the dependency structure is the token, while that of AMR is the concept, which captures deeper, syntax-independent semantics. We handle these gaps with Redundant Relation Removal and Token Lemmatization, transforming the dependency structure into PseudoAMR.

Redundant Relation Removal
For the Redundant Relation Gap, we remove relations that are far from the sentence's semantics most of the time, such as "PUNCT" and "DET". As illustrated in Figure 3(c-1), by removing some dependency relations, the parsing result becomes more compact compared with the original DP tree, forcing the model to ignore semantics-unrelated tokens during seq2seq training.
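The removal itself is a simple filter over dependency edges; a sketch using the relation set the paper removes (PUNCT, DET, MARK, ROOT):

```python
REDUNDANT = {"PUNCT", "DET", "MARK", "ROOT"}  # relations removed in the paper

def remove_redundant(dep_edges):
    """Drop syntax-only dependency relations so the remaining edges
    are closer to AMR's semantic relations."""
    return [(head, rel, dep) for head, rel, dep in dep_edges
            if rel not in REDUNDANT]

edges = [("wants", "NSUBJ", "boy"), ("boy", "DET", "The"),
         ("wants", "PUNCT", "."), ("leave", "MARK", "to")]
print(remove_redundant(edges))  # [('wants', 'NSUBJ', 'boy')]
```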
Token Lemmatization As shown in Figure 3(c-2), for the Token-Concept Gap, we conduct lemmatization on the nodes of the dependency tree, based on the observation that the affixes of a single word do not affect the concept it corresponds to. Together with the smart-initialization (Bevilacqua et al., 2021), which sets a concept token's embedding to the average of its subword constituents, the embedding vector of the lemmatized token ("want") becomes closer to the vector of the concept ("want-01") in the embedding matrix, thereby requiring the model to capture deeper semantics when conducting the DP task.
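A toy stand-in for the NLTK lemmatizer the paper actually uses, just to illustrate the node rewrite (the crude suffix list is an assumption, not NLTK's behavior):

```python
SUFFIXES = ("ing", "ed", "es", "s")  # crude, for illustration only

def lemmatize_token(token):
    """Strip a common verbal suffix so the node token moves closer to
    its AMR concept (e.g., 'wants' -> 'want', near 'want-01')."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

print(lemmatize_token("wants"))  # want
```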

Linearization
After all AMRization steps, the graph structure of SRL/DP must also be linearized before seq2seq training. As depicted in the right part of Figure 3, we linearize the graph via DFS-based traversal, using special tokens <R0>, ..., <Rk> to indicate variables and parentheses to mark depth, which is the best-performing AMR linearization method of Bevilacqua et al. (2021).
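The DFS linearization can be sketched as follows (exact spacing and token conventions are assumptions; on a revisit, i.e., a reentrancy, only the variable is emitted):

```python
def linearize(concepts, children, node, visited=None):
    """DFS linearization with <Rk> variables and parentheses marking
    depth, in the style described in the text."""
    if visited is None:
        visited = set()
    if node in visited:
        return node  # reentrant node: emit the variable only
    visited.add(node)
    parts = [f"( {node} {concepts[node]}"]
    for rel, child in children.get(node, []):
        parts.append(f"{rel} {linearize(concepts, children, child, visited)}")
    return " ".join(parts) + " )"

# graph for "The boy wants to leave." after reentrancy restoration
concepts = {"<R0>": "want-01", "<R1>": "boy", "<R2>": "leave-01"}
children = {"<R0>": [(":ARG0", "<R1>"), (":ARG1", "<R2>")],
            "<R2>": [(":ARG0", "<R1>")]}
print(linearize(concepts, children, "<R0>"))
```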

Training Paradigm Selection
After task selection and AMRization, we still need to choose an appropriate training paradigm to train PseudoAMR and AMR effectively. We explore three training paradigms as follows: Multitask training Following Xu et al. (2020) and Damonte and Monti (2021), we use the classic scheme for sequence-to-sequence multitask training: a special task tag is added at the beginning of the input sentence and all tasks are trained simultaneously. Validation for model selection is conducted only on the AMR parsing sub-task.
Intermediate training Similar to Pruksachatkun et al. (2020), we first fine-tune the pretrained model on the intermediate task (PseudoAMR parsing), followed by fine-tuning on the target AMR parsing task under the same training setting.

Multitask & Intermediate training
We apply a joint paradigm to further explore how different paradigms affect AMR parsing. We first conduct multitask training, followed by fine-tuning on AMR parsing. In this case, multitask training plays the role of the intermediate task.
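The difference between the paradigms is essentially the order in which the model consumes the data; a schematic sketch (the task tags and round-robin mixing are illustrative, not the paper's exact batching):

```python
def mtl_schedule(aux_batches, amr_batches):
    """Multitask learning: interleave task-tagged batches so all tasks
    are optimized simultaneously (round-robin is one simple mixer)."""
    mixed = []
    for aux, amr in zip(aux_batches, amr_batches):
        mixed += [("<aux>", aux), ("<amr>", amr)]
    return mixed

def itl_schedule(aux_batches, amr_batches):
    """Intermediate-task learning: exhaust the auxiliary task first,
    then fine-tune on AMR, i.e., two sequential stages."""
    return ([("<aux>", b) for b in aux_batches] +
            [("<amr>", b) for b in amr_batches])

print(itl_schedule(["srl_1", "srl_2"], ["amr_1"]))
```

The joint paradigm corresponds to running `mtl_schedule` first and then a final pass of AMR-only batches.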

Datasets
AMR Datasets We conducted our experiments on two AMR benchmark datasets, AMR 2.0 and AMR 3.0. AMR 2.0 contains 36,521, 1,368 and 1,371 sentence-AMR pairs in the training, validation and testing sets, respectively. AMR 3.0 has 55,635, 1,722 and 1,898 sentence-AMR pairs in the training, validation and testing sets, respectively. We also conducted experiments on out-of-distribution datasets (BIO, TLP, News3) and in low-resource settings.
Auxiliary Task Datasets Apart from DP/SRL, we choose NLG tasks including summarization and translation to evaluate the contributions of auxiliary tasks. Descriptions of the datasets are listed in Appendix C.

Evaluation Metrics
We use Smatch scores and the fine-grained breakdown scores (Damonte et al., 2017) to evaluate performance.
To fully understand the aspects in which auxiliary tasks improve AMR parsing, we divide the fine-grained scores into two categories: 1) Concept-Related, including the Concept, NER and Negation scores, which focus on concept-centered prediction; 2) Topology-Related, including the Unlabeled, Reentrancy and SRL scores, which focus on edge and relation prediction. NoWSD and Wikification are listed as isolated scores, because NoWSD is highly correlated with the Smatch score and Wikification relies on an external entity linking system.

Experiment Setups
Model Setting We use the current state-of-the-art Seq2Seq AMR parsing model SPRING (Bevilacqua et al., 2021) as our main baseline and apply BART-Large (Lewis et al., 2020) as our pretrained model. Blink is used to add wiki tags to the predicted AMR graphs. We do not apply re-categorization methods; other post-processing to restore AMR from the token sequence is the same as in Bevilacqua et al. (2021). Please refer to Appendix E for more training details. AMRization Setting For SRL, we explore four AMRization settings. 1) Trivial. The concept multi-sentence and the relation :snt are used to represent the virtual root and its edges to each of the SRL trees.
2) With Argument Reduction. We use the dependency parser from the Stanford CoreNLP Toolkit (Manning et al., 2014) to perform argument reduction. 3) With Reentrancy Restoration. 4) All techniques. For DP, we apply four AMRization settings: 1) Trivial. Extra relations in the dependency tree are added to the vocabulary of BART. 2) With Lemmatization. We use NLTK (Bird, 2006) to conduct token lemmatization. 3) With Redundant Relation Removal. We remove the PUNCT, DET, MARK and ROOT relations. 4) All techniques.

Main Results
We report the results (ITL + all AMRization techniques) on the AMR 2.0 and 3.0 benchmarks in Table 1. On AMR 2.0, our models with DP or SRL as the intermediate task gain consistent improvements over the SPRING model by a large margin (1.2 Smatch) and reach a new state-of-the-art for single models (85.2 Smatch). Compared with SPRING trained with 200k extra data, our models achieve higher performance with much less extra data (40k v.s. 200k), suggesting the effectiveness of our intermediate tasks. We also compare our models with contemporary work (Lam et al., 2021; Zhou et al., 2021b). It turns out that our ensemble model beats its counterpart with less extra data, reaching higher performance (85.3 Smatch). In fact, even without ensembling, our model still performs better than those ensemble models, and the model using the Dependency Guided Restoration method achieves higher performance than the trivial one, showing the effectiveness of our methods.
On AMR 3.0, our models consistently outperform other models under both the single-model (83.9 Smatch) and ensemble (84.0 Smatch) settings. As on AMR 2.0, our single model reaches a higher Smatch score than the ensemble models, revealing the effectiveness of our proposed methods.
Fine-grained Performance To better analyse how the AMR parser benefits from intermediate training and how different intermediate tasks affect the overall performance, we report the fine-grained scores in Table 1. We can see that incorporating intermediate tasks yields considerable increases on most sub-metrics, especially on the Topology-related ones. On both AMR 2.0 and 3.0, our single model with SRL as the intermediate task achieves the highest scores on the Unlabeled, Reentrancy and SRL metrics, suggesting that the SRL intermediate task improves our parser's capability in coreference and SRL. DP leads to consistent improvement on topology-related metrics and also derives better results on the NER sub-task (92.5 on AMR 2.0, 89.2 on AMR 3.0). We suppose that the ":nn" relation, which signifies multi-word named entities in dependency parsing, helps the AMR parser recognize multi-word named entities.

Exploration in Auxiliary Task Selection
We explore how tasks other than DP and SRL affect AMR parsing. We involve two classic conditional NLG tasks, Summarization and Translation, for comparison, as shown in Table 2. The comparison implies that SRL and DP are better auxiliary tasks for AMR parsing even when their counterparts exploit far more data (40k v.s. 400k). In fact, the performance of MT drops when introducing more data, which contradicts the finding of Xu et al. (2020) that more MT data leads to better results when pretraining a raw Transformer model. However, this is not surprising in the setting of Intermediate-task Learning, where we already have a model with large-scale pretraining. Whether the intermediate task's form fits the target task is far more important than the amount of intermediate-task data, as also revealed by Poth et al. (2021). According to their observation, the tasks with the most data (QQP 363k, MNLI 392k) perform far worse (up to 97.4% relative performance degradation) on some target tasks than tasks with much smaller datasets (CommonsenseQA 9k, SciTail 23k), which on the contrary have a positive influence.
In conclusion, our findings suggest that the selection of the intermediate task is important: it should be closely related to AMR parsing in form, otherwise it may even lead to a performance drop for AMR parsing.

Figure 5: The distance distribution of sentence representations. SRL and DP consistently provide sentence representations more similar to AMR's than Translation. The computation is illustrated in Figure 7 in the appendix.

More Similar Sentence Representation
To examine how different auxiliary tasks affect AMR parsing, we collect sentence representations from the trained encoders of the different tasks. We use the average hidden state of the encoder's output as the sentence representation. We compute the cosine similarity and L2 distance between each auxiliary task's representation and AMR's representation of the same sentence. The test split of AMR 2.0 is used for evaluation. Finally, we fit a Gaussian distribution to the distribution of distances and draw the probability density curves shown in Figure 5. It turns out that under both distance metrics, SRL and DP consistently provide sentence representations more similar to AMR's than Translation does. This empirically justifies our hypothesis that semantically or formally related tasks can lead to a better initialization for AMR parsing.

As shown in Table 3, we conduct an ablation study on how different AMRization methods affect the performance of AMR parsing. For both SRL and DP, jointly adopting our AMRization techniques further improves the performance of AMR parsing significantly compared with trivial linearization. The imperfect reentrancy restoration method leads to a significant improvement in both the Topology- and Concept-related scores. This reveals that transforming the structure to mimic the features of AMR can improve knowledge transfer between shallow and deep semantics.

As shown in Table 8, compared with jointly using the two techniques, it is worth noting that the model with solely Reentrancy Restoration reaches the highest fine-grained scores, especially on the Reentrancy and SRL scores. To explore why it surpasses adopting both techniques, we analyse the number of restored reentrancies. The result shows that about 10k more reentrancies are added when Argument Reduction (AR) is executed first. This is expected, since AR replaces each token span with its head token. Compared with a token span, a single token is more likely to be recognized as a coreference variable by the Reentrancy Restoration (RR) algorithm, thus generating more reentrancies, which might introduce bias into the model. This explains why solely using RR leads to better results on the SRL and Reentrancy scores.
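The representation comparison earlier in this section reduces to cosine similarity and L2 distance over mean-pooled encoder states; a sketch with toy vectors in place of real encoder outputs:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# toy vectors standing in for mean-pooled encoder hidden states
amr_rep = [0.2, 0.4, 0.4]
srl_rep = [0.25, 0.35, 0.45]
mt_rep  = [0.8, 0.1, 0.1]
# the SRL encoder's representation is closer to AMR's than MT's is
closer = cosine_similarity(amr_rep, srl_rep) > cosine_similarity(amr_rep, mt_rep)
```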

ITL Outweighs MTL
We report the results of different fine-tuning paradigms in Table 4. They justify our assumption that classic multitask learning with task tags, as previously applied in Xu et al. (2020) and Damonte and Monti (2021), is less effective than intermediate-task learning for AMR parsing. As shown in Figure 6, intermediate-task training provides a faster and better converging process than MTL. We assume this is due to the huge gap between AMR parsing and the auxiliary tasks, which may harm the optimization process of MTL. Optimizing all auxiliary tasks simultaneously may introduce noise into AMR parsing.
We also find that, under the ITL setting, sequentially training on the SRL and DP tasks does not bring further improvement to AMR parsing. We conjecture this is due to catastrophic forgetting. Further regularization during training might help the model learn progressively from different auxiliary tasks and relieve catastrophic forgetting.

Exploration in Out-of-Distribution Generalization
Following Bevilacqua et al. (2021) and Lam et al. (2021), we assess the performance of our models on out-of-distribution (OOD) data. Models trained solely on the AMR 2.0 training data are evaluated on the BIO, TLP and News3 datasets.

Exploration in Low Resources Setting
Since AMR annotation is both time- and labor-consuming, it is natural to ask whether we can improve the learning ability of an AMR parser in low-resource settings. We set up three low-resource benchmarks, BOLT, LORELEI and DFA, for AMR parsing based on different amounts of available training examples. Details of the datasets are described in Appendix D. Compared with the AMR 2.0 dataset, which has 36,521 training samples, the numbers of training samples in BOLT, LORELEI and DFA are 2.9%, 12.2% and 17.7% of that of AMR 2.0, respectively. Table 6 reports the results. Our model surpasses the SPRING model by a very large margin (about 25 Smatch) on the BOLT dataset, which has the least data, and gains consistent improvements on all datasets, suggesting that our pretraining method is effective under low-resource conditions.
There are two ways to incorporate other tasks into AMR parsing. Goodman et al. (2016) build AMR graphs directly from dependency trees, while Ge et al. (2019) parse directly from linearized syntactic trees. Xu et al. (2020) introduce Machine Translation and Constituency Parsing as pretraining tasks for Seq2Seq AMR parsing, and Wu et al. (2021) introduce Dependency Parsing for transition-based AMR parsing. However, none of them addresses the semantic and formal gaps between the auxiliary tasks and AMR parsing.

Multitask & Intermediate-task Learning
Multi-task Learning (MTL) (Caruana, 1997) aims to jointly train multiple related tasks to improve the performance of all tasks. Different from MTL, Intermediate-task Learning (ITL) is proposed to enhance pretrained models, e.g., BERT, by training on an intermediate task before fine-tuning on the target task. Recent studies (Pruksachatkun et al., 2020; Poth et al., 2021) show that ITL can improve the performance of pretrained models on a variety of target NLP tasks.

Conclusion
In this paper, we find that semantically or formally related tasks, e.g., SRL and DP, are better auxiliary tasks for AMR parsing, and that proper AMRization methods bridging the gaps between tasks can further improve performance. Moreover, Intermediate-task Learning is more effective than Multitask Learning in introducing auxiliary tasks. Extensive experiments and analyses show the effectiveness and superiority of our proposed methods.

A Algorithms
Algorithm 1 Reentrancy Restoration for SRL
Input: TreeNode T
Output: Graph G
Description: T is the root node of the original SRL structure after a ROOT node is added to form a tree. G is the output graph with possible reentrancies restored. Global variables: Dict V = {}, where Dict is Python's built-in dictionary type.

Lam et al. (2021) make use of 4 SPRING models trained with different random seeds and their proposed graph ensemble algorithm to do the ensembling. They also report another ensemble model named Graphene All, which includes four checkpoints from models of different architectures: SPRING (Bevilacqua et al., 2021), APT (Zhou et al., 2021a), T5, and Cai&Lam (Cai and Lam, 2020). We do not report the score of Graphene All since it aggregates models with different inductive biases while our ensemble model only uses models of one architecture, which is out of the scope of fair comparison.

C.3 Dependency Parsing

PENN TREEBANK (Marcus et al., 1999) The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation. We only utilize the dependency structure annotations to form our intermediate dependency parsing task. There are 39,832 (~40k) sentences.

C.4 Semantic Role Labeling
ONTONOTES (Weischedel et al., 2017) The OntoNotes project is built on two resources, following the PENN TREEBANK (Marcus et al., 1999) for syntax and the PENN PROPBANK for predicate-argument structure. We select 40k sentences with SRL annotations to form the intermediate task.

D Low-resource Datasets Description
We set up three low-resource learning benchmarks for AMR parsing:

E Training Details
We tune the hyper-parameters on the SPRING baseline and then add the auxiliary data using exactly those hyper-parameters without any change. We use RAdam (Liu et al., 2019) as our optimizer, with a learning rate of 3e-5. Batch size is set to 2048 tokens with 10 steps of gradient accumulation. The dropout rate is set to 0.3.

Parameter       Search Space
Learning rate   1e-5, 3e-5, 5e-5, 1e-4
Batch size      256, 512, 1024, 2048, 4096
Grad. accu.     10
Dropout         0.1, 0.2, 0.3

Figure 7: Illustration of how to compute the sentence representation distance of different tasks. The sentences used for evaluation are never seen in the training of AMR parsing or the other auxiliary tasks. Cosine similarity is computed the same way. We collect all sentences' distances for one encoder to draw the Gaussian distribution curve.