Hierarchical Curriculum Learning for AMR Parsing

Abstract Meaning Representation (AMR) parsing aims to translate sentences into semantic representations with a hierarchical structure, and has recently been empowered by pretrained sequence-to-sequence models. However, there exists a gap between their flat training objective (i.e., all output tokens are treated equally) and the hierarchical AMR structure, which limits model generalization. To bridge this gap, we propose a Hierarchical Curriculum Learning (HCL) framework with a Structure-level Curriculum (SC) and an Instance-level Curriculum (IC). SC switches progressively from core to detailed AMR semantic elements, while IC transits from structurally simple to structurally complex AMR instances during training. Through these two warming-up processes, HCL reduces the difficulty of learning complex structures, so that the flat model can better adapt to the AMR hierarchy. Extensive experiments on AMR2.0, AMR3.0, structure-complex, and out-of-distribution settings verify the effectiveness of HCL.

The powerful pretrained encoder-decoder models, e.g., BART (Lewis et al., 2020), have been successfully adapted to AMR parsing and have become the mainstream, state-of-the-art methods (Bevilacqua et al., 2021). By directly generating the linearized AMR graph (e.g., Figure 1(a)) from the sentence, these sequence-to-sequence methods (Xu et al., 2020b; Bevilacqua et al., 2021) circumvent the complex data processing pipeline and can be optimized more easily than transition-based or graph-based methods (Naseem et al., 2019; Lee et al., 2020; Lyu and Titov, 2018; Zhang et al., 2019a,b; Cai and Lam, 2020; Zhou et al., 2021b). However, there exists a gap between the flat sentence-to-AMR training objective and AMR graphs, since sequence-to-sequence models deviate from the essence of graph representation. Therefore, it is difficult for sequential generators to learn the inherent hierarchical structure of AMR (Zhou et al., 2021b).

Figure 1: Example sentences ("Nine of soldiers died."; "Nine of the twenty soldiers died.") and their linearized AMR graphs.
Humans usually adapt to difficult tasks by dealing with examples gradually from easy to hard, i.e., Curriculum Learning (Bengio et al., 2009; Platanios et al., 2019; Su et al., 2021; Xu et al., 2020a). Inspired by human behavior, we propose a hierarchical curriculum learning framework with two curricular strategies to help the flat pretrained model progressively adapt to the hierarchical AMR graph.

Figure 2: The overview of our hierarchical curriculum learning framework with two curricula, Structure-level (SC) and Instance-level Curricula (IC). During training, SC follows the principle of learning core semantics first, switching progressively from shallow to deep AMR sub-graphs. IC follows the human intuition of starting with easy instances, transiting from easy to hard AMR instances.

Figure 3: The average SMATCH scores for AMR graphs with different depths. AMR graphs with depth at least 7 account for 43.6% of the AMR2.0 test set.

(1) Structure-level Curriculum (SC). AMR graphs are organized in a hierarchy where the core semantic elements stay close to the root node (Cai and Lam, 2019). As depicted in Figure 1, the concepts and relations located in different layers of the AMR graph correspond to different levels of abstraction of the semantic representation. Motivated by the human learning process, i.e., core concepts first, then details, SC enumerates all AMR sub-graphs with different depths and deals with them in order from shallow to deep. (2) Instance-level Curriculum (IC). Our preliminary study in Figure 3 shows that the performance of the vanilla BART baseline drops rapidly as the depth of the AMR graph grows, which indicates that handling a deeper AMR hierarchy is more difficult for pretrained models. Inspired by human cognition, i.e., easy ones first, then hard ones, we propose IC, which trains the model starting from easy instances with a shallower AMR structure and then moving on to hard instances. To sum up: (1) Inspired by the human learning process, i.e., core concepts first and easy instances first, we propose a hierarchical curriculum learning (HCL) framework to help the sequence-to-sequence model progressively adapt to the AMR hierarchy.

Methodology
We formulate AMR parsing as a sequence-to-sequence transformation. Given a sentence x = (x_1, ..., x_N), the model aims to generate a linearized AMR graph y = (y_1, ..., y_M) with a product of conditional probabilities: p(y | x) = ∏_{i=1}^{M} p(y_i | y_1, ..., y_{i-1}, x). As shown in Figure 1(a), following Bevilacqua et al. (2021), the AMR graph is linearized by the DFS-based linearization method, with special tokens to indicate variables and parentheses to mark visit depth. Specifically, the variables of AMR nodes are mapped to a series of special tokens <R0>, ..., <Rk> (more details of the linearization are included in Appendix A). In this paper, we propose a hierarchical curriculum learning framework (Figure 2) with the structure- and instance-level curricula to help the flat model progressively adapt to the structured AMR graph.
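As a rough illustration of the DFS-based linearization described above, the following sketch emits <R0>, ..., <Rk> variable tokens and parentheses that mark visit depth; the graph encoding and helper names are assumptions made for this example and do not reproduce the exact implementation of Bevilacqua et al. (2021).

```python
# Hypothetical sketch of DFS-based AMR linearization.
# graph: dict mapping node id -> (concept, [(relation, child id), ...])

def linearize(graph, root):
    """Return a token list: variables become <Rk>, parentheses mark visit depth."""
    var_ids = {}

    def visit(node):
        if node in var_ids:                      # re-entrant node: emit only its variable token
            return ["<R%d>" % var_ids[node]]
        var_ids[node] = len(var_ids)
        concept, edges = graph[node]
        tokens = ["(", "<R%d>" % var_ids[node], concept]
        for relation, child in edges:
            tokens.append(relation)
            tokens.extend(visit(child))
        tokens.append(")")
        return tokens

    return visit(root)

# Example (simplified): "The soldier wanted to die."
graph = {
    "w": ("want-01", [(":ARG0", "s"), (":ARG1", "d")]),
    "s": ("soldier", []),
    "d": ("die-01", [(":ARG1", "s")]),
}
print(" ".join(linearize(graph, "w")))
# ( <R0> want-01 :ARG0 ( <R1> soldier ) :ARG1 ( <R2> die-01 :ARG1 <R1> ) )
```

Note how the re-entrant node "s" is emitted only once with a full sub-graph and afterwards referenced by its variable token, so the linearization keeps the graph (rather than tree) structure.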

Structure-level Curriculum
Motivated by learning core concepts first, we propose the Structure-level Curriculum (SC). AMR graphs are organized in a hierarchy where the core semantics stay close to the root (Cai and Lam, 2019), so SC divides all AMR sub-graphs into N buckets according to their depths, {S_i : i = 1, 2, ..., N}, where S_i contains AMR sub-graphs with depth i. As shown in Figure 2(a), SC has N training episodes, and each episode consists of T_sc steps. In each step of the i-th episode, the training scheduler samples a batch of examples from buckets {S_j : j ≤ i} to train the model. When parsing a sentence into a sub-graph with depth d, we append a special string "parse to d layer" to the input sentence and replace the start token of the decoder with an artificial token <d>, so that the model can perceive how many layers need to be parsed.
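A minimal sketch of this schedule, assuming the sub-graphs have already been extracted and annotated with their depths; the train_step hook and the exact control-token handling are simplifications introduced for illustration.

```python
import random

def sc_schedule(subgraph_pairs, num_episodes, steps_per_episode, batch_size, train_step):
    """Hypothetical Structure-level Curriculum scheduler.

    subgraph_pairs: list of (sentence, linearized_subgraph, depth) tuples,
    with depth in 1..num_episodes. train_step is an assumed training callback.
    """
    # Bucket sub-graphs by depth: S_i holds sub-graphs of depth i.
    buckets = {i: [] for i in range(1, num_episodes + 1)}
    for sentence, subgraph, depth in subgraph_pairs:
        buckets[depth].append((sentence, subgraph, depth))

    for episode in range(1, num_episodes + 1):
        # In episode i, sample from all buckets S_j with j <= i.
        pool = [ex for j in range(1, episode + 1) for ex in buckets[j]]
        for _ in range(steps_per_episode):
            batch = random.sample(pool, min(batch_size, len(pool)))
            # Append the control string to the input and use an artificial
            # depth token as the decoder start token.
            inputs = [f"{s} parse to {d} layer" for s, _, d in batch]
            starts = [f"<{d}>" for _, _, d in batch]
            targets = [g for _, g, _ in batch]
            train_step(inputs, starts, targets)
```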

Instance-level Curriculum
Inspired by learning easy instances first, we propose the Instance-level Curriculum (IC). Figure 3 shows that AMR graphs with deeper layers can be regarded as harder instances for the flat pretrained model, so IC divides all AMR graphs into M buckets according to their depths, {I_i : i = 1, ..., M}, where I_i contains AMR graphs with depth i. As shown in Figure 2(b), IC schedules training analogously to SC: in each step of the i-th episode, the scheduler samples a batch of whole instances from buckets {I_j : j ≤ i}.
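For both curricula the difficulty signal is graph depth, which can be read off the parenthesis-based linearization by tracking nesting; the following sketch is one way this could be computed and used to build the IC buckets, not necessarily the exact procedure used in our implementation.

```python
def amr_depth(tokens):
    """Depth of a DFS-linearized AMR: the maximum parenthesis nesting level."""
    depth, max_depth = 0, 0
    for token in tokens:
        if token == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif token == ")":
            depth -= 1
    return max_depth

def ic_buckets(instances):
    """Hypothetical Instance-level Curriculum bucketing: I_i holds graphs of depth i."""
    buckets = {}
    for sentence, linearized_amr in instances:
        d = amr_depth(linearized_amr.split())
        buckets.setdefault(d, []).append((sentence, linearized_amr))
    return buckets
```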

Analysis
Structure Benefit To explore the effectiveness of our HCL framework for structured AMR parsing, we divide the fine-grained F1 scores into two categories: "structure-dependent" (Unlabeled, Reentrancy, and SRL) and "structure-independent" (the remaining five metrics). Please refer to Appendix C for the rationale behind this division. As shown in Table 1, compared with Bevilacqua et al. (2021) (also a sequence-to-sequence model based on BART-large), our method achieves improvements of 2.97 and 2.83 average F1 points on the three structure-dependent metrics on AMR2.0 and AMR3.0, respectively, which shows that HCL helps the flat sequence-to-sequence model better adapt to the hierarchical and complex AMR structure.
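For clarity, a minimal sketch of how the two grouped averages above could be computed from the fine-grained scores; the dictionary layout and any metric names beyond Unlabeled, Reentrancy, and SRL are assumptions about the evaluation output format rather than a description of our evaluation code.

```python
def grouped_averages(fine_grained_f1):
    """Average fine-grained F1 scores into structure-dependent / -independent groups.

    fine_grained_f1: dict mapping metric name -> F1 score (format assumed here).
    """
    structure_dependent = ["Unlabeled", "Reentrancy", "SRL"]
    dep = [fine_grained_f1[m] for m in structure_dependent]
    indep = [v for m, v in fine_grained_f1.items() if m not in structure_dependent]
    return sum(dep) / len(dep), sum(indep) / len(indep)
```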
Hard Instances Benefit Figure 4 compares the performance of our HCL and Bevilacqua et al. (2021) on structure-complex instances.

Ablation Study
To illustrate the effect of our proposed curricula, we conduct ablation studies by removing one curriculum at a time. Table 2 reports the SMATCH scores on both AMR2.0 and AMR3.0: both curricula contribute to the performance of the model, and they are complementary to each other. Specifically, the structure-level curriculum (SC) is more effective than the instance-level curriculum (IC). We attribute this to SC constructing AMR sub-graphs for training, which enhances the model's ability to perceive the AMR hierarchy.

Conclusion
In this paper, we propose a Hierarchical Curriculum Learning (HCL) framework for sequence-to-sequence AMR parsing, which consists of a Structure-level Curriculum (SC) and an Instance-level Curriculum (IC).

AMR3.0 (LDC2020T02) is larger than AMR2.0 and contains 55,635, 1,722, and 1,898 sentence-AMR pairs for the training, development, and test sets, respectively. AMR3.0 is a superset of AMR2.0.

B.2 Out-domain Distribution
BIO is a test set of the Bio-AMR corpus, consisting of 500 instances.
TLP is an AMR dataset annotated on the children's novel The Little Prince (version 3.0), consisting of 1,562 instances.
New3 is a subset of AMR3.0 that is not included in the AMR2.0 training set, consisting of 527 instances.
We only regard Unlabeled, Reentrancy, and SRL as "structure-dependent" metrics, since: (1) Unlabeled does not consider any edge labels and only considers the graph structure. (2) Reentrancy is a typical structural feature of the AMR graph; without reentrant edges, the AMR graph reduces to a tree. (3) SRL denotes the core semantic relations of the AMR, which determine its core structure. (4) As described above, all other metrics have little relationship with the structure.
Figure 6 shows a case study (we omit some details of the AMR graphs for clarity). As illustrated, our method produces the correct AMR for the input sentence, while the baseline model (i.e., SPRING (Bevilacqua et al., 2021)) produces a shallower AMR with the wrong structure: the AMR parsed by SPRING (depth 5) is shallower than the gold AMR (depth 9), and their structures also differ (e.g., the roots of the gold AMR and the SPRING-parsed AMR are 'possible-01' and 'and', respectively). This case intuitively shows that our HCL framework can help the model better handle hard instances with complex structure.