Dependency Patterns of Complex Sentences and Semantic Disambiguation for Abstract Meaning Representation Parsing

Abstract Meaning Representation (AMR) is a sentence-level meaning representation based on predicate argument structure. One of the challenges we find in AMR parsing is to capture the structure of complex sentences which expresses the relation between predicates. Knowing the core part of the sentence structure in advance may be beneficial in such a task. In this paper, we present a list of dependency patterns for English complex sentence constructions designed for AMR parsing. With a dedicated pattern matcher, all occurrences of complex sentence constructions are retrieved from an input sentence. While some of the subordinators have semantic ambiguities, we deal with this problem through training classification models on data derived from AMR and Wikipedia corpus, establishing a new baseline for future works. The developed complex sentence patterns and the corresponding AMR descriptions will be made public.


Introduction
Abstract Meaning Representation (AMR) is a sentence-level meaning representation based on predicate argument structure (Banarescu et al., 2013). AMR Parsing is the task of transforming a sentence into an AMR graph with nodes and edges, each representing a concept or relation. While early studies (Flanigan et al., 2014;Wang et al., 2015;Artzi et al., 2015;Pust et al., 2015) used dependency parsers to integrate syntactic features to their models, recent deep neural network-based approaches (Konstas et al., 2017;Peng et al., 2017;Cai and Lam, 2020) tend to encode the input sentence as a sequence without considering its syntactic structure.
Generally speaking, syntactic and semantic structures share much in common. It is assumed 1Code and resource are available at https://github. com/yama-yuki/skeletal-amr. that dependency trees and semantic role labeling structures have a strong correlation in that nsubj and dobj can be used interchangeably for ARG0 and ARG1 role (Xia et al., 2019). Since AMR is annotated based on PropBank frames (Palmer et al., 2005), the same could be said for AMR structures.
This holds to be true for a simple sentence, which is basically a matrix clause, comprised of a predicate and its arguments. However, it is not always the case with complex sentence constructions, each of which consists of a matrix clause and one or more subordinate clause(s). Consider Figure 1 which shows both dependency and AMR representation of a complex sentence with a subordinator as long as. While variables S's and V's are interchangeable between the representations, predicative relations and subordinator itself are expressed quite differently. Compared to uniform structures of simple sentences, various types of complex sentence are used in human language. This characteristics makes it challenging for existing AMR parsers to capture its structure correctly.
Among AMR parsers which are aware of syntactic structures, CAMR (Wang et al., 2015) directly transforms the result of dependency parsing into an AMR graph with transition-based algorithm. As between two clauses. In this case, CAMR predicts the relation between believe-01 and seem-01 as ARG1, though it should be represented as a causal relation with cause-01 as shown in Figure 2(b). This is a crucial error as it incorrectly determines the very core structure of the output graph. We assume that the solution could be achieved either by retraining the parser on larger annotated data or providing the parser with the core structure of the input sentence in advance. The latter seems more reasonable in terms of the cost of annotation.
With the motivation to aid AMR parsing task, we present a method to retrieve all occurrences of complex sentence constructions from an input sentence using a dedicated pattern matcher. At the moment, there is no comprehensive resource that provides structural information about the relations between two clauses, particularly in AMR framework. Therefore, as our first step, we attempt to develop a pattern dictionary of English complex sentences together with the corresponding AMRs which represent the skeleton structure of the sentence (hereinafter referred to as skeletal AMRs ). Then, we provide a pattern matcher which captures clausal relations between a superordinate and subordinate clauses in a complex sentence.
Our pattern matching approach faces the problem of syntactic and semantic ambiguities. When a complex sentence has more than one subordinate clause, we need to determine which pair of clauses are related. Consider the following example where two subordinate clauses appear in a single sentence.
( While there has been studies regarding the syntactic scope of a subordinate clause such as Utsuro et al. (2000), this problem is beyond the scope of this paper. We rely on the output of the dependency parser, which we employ in our pattern matching system, to decide which pair of clauses are syntactically related. Meanwhile, when a subordinator itself is ambiguous between several senses, we need to select the correct type of coherence relation between the clauses. Sentences in (2) show usages of a subordinator since, which is semantically ambiguous between causal and temporal senses.
(2) a. Since there is responsibility, we are not afraid. (AMR: bolt12_6455_6561.15) b. Also since he turned 80, people had been paying more and more attention to Mao Zedong's birthday. (AMR: bolt12_10511_7302.5) In order to resolve the semantic ambiguities of clause-level coherence relation, inspired by the work of Shi and Demberg (2019), we take finetuning-based approach with data augmentation method. With the support of weakly supervised data derived from Wikipedia, we achieve the scores of 75.65% and 83.94% on macro and micro F1 respectively, establishing a new baseline for coherence relation classification of complex sentence constructions in AMR framework.
To sum up, the contributions of this paper are the followings: • We create a comprehensive resource of complex sentence constructions.
• We develop a pattern matching system which takes a sentence as an input and returns a corresponding skeletal AMR.
• We establish a new baseline for semantic disambiguation task of complex sentence constructions in AMR framework.

Related Works
While our focus is on clause-level relation of complex sentence constructions, not much study has been done specifically on this topic in AMR framework. Rather, the topic is dealt with in the field of discourse structures, where coherence relations between any text segments are the main focus.
In the studies of discourse parsing, various attempts have been made to capture coherence relations between pairs of sentences or clauses (Pitler et al., 2008;Rutherford et al., 2017;Qin et al., 2017;Bai and Zhao, 2018). These works basically rely on discourse frameworks such as Rhetorical Structure Theory (RST; Thompson and Mann 1987) or Penn Discourse Tree Bank (PDTB; Prasad et al. 2008).
Most recently, Shi and Demberg (2019) has presented a finetuning-based approach using the bidirectional encoder representation from transformers (BERT; Devlin et al. 2019). They designed their model to learn 11 classes to achieve the state-ofthe-art performance on implicit discourse relation classification task in PDTB framework.
Their work was motivated by the method taken by Devlin et al. (2019) to pretrain BERT, which is called "next sentence prediction task" (NSP). In the process of pretraining using NSP, the model is presented with pairs of sentences. The model predicts whether the second sentence is the actual subsequent sentence. NSP enables BERT to represent a pair of sentences by packing them together as a single sequence.
Some studies have focused on discourse structure in AMR framework. Donatelli et al. (2018) enhances AMR by annontating tense and aspect phenomena at discourse-level. The work by O'Gorman et al. (2018) targets relations of sentences and provides annotation of coreference in multi-sentence AMR corpus. Yet, neither the structure of the complex sentence constructions nor the coherence relations between subordinate and matrix clauses have been much of a concern in this framework.

Pattern Matching System
In this section, we will give a description of our pattern matching system. The entire workflow is Figure 3: Pattern matching system workflow for a sentence "Since airline flights were disrupted, the company chartered a plane to fly the executives back to the West Coast.". illustrated in Figure 3, where the system takes a sentence as an input and returns a skeletal AMR if it is a type of complex sentence construction.
The disambiguation module will be described in the later section. We will use the sentence given as a running example throughout this paper.

Dictionary of Dependency Patterns
Complex sentence constructions in English grammar can be distinguished by their consisting subordinator. When creating a pattern dictionary, we refer to comprehensive studies on grammars (Quirk et al., 1985;Yamaguchi, 2013), which provide a typology of subordinators. To cover various types of constructions, we include simple (e.g. if, because), complex (e.g. as if, so that) and correlate (e.g. no sooner ... than) subordinators classified in Quirk et al. (1985). In addition to that, we also include the type of constructions involving degree and quantity which are introduced to AMR in Bonial et al. (2018). Our pattern matching method depends not only on lexical processing but also on syntactic processing. For syntactic framework, we follow the annotation of dependency structure in Universal Dependencies v2 format (UD; Nivre et al. 2020), which makes our patterns a form of dependency trees. The nodes in trees are represented either by a lemma form of a lexical entry or an abstract one defined by POS tags, where every edge has a dependency relation label. Regular expression is employed to place any form of a specific element. We provide these dependency patterns with a corresponding skeletal AMR that describes the core structure of an input sentence. Depending on the type of construction, a set of variables are used to take alignments between patterns and skeletons. At the moment, our dictionary includes 70 distinct patterns2. Figure 4 illustrates an example of a paired entry of a dependency pattern and skeletal AMR for because-construction:  The dependency pattern in Figure 4(a) describes the simplest type of structure in our dictionary. The nodes that are represented as variables s1 and v1 capture a predicate and its argument in a matrix clause, whereas s2 and v2 capture a subordinate clause. Subordinators are basically described as a's, where their lemma forms are given as cues.
REGEX operators are used on both nodes and edges to flexibly match possible elements. For example, the dependency relation of {REGEX:ˆnsubj} between s's and v's enables us to handle subjects of 2We make it available for users to add new patterns to the dictionary for further expansion. both the active (nsubj) and passive (nsubj:pass) voice. In the case of TAG:{REGEX:ˆV}, REGEX is used to represent any POS tag that starts from "V", meaning that it captures any form of a verb. Figure 4(b) shows the corresponding skeletal AMR, which represents a core relation between the two predicates. In all patterns in the dictionary, V 1 and V 2 act as slots for the predicates of the matrix and subordinate clauses.
Some subordinators could indicate more than one coherence relations. For example, Figure 5 shows the case of our running example with since, where the subordinator presents either (a) causal or (b) temporal relation: In the pattern dictionary, rather than enumerating entries for every possible structure, we describe a primitive structure of skeletal AMR to dynamically generate the actual relation, which is in accordance with the Generative Lexicon approach (Pustejovsky, 1995). More specifically, we accomodate all possible relations separating them with a vertical bar on an edge of a skeletal AMR in the way described in Figure 6(b), which we call a "primitive skeleton": The disambiguation step will follow after the pattern matcher returns the primitive skeleton. For simplicity and consistency, we use shortcuts to represent the relations if available, such as using a relational role :cause instead of a predicative frame cause-01 that can be substitutionally used.
Aside the semantic ambiguity of relation mentioned above, structural ambiguity could be seen when several types of structure exists within the same relation. Compare Figure 5(b) and 7, where both skeletal AMRs represent temporal relation but show different structures: Since the primitive skeleton only describes the simplest form, recovery step needs be taken to generate the final skeletal AMR for ambiguous subordinators. Therefore, we define a possible interpretation (such as Figure 5(b) and 7) for primitive skeleton of each ambiguous subordinator, which will be referred to after the skeletal AMR is disambiguated. This idea share similarity with the notion of meaning postulates (Carnap, 1952;Dowty, 1979). By following the definition, the structural ambiguity of skeletal AMRs will be resolved.

Pattern Matching Method
Our pattern matching method builds on the dependency matching module introduced by Honnibal et al. (2020) that matches subtrees within a dependency tree. The matcher works in naive manner, searching from the top to the bottom of the pattern dictionary. It is originally capable of handling recursive nature of clauses. Namely, even when an input sentence has more than two clauses, the matcher searches all possible patterns for all clause pairs in a single run.
Meanwhile, it suffers from the following situations: "overlap" where multiple patterns accidentally match with a pair of clauses and "copula" where the matcher fails to capture copular constructions due to the limitation of expressiveness of patterns. To account for these cases, we extend the matching method by incorporating additional functions.
Overlap Duplicated matching may occur within a single pair of clauses since some patterns share their forms.
(3) As if he were still in his old job, Mr.Wright enjoys a $120,000 annual office expense allowance.
When the target sentence is (3), the dependency matcher returns the following overlapping patterns: This mostly occurs between simple and complex subordinators in Quirkian term (e.g. as and as far as, if and even if, etc.). To address this issue, we assign an ID for each entry in the pattern dictionary to refer to the type of subordinators. For example, as is assigned "#1.6.1", whereas as if is "#2.2.1". The first number in an ID represents the number of words consisting the subordinator. The second describes its sorted order while the third is organized by the structure of the pattern. When the matcher detects an overlap, it looks up the IDs to select the most desirable output. In the case of overlap between simple and complex subordinators, the matcher compares the first number in ID, selecting the pattern with more words. In this case, as if is regarded as the desirable pattern.
Copula Verbs that show a relationship between subjects and objects are referred to as copulas (e.g. "is" for "John is tall."). Since UD format refrains from using a copula as a head of its com-plement3, the matcher cannot find a copular clause with complex sentence patterns alone. To avoid redundancy of creating additional patterns substituting V's with copulas for each entry in dictionary, we make an extra pattern outside that describes a copula-complement structure: With the implementation of a converter, our system can detect copular clauses in complex sentences and convert it to a copular-headed structure when matched, in the way illustrated in the gray nodes in Figure 3.

Semantic Disambiguation Experiment
In this section, we describe details of our study on semantic disambiguation. We observe that the coherence relation between subordinate and superordinate clauses are commonly ambiguous among causal, conditional, concessional, and temporal senses. Considering that we are dealing with relations between clauses, which basically can be regarded as pairs of simple sentences, we can cast the problem of semantic ambiguity as a multi-class sentence-pair classification task. To be more specific, we assign the most typical class labels including cause, cond, conc, and time to make it a 4-class classification setting.
Throughout the experiment, we use the pretrained "bert-base-uncased" model for finetuning. BERT is pretrained under next sentence prediction as well as masked language modeling task. During pretraining, the model predicts the actual next sentence from a pair of candidate sentences. Thus, the model is expected to learn what the next sentence should look like. By finetuning BERT on a set of pairs of clauses, we can further expect the model to capture coherence relations between them.
Our input will be a separate pair of clauses with special tokens "[CLS]" and "[SEP]". While BERT takes the input in sequential manner, we would like the model to take advantage of dependency parsing, which we apply in the process of pattern matching. Therefore, we add artificial tags of < and > to indicate the head verb of each clause. The input format of our running example would look like: "[CLS] airline flights < were > disrupted [SEP] the company < chartered > a plane to fly the executives back to the West Coast [SEP]".

Experimental Setup
For creating the dataset, we use the lateset release of AMR corpus (LDC2020T02), which provides 59.2k pairs of sentences and AMR graphs. To extract complex sentence constructions from the corpus, we use the Stanza pipeline (Qi et al., 2020) for lexical and syntactic processing of the sentences and employ our pattern matcher with all patterns in the dictionary. In order to check whether the corresponding AMR graph describes the relation we want for each class, we look for alignments between sentence tokens and AMR graphs45. Finally, we split the sentence to obtain a pair of clauses and a subordinator.
While the data derived from AMR corpus can be regarded as "supervised", the amount is relatively small with the total of 1,933 pairs of subordinate and matrix clauses. As it consumes time and money to create more supervised data, we take weak supervision approach to augment training data. In other words, using specific subordinators that are known to be unambiguous in AMR corpus as linguistic cues, we seek to obtain complex sentences from larger corpus. We use raw data from Wikipedia for data augmentation to generate "weakly supervised" data. The list of subordinators used in this method is shown in Table 1. The distribution of each data, which we will refer to as AMR and WIKI data, is illustrated in  4We use the alignments provided in the corpus that are automatically generated.
5We only target the simple structure such as the one presented in Figure 7, which we assume represent typical relation for each class.
Cosidering the size of AMR data, we make 5 splits6 of the data to perform 5-fold cross validation. When training on WIKI data, we average the scores of 5 runs with different random seeds to ensure stability. In addition, we perform grid search for hyperparameter tuning over the options in Table  3. All models will be evaluated using both Macro and micro F1 scores (F M and F m , respectively). Variances will be given in parentheses.

Baseline Model
As our baseline, we finetuned BERT solely on AMR data. To make best use of the data, we compared BERT→AMR with BERT→AMRs (with subordinators as a feature). One of BERT's "unused tags" is placed to specify the position of a subordinator in an input sequence:  Table 4, it turned out that explicit information of subordinators was not effective in our experiment, as opposed to the general tendency in explicit vs implicit connective settings (Pitler et al., 2008). Therefore, we choose BERT→AMR as our baseline and seek for other ways to make use of subordinators.

Approaches
In the experiments, we take several approaches to examine the effect of augmentated data on classification performance. For direct comparison of the training data, BERT→WIKI is solely finetuned on WIKI data to see whether the model would benefit from the larger weakly supervised data. BERT→MIX is finetuned on a combined set of data which consists of AMR data and certain amount of additional WIKI data. The 6Train:Dev:Test=3:1:1 for each split. amount of WIKI data used ranges from 2 20k, where we add 2k sentences7 at a time . We take this approach with an expectation that WIKI data would complement the inbalanced distribution of AMR data to perform better than the baseline or BERT→WIKI. Finally, we evaluate the model marked as BERT→WIKI→AMR which is first finetuned on WIKI data, then further finetuned on AMR data. This is conducted under our hypothesis that "prefinetuning" on WIKI data would make BERT model fit to our task than the original model, which is just pretrained on next sentence prediction task.  The scores of all approaches are presented together in Table 5. The results show that training on weakly supervised data by itself does not improve the baseline, with BERT→WIKI harming the performance by 16.39% points on F M and 12.57% points on F m . This may be attributed to the gap of construction types between WIKI and AMR data. Meanwhile, we see improvements of 3.06% and 3.21% when we add 8k amount of WIKI data to AMR data. The transition of scores by the amount of added data is presented in Figure 10. Both F M and F m show an increase as we combine AMR and WIKI data until they reach the peak at 8k. Further addition only lead to drop the model's performance. This suggests that too much amount of WIKI data dilutes the presence of supervised AMR data. We could predict that training on full addition of WIKI data would deteriorate its performance near to that of BERT→WIKI. Among all our approaches, BERT→WIKI→AMR achieved the best results with 8.37% and 6.93% increase as expected. This proves the effectiveness of prefinetuning on weakly supervised data when you have supervised data of a small size.

Results and Analyses
Additionally, we checked whether our preprocessing step of adding artificial tags ("<" and ">") in section 4 helped the model's performance. With 7Balanced data with 0.5k for each class label. Figure 10: Score transitions of BERT→MIX by amount of WIKI data added. "k" stands for 1,000. untagged version of BERT→WIKI→AMR scoring 71.21% for F M and 80.47% for F m , we see 1.22% and 0.75% increase for tagged version. We find that the improvements achieved are not as much as utilizing weakly supervised data, but still beneficial to some extent considering it is a byproduct of dependency matching.
While it was not effective to use subordinators as features, it remains reasonable to take advantage of its information. Therefore, we make modifications to the trained models in Table 5. We first create a list of ambiguous subordinators and their possible labels (e.g. since is ambiguous between cause and time). When we feed a complex sentence, its subordinator will be searched in the list to check whether it is potentially ambiguous. If it turns out to be true, a restriction will be applied to the softmax layer by lowering the probability of irrelevant class labels to 0. In other words, the models are modified to only look at possible labels defined in the list.  The results after applying the restriction are shown in Table 6 with +r in model names. Compared to the performance in Table 5, we achieve overall improvements of 3.05 3.64% on F M and 2.72 3.02% on F m for all models. Table  7(a) and 7(b) present precision, recall, and F1 of BERT→AMR and BERT→AMR+r by labels.
The approach seems mostly effective except for conc where the precision decreases by 5.75%. With further analysis on confusion matrices on the first split of data for cross validation in Table 8, we find the number of false positive errors of conc increased. This is due to an error predicting sentences with while, which we regard ambiguous between conc and time. When the correct label is time, the model first predicted cause or cond. Even after the restriction was applied, the model predicted conc. Overall, the restriction seems to help the models reduce false positive errors.

Conclusion
With the intention to capture the structure of complex sentence constructions in AMR framework, we proposed a pattern matching method using a list of dependency patterns and its corresponding skeletal AMRs. In the course of creating a comprehensive pattern dictionary, we observed semantically ambiguous entries. In order to resolve the semantic ambiguities, we framed the problem as a sentence-pair classification task and finetuned pretrained BERT models on data derived from AMR and Wikipedia corpus. Through the experiments, we found that the supplemental usage of weakly supervised data generated from Wikipedia effectively improves performance of the models compared to the one trained solely on small-sized supervised data. A natural next step will be to incorporate the presented method in AMR parsing task, which we leave for future work.