Making Better Use of Bilingual Information for Cross-Lingual AMR Parsing

Abstract Meaning Representation (AMR) is a rooted, labeled, acyclic graph representing the semantics of natural language. As previous works show, although AMR was originally designed for English, it can also represent semantics in other languages. However, these works find that the concepts in their predicted AMR graphs are less specific. We argue that the misprediction of concepts is due to the high relevance between English tokens and AMR concepts. In this work, we introduce bilingual input, namely the translated texts as well as non-English texts, in order to enable the model to predict more accurate concepts. In addition, we introduce an auxiliary task that requires the decoder to predict the English sequences at the same time. The auxiliary task helps the decoder understand what the corresponding English tokens actually are. Our proposed cross-lingual AMR parser surpasses the previous state-of-the-art parser by 10.6 Smatch F1 points. An ablation study further demonstrates the efficacy of our proposed modules.


Introduction
Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a rooted, labeled, acyclic graph representing the sentence-level semantics of text. Nodes in the graph are concepts in the texts and edges in the graph are relations between concepts. Since AMR abstracts away from syntax and preserves only semantic information, it can be applied to many semantics-related tasks such as summarization (Liu et al., 2015; Liao et al., 2018), paraphrase detection (Issa et al., 2018), machine translation (Song et al., 2019) and so on.
Previous works on AMR parsing mainly focus on English, since AMR was designed for English texts and parallel corpora of non-English texts and AMRs are scarce. Early work on AMR states that AMR is biased towards English and is not an interlingua (Banarescu et al., 2013). Besides, some studies show that aligning AMR with non-English languages is not always possible (Xue et al., 2014). However, recent studies (Blloshmi et al., 2020) show that AMR parsers are able to recover AMR structures even when there are structural differences between languages, demonstrating that it is possible to overcome many translation divergences. Therefore, it is possible to parse texts in target (non-English) languages into AMRs.
Another problem of cross-lingual AMR parsing is the scarcity of parallel corpora. Unlike machine translation or sentiment classification, which have abundant resources on the Internet, non-English text and AMR pairs can only be obtained by human annotation. Damonte and Cohen (2018) align a non-English token with an AMR node if they can be mapped to the same English token, so as to construct a training set, and further train a transition-based parser on this synthetic training set. They also attempt to translate the test set into English and apply an English AMR parser. Blloshmi et al. (2020) build training data in two ways. One approach uses gold parallel sentences and generates synthetic AMR annotations with the help of an English AMR parser. The other uses gold English-AMR pairs and obtains non-English texts from a pre-trained machine translation system. They then train a cross-lingual AMR parser with a sequence-to-graph parser (Zhang et al., 2019a).
According to Blloshmi et al. (2020), a cross-lingual AMR parser may predict concepts that are less specific and accurate than the gold concepts. Therefore, we propose a new model that introduces machine translation to enable our parser to predict more accurate concepts. In particular, we first build our training data similarly to Blloshmi et al. (2020), translating English texts into target languages. Our basic model is a sequence-to-sequence model, rather than the sequence-to-graph model used by Blloshmi et al. (2020), since in English AMR parsing, sequence-to-sequence models can achieve state-of-the-art results given enough data for pre-training (Xu et al., 2020). During training, we introduce bilingual input by concatenating translated target-language texts and English texts as inputs. At the inference stage, the bilingual input is the concatenation of translated English texts and target-language texts. We hope that our model can predict more accurate concepts with the help of the English tokens, while still preserving the meaning of the original texts when there are semantic shifts in the translated English texts. Besides, during training, we also introduce an auxiliary task requiring the decoder to restore the English input tokens, which likewise aims at enhancing the ability of our parser to predict concepts. Our parser outperforms the previous state-of-the-art parser XL-AMR (Blloshmi et al., 2020) on the LDC2020T07 dataset by about 10.6 Smatch F1 points on average, which demonstrates the efficacy of our proposed cross-lingual AMR parser.
Our main contributions are summarized as follows: • We introduce bilingual inputs and an auxiliary task to a seq2seq cross-lingual AMR parser, aiming to enable the parser to make better use of bilingual information and predict more accurate concepts.
• Our parser surpasses the best previously reported Smatch F1 scores on LDC2020T07 by a large margin. The results demonstrate the effectiveness of our parser, and ablation studies show the usefulness of its modules. Our code is publicly available 1 .
• We further carry out experiments to investigate the influence of incorporating pretraining models into our cross-lingual AMR parser.

Related Work
Abstract Meaning Representation (AMR) (Banarescu et al., 2013) parsing has become popular in recent years. Some previous works (Flanigan et al., 2014; Lyu and Titov, 2018; Zhang et al., 2019a) solve this problem with a two-stage approach: they first project words in sentences to AMR concepts, and then identify relations. Transition-based parsing is applied by Wang et al. (2015b,a), Damonte et al. (2017), Guo and Lu (2018), Naseem et al. (2019) and Lee et al. (2020). They align words with AMR concepts and then take different actions based on the processed words to link edges or insert new nodes. Due to recent developments in sequence-to-sequence models, several works employ them to parse texts into AMRs (Konstas et al., 2017; van Noord and Bos, 2017; Ge et al., 2019; Xu et al., 2020). They linearize AMR graphs and leverage character-level or word-level sequence-to-sequence models. Sequence-to-graph models have been proposed to enable the decoder to better model the graph structure. Zhang et al. (2019a) first use a sequence-to-sequence model to predict concepts and a biaffine classifier to predict edges. Zhang et al. (2019b) propose a one-stage sequence-to-graph model, predicting concepts and relations at the same time. Cai and Lam (2020) regard AMR parsing as dual decisions on input sequences and graph construction; they propose a sequence-to-graph method that first maps input words to concepts and then links edges based on the generated concepts. Recently, pre-trained models have proved to perform well in AMR parsing (Xu et al., 2020). Lee et al. (2020) employ a self-training method to enhance a transition-based parser, which achieves the state-of-the-art Smatch F1 score in English AMR parsing. Vanderwende et al. (2015) first carry out research on cross-lingual AMR parsing: they parse texts in the target language into logical forms as a pivot, which are then parsed into AMR graphs.
Damonte and Cohen (2018) attempt to project non-English words to AMR concepts and use a transition-based parser to parse texts into AMR graphs. They also attempt to automatically translate non-English texts into English and exploit an English AMR parser. Blloshmi et al. (2020) generate synthetic training data with a machine translation system or an English AMR parser, and conduct experiments with a sequence-to-graph model in different settings, trying to find the best way to train with synthetic data. Different from Blloshmi et al. (2020), we treat cross-lingual AMR parsing as a sequence-to-sequence transduction problem and improve seq2seq models with bilingual input and an auxiliary task.

Figure 1: An example of cross-lingual AMR parsing. A non-English text is first parsed into an AMR sequence, which is then converted into an AMR graph.

Problem Setup
Cross-lingual AMR parsing is the task of parsing non-English texts into the AMR graphs corresponding to their English translations. In this task, nodes in AMR graphs are still English words, PropBank framesets or AMR keywords, the same as in the original design of AMR. Figure 1 shows an example of cross-lingual AMR parsing. We define X^l as an input sample in language l and X^l_i as its i-th token. y is the corresponding AMR sequence derived from the AMR graph, and y_i is its i-th token. The model should first predict the AMR sequence y and then transform the sequence into a graph. Figure 2 shows the training and inference processes of our proposed model. Our basic model is a Transformer (Vaswani et al., 2017) encoder-decoder, since Xu et al. (2020) show that it achieves state-of-the-art results in English AMR parsing. We introduce the bilingual input to this model. During training, the bilingual input contains the original English text and the translated text in the non-English target language, which may not be perfectly accurate. During inference, the bilingual input is composed of the translated English text and the original text in the target language. With the help of the bilingual input, our model can better understand and preserve the semantics of target-language texts, and predict more accurate concepts according to the translated English text. Apart from predicting AMR sequences, the model is also required to predict the English input texts as an auxiliary objective, which further helps the model learn the exact meaning of input tokens and predict their corresponding concepts more accurately.

Our Proposed Model
We first describe how we obtain training data and the pre-processing and post-processing steps in Section 4.1 and Section 4.2, followed by the basic sequence-to-sequence model, the bilingual input and the auxiliary task.

Blloshmi et al. (2020) propose two methods to generate parallel training data: parallel sentences with silver AMR graphs, and gold AMR graphs with silver translations. In the first approach, human-annotated parallel corpora of the target languages and English are used, and an English parser produces the corresponding AMR graphs. In the second approach, human-annotated English-AMR pairs are used, and a machine translation system produces texts in the target languages. According to Blloshmi et al. (2020), models trained with data generated from gold English-AMR pairs perform better. We thus adopt this approach (i.e., gold AMR graphs with silver translations) to generate our training and validation data.
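The "gold AMR graphs - silver translations" pipeline above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `translate` here is a toy stand-in with one hard-coded sentence pair, where a real system such as OPUS-MT would be called.

```python
def translate(text, tgt_lang):
    """Toy stand-in for an MT system (e.g. OPUS-MT); one hard-coded example."""
    table = {"de": {"The boy wants to go.": "Der Junge will gehen."}}
    return table[tgt_lang][text]

def build_silver_pairs(gold_pairs, tgt_lang):
    """gold_pairs: list of (english_sentence, amr_string).
    Returns (silver_translation, gold_amr) training pairs."""
    return [(translate(en, tgt_lang), amr) for en, amr in gold_pairs]

gold = [("The boy wants to go.",
         "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))")]
silver = build_silver_pairs(gold, "de")
```

The gold AMR is kept untouched; only the source side is synthetic, which is why this setting is called silver translations.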

Pre-Processing and Post-Processing
Following van Noord and Bos (2017), we first remove variables, since variables only serve to identify the same node in a graph and carry no semantic information, which may harm model training. We also remove wiki links (:wiki), since a sequence-to-sequence model may link to non-existent Wikipedia objects. As for co-referring nodes, we simply duplicate the concepts, which transforms an AMR graph into a tree. The final linearized AMR is the pre-order traversal of the tree.
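The variable- and wiki-removal steps can be sketched with two regular expressions. This is a rough sketch, not the authors' pre-processing code: it does not handle quoted multi-word wiki values or the duplication of co-referring nodes (the re-entrant variable `b` below is left as-is).

```python
import re

def linearize(amr):
    """Strip :wiki attributes and variables from a Penman-style AMR string."""
    # drop ':wiki <value>' attributes (single-token values only, a simplification)
    amr = re.sub(r':wiki\s+\S+', '', amr)
    # drop 'variable /' before each concept, e.g. '(w / want-01' -> '(want-01'
    amr = re.sub(r'\(\s*\S+\s*/\s*', '(', amr)
    # normalize whitespace
    return ' '.join(amr.split())

s = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
linearize(s)  # -> "(want-01 :ARG0 (boy) :ARG1 (go-02 :ARG0 b))"
```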
In post-processing, we must restore a predicted AMR sequence without variables, wiki links and co-referring nodes to an AMR graph. van Noord and Bos (2017) use DBpedia Spotlight to restore wiki links. However, the same entity in different languages is linked to different pages in DBpedia, which makes it difficult for a cross-lingual AMR parser to restore the wiki link of an entity in English. Different from van Noord and Bos (2017), we restore the wiki link of a name if that name corresponds to a wiki link in the training set.
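The training-set lookup described above amounts to a name-to-wiki dictionary. A minimal sketch (the "-" value for unlinked names follows the AMR convention for absent wiki links; the function names are our own):

```python
def build_wiki_table(training_name_wiki_pairs):
    """Map each :name string to the :wiki value it had in the training set."""
    table = {}
    for name, wiki in training_name_wiki_pairs:  # e.g. ("Barack Obama", "Barack_Obama")
        table.setdefault(name, wiki)  # keep the first link seen
    return table

def restore_wiki(name, table):
    """A name gets the wiki link observed in training, otherwise '-' (no link)."""
    return table.get(name, "-")

table = build_wiki_table([("Barack Obama", "Barack_Obama")])
restore_wiki("Barack Obama", table)  # -> "Barack_Obama"
```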

Sequence-to-Sequence Model
After pre-processing, both the input texts and the output AMRs are sequences, so we can apply a sequence-to-sequence model to cross-lingual AMR parsing. We use Transformer (Vaswani et al., 2017), one of the most popular sequence-to-sequence models, as our basic model. To be compatible with the pre-trained XLM-R (Conneau et al., 2020a) model, the tokenizer and input vocabulary we use are the same as those of XLM-R. Subword units such as byte pair encoding (BPE) (Sennrich et al., 2016) are commonly used to reduce vocabulary size; we therefore apply BPE to obtain our output vocabulary.
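To make the BPE step concrete, here is a toy greedy encoder in the style of Sennrich et al. (2016). This is an illustrative sketch only, with a hand-picked merge list; it is not the tokenizer used in the paper (which follows XLM-R's SentencePiece vocabulary):

```python
def bpe_encode(word, merges):
    """Greedy BPE: repeatedly merge the highest-priority adjacent symbol pair.
    `merges` is an ordered list of pairs, as produced by BPE training."""
    symbols = list(word)
    rank = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        # rank every adjacent pair; unknown pairs get infinite rank
        pairs = [(rank.get((a, b), float('inf')), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        best_rank, i = min(pairs)
        if best_rank == float('inf'):
            break  # no known merge left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

bpe_encode("lower", [("l", "o"), ("lo", "w")])  # -> ["low", "e", "r"]
```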

Bilingual Input
AMR concepts heavily rely on the corresponding English texts. According to Damonte and Cohen (2018), a simple method that first translates the test set into English and then applies an English AMR parser can outperform their cross-lingual AMR parser. However, machine translation may introduce semantic shifts, which can harm the generation of AMRs.
We therefore introduce the bilingual input. Since we do not have gold parallel corpora, we use machine translation to obtain it. During training, we concatenate the translated target-language text mentioned in Section 4.1 and the original English text as the bilingual input. At the inference stage, the bilingual input is the concatenation of the original target-language text and the translated English text. The model can thus better understand and preserve the semantics of the input text, and it can also predict more accurate concepts, since the English tokens are provided as well.
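The concatenation itself is trivial; what matters is which side is gold and which is machine-translated at each stage. A sketch under one assumption of ours: the separator token (`</s>` here) is illustrative, since the paper does not specify how the two texts are joined.

```python
SEP = "</s>"  # assumed separator token; the paper does not specify one

def bilingual_input(tgt_text, en_text):
    """Training:  tgt_text = silver translation, en_text = gold English.
    Inference: tgt_text = gold target-language text, en_text = silver English."""
    return f"{tgt_text} {SEP} {en_text}"

bilingual_input("Der Junge will gehen.", "The boy wants to go.")
```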

Auxiliary Task
AMR concepts are composed of English words and PropBank frames. According to Blloshmi et al. (2020), roughly 60% of the nodes in AMR 2.0 (LDC2017T10) are English words. What's more, PropBank predicates are similar to English words, such as the predicate publish-01 and the word publish. We argue that if the decoder can precisely restore the input tokens in English, it can predict the corresponding concepts appropriately.
We thus design an auxiliary task that requires the decoder to predict the English input sequence. Inspired by multilingual machine translation (Johnson et al., 2017), we add a new BOS token indicating that the model should predict the English sequence instead of the AMR sequence. The decoder predicting English sequences shares its weights with the decoder predicting AMR sequences.
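Constructing the two target sequences for the shared decoder can be sketched as follows. The token names `<amr>` and `<eng>` are our own illustration; the paper only states that a new BOS token selects the task.

```python
# Task-specific BOS tokens in the style of Johnson et al. (2017):
# the same decoder emits either an AMR sequence or the English sentence,
# selected by the first target token.
BOS_AMR, BOS_ENG = "<amr>", "<eng>"

def make_targets(amr_tokens, english_tokens):
    """Return (main-task target, auxiliary-task target) for one example."""
    return ([BOS_AMR] + amr_tokens,      # main task: predict the AMR sequence
            [BOS_ENG] + english_tokens)  # auxiliary task: restore English input

amr_tgt, eng_tgt = make_targets(["(want-01", ":ARG0", "(boy))"],
                                ["The", "boy", "wants", "to", "go", "."])
```

Since both targets are decoded by the same weights, the auxiliary task costs no extra parameters, only extra decoding steps during training.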
The final loss function is the weighted sum of the loss functions of the two tasks: Loss = w_AMR * Loss_AMR + w_Eng * Loss_Eng, where Loss_AMR is the loss of AMR sequence prediction and Loss_Eng is the loss of English sentence prediction. We adopt the cross-entropy loss for both tasks.

Implementation Details
The coefficient of Loss_AMR is 1, while the coefficient of Loss_Eng is 0.5. We use the Adam optimizer (Kingma and Ba, 2015) to optimize the final loss function. The number of Transformer layers in both the encoder and the decoder is 6. The embedding size and hidden size are both 512, and the size of the feed-forward network is 2048. The number of heads in multi-head attention is 8. We follow Vaswani et al. (2017) in scheduling the learning rate at each step, with 4000 warmup steps; the learning rate of the decoder at each step is half of this learning rate. Following Blloshmi et al. (2020), we use the machine translation system OPUS-MT (Tiedemann and Thottingal, 2020) to obtain our bilingual input. We use data from all languages to train our model, and the final model is able to parse sentences in all of these languages.
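With those coefficients, the combined objective can be sketched directly. This is a plain-Python illustration of the weighted cross-entropy sum, not the actual training code; in practice the per-step distributions come from the decoder softmax.

```python
import math

def cross_entropy(probs, gold_ids):
    """Mean token-level cross-entropy; probs[t] is the model's distribution
    over the vocabulary at step t, gold_ids[t] the gold token id."""
    return -sum(math.log(p[g]) for p, g in zip(probs, gold_ids)) / len(gold_ids)

def total_loss(amr_probs, amr_gold, eng_probs, eng_gold, eng_weight=0.5):
    # Loss = 1.0 * Loss_AMR + 0.5 * Loss_Eng, as in the paper's setting
    return (cross_entropy(amr_probs, amr_gold)
            + eng_weight * cross_entropy(eng_probs, eng_gold))
```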

Dataset
The released test set, LDC2020T07, contains four translations of the test set of AMR 2.0: German (DE), Italian (IT), Spanish (ES) and Chinese (ZH).
As mentioned in Section 4.1, we translate the sentences of a gold English-AMR dataset with OPUS-MT to obtain training and development data. We use AMR 2.0 as our gold English-AMR dataset. We also translate the test sets in German, Italian, Spanish and Chinese back into English as input texts. The statistics of AMR 2.0 are shown in Table 1.

Evaluation Metric
Smatch (Cai and Knight, 2013) is the standard evaluation metric for AMR parsing. In this metric, an AMR graph is regarded as a set of triples. Smatch counts the number of matched triples and computes a score based on the total numbers of triples in the two AMR graphs. We use the Smatch scripts available online 2 . Following Damonte et al. (2017), we also report several fine-grained evaluations in order to assess the quality of the predicted AMR graphs in different aspects. We omit the details of these fine-grained evaluations here; they can be found in (Damonte et al., 2017).
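The triple-matching idea can be sketched for a *fixed* variable mapping. Note this is a simplification: the real Smatch metric searches over variable mappings between the two graphs to maximize this F1 score.

```python
def smatch_f1(pred_triples, gold_triples):
    """F1 over matched triples, assuming variables are already aligned.
    Triples are (relation, source, target), e.g. ('instance', 'b', 'boy')."""
    matched = len(set(pred_triples) & set(gold_triples))
    if matched == 0:
        return 0.0
    p = matched / len(pred_triples)   # precision
    r = matched / len(gold_triples)   # recall
    return 2 * p * r / (p + r)

pred = {("instance", "a", "want-01"), ("ARG0", "a", "b"), ("instance", "b", "boy")}
gold = {("instance", "a", "want-01"), ("instance", "b", "boy")}
smatch_f1(pred, gold)  # -> 0.8
```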

Main Results
We compare our model with previous works and baseline methods, including: • AMREager. This is the model proposed by Damonte and Cohen (2018). They assume that if a word X^l_t in the target language is aligned with the word X^en_u in English, and the English word aligns with the AMR concept y_i, then X^l_t can be aligned with y_i. Based on this assumption, they project AMR annotations to target languages and further train a transition-based AMR parser (Damonte et al., 2017) as in English.
• XL-AMR. This is the model proposed by Blloshmi et al. (2020). For their best model, they first generate synthetic training and validation data by machine translation, and then train an AMR parser for the target language with a sequence-to-graph parser. They experiment with XL-AMR in several settings: language-specific, bilingual and multilingual. The language-specific setting uses only target-language data to train the model; the bilingual setting trains with target-language data and English data; the multilingual setting trains with data in all languages. They also experiment with a multilingual setting that excludes Chinese data, because they find that training with Chinese data lowers the results.
• Translate-test. This method first translates target-language texts into English and uses an English AMR parser to predict the final AMR graphs. For a fair comparison, we choose a sequence-to-sequence model as the English AMR parser; its encoder, decoder and hyperparameters are the same as those in our model. We use only English texts as input and do not apply the auxiliary task when training the English parser. Note that this baseline is not compared in Blloshmi et al. (2020), and we show that it is a very strong baseline.
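The annotation-projection assumption behind AMREager can be written down directly: target word t aligns to English word e, e aligns to AMR concept c, so t is aligned to c. A minimal sketch (function name and index-based alignment format are our own illustration):

```python
def project_alignments(tgt2en, en2amr):
    """Compose word alignments with English-to-AMR alignments.
    tgt2en: target-token index -> English-token index;
    en2amr: English-token index -> AMR concept."""
    return {t: en2amr[e] for t, e in tgt2en.items() if e in en2amr}

# "Der Junge" -> "The boy"; English "boy" (index 1) aligns to concept 'boy'
project_alignments({0: 0, 1: 1}, {1: "boy"})  # -> {1: "boy"}
```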


Ablation Study
In order to verify the effectiveness of the bilingual input and the auxiliary task in our model, we carry out several ablation experiments. Table 4 shows the Smatch scores and fine-grained results of the ablation study. Compared with the basic sequence-to-sequence model, the bilingual input improves the Smatch F1 score by 4.4 points on average, and the auxiliary task brings a 5.5-point improvement on average. Our full model makes use of both the bilingual input and the auxiliary task, improving the Smatch score by 10.2 points, which indicates that each module is very beneficial to the performance of our model.
Table 5: Smatch scores of models employing pre-trained models.

Fine-grained results further demonstrate the effectiveness of our modules. As Table 4 shows, the F1 score for Concepts improves substantially, which demonstrates that our proposed modules indeed help the parser predict more accurate concepts. Besides, the fine-grained evaluation Negation achieves the highest improvement, revealing that our proposed modules enable the parser to better understand the semantics of the input texts.

Effect of Pre-trained Models on Cross-Lingual AMR Parsing

Recently, pre-trained models for cross-lingual tasks have been proposed. Pre-trained models such as mBERT (Devlin et al., 2019), XLM (Conneau and Lample, 2019), XLM-R (Conneau et al., 2020b) and mBART (Liu et al., 2020) achieve state-of-the-art results on many tasks, such as machine translation and cross-lingual natural language inference. In our experiments, we exploit XLM-R (Conneau et al., 2020b) to provide the input embeddings of our model.
When training an English AMR parser, Xu et al. (2020) first pre-train the model on large-scale synthetic data and fine-tune it on gold English-AMR data. Since cross-lingual AMR parsing shares the same output format with English AMR parsing, we can employ the decoder of Xu et al. (2020) to initialize our decoder and further fine-tune the cross-lingual AMR parser.
Results are listed in Table 5. The performance of our parser with XLM-R embeddings improves by 2.1 Smatch points, while our parser fine-tuning the pre-trained AMR decoder achieves a 1.1-point improvement. We further employ both the XLM-R embeddings and the pre-trained AMR decoder, and the average Smatch score reaches 67.5. The results show that pre-trained cross-lingual embeddings like XLM-R, as well as the pre-trained decoder, help the parser predict better AMR graphs.

Figure 5: An example of AMRs predicted by S2S and S2S + Bilingual Input for the input sentence (EN) "The nuclear missile silos themselves did not lose power." We mark the missing concept in blue and the inaccurate concepts in red.

Figure 3 shows several AMRs parsed by models with and without the auxiliary task. The AMR predicted by the model without the auxiliary task misses many concepts, while our full model predicts them correctly. What's more, our full model also predicts relations between concepts more accurately. For example, the full model adds ARG0 between hamper-01 and problem, retaining the semantic information of the original sentence. The model trained without the auxiliary task predicts the relation ARG2 instead, changing the meaning of the original sentence.

Analysis
Another example, in Figure 5, shows the efficacy of bilingual inputs. The AMR parsed by the basic sequence-to-sequence model does not preserve the correct semantics: the model predicts many erroneous concepts such as launch-01 and possible-01, and the semantics of the original phrase "did not lose power" is changed into "have no electricity". The AMR produced by the sequence-to-sequence model with bilingual input is almost correct except for the missing concept silo. This example reveals that our bilingual input enables the parser to predict more accurate concepts and preserve the semantics of the sentence.
We also show an attention heatmap for an example from the test set in Figure 4. The attention pattern shows that our parser can predict AMR tokens based on the English translation (e.g., recommend-01) or based on both English and Spanish tokens (e.g., good-02).

Conclusion
In this paper, we focus on cross-lingual AMR parsing. Previous works fall short in predicting correct AMR concepts. We thus introduce bilingual inputs as well as an auxiliary task to predict more accurate concepts and relations in AMR graphs. Empirical results on data in German, Italian, Spanish and Chinese demonstrate the efficacy of our proposed method. We also conduct an ablation study to further verify the contribution of the bilingual inputs and the auxiliary task. For future work, we will attempt to adapt other methods used in English AMR parsing to the cross-lingual setting.