From Paraphrasing to Semantic Parsing: Unsupervised Semantic Parsing via Synchronous Semantic Decoding

Semantic parsing is challenging due to the structure gap and the semantic gap between utterances and logical forms. In this paper, we propose an unsupervised semantic parsing method, Synchronous Semantic Decoding (SSD), which can simultaneously resolve the semantic gap and the structure gap by jointly leveraging paraphrasing and grammar-constrained decoding. Specifically, we reformulate semantic parsing as a constrained paraphrasing problem: given an utterance, our model synchronously generates its canonical utterance and its meaning representation. During synchronous decoding, utterance paraphrasing is constrained by the structure of the logical form, so the canonical utterance can be paraphrased in a controlled way; meanwhile, semantic decoding is guided by the semantics of the canonical utterance, so its logical form can be generated without supervision. Experimental results show that SSD is a promising approach that achieves state-of-the-art unsupervised semantic parsing performance on multiple datasets.


Introduction
Semantic parsing aims to translate natural language utterances to their formal meaning representations, such as lambda calculus (Zettlemoyer and Collins, 2005; Wong and Mooney, 2007), FunQL (Kate et al., 2005; Lu et al., 2008), and SQL queries. Currently, most neural semantic parsers (Dong and Lapata, 2016; Chen et al., 2018b; Shao et al., 2020) model semantic parsing as a sequence-to-sequence translation task via an encoder-decoder framework.

Figure 1: Different from previous staged methods (indicated by gray lines), our method generates the canonical utterance and the logical form synchronously. The semantic gap and the structure gap are simultaneously resolved by jointly leveraging paraphrasing and grammar-constrained decoding. Thus, our synchronous decoding employs both semantic and structure constraints to solve unsupervised semantic parsing.
Semantic parsing is a challenging task due to the structure gap and the semantic gap between natural language utterances and logical forms. The structure gap arises because utterances are usually word sequences while logical forms are usually trees or graphs constrained by specific grammars, so a semantic parser needs to learn the complex structure transformation rules between them. The semantic gap arises because, given the flexibility of natural language, the same meaning can be expressed by very different utterances, so a semantic parser needs to be able to map various expressions to their semantic form. To address the structure gap and the semantic gap, current semantic parsers usually rely on a large amount of labeled data, often resulting in a data bottleneck.
Previous studies have found that the structure gap and the semantic gap can be alleviated by leveraging external resources, and the reliance on labeled data can therefore be reduced. For the structure gap, previous studies found that constrained decoding can effectively constrain the output structure by injecting the grammars of logical forms and the facts in knowledge bases during inference, e.g., grammar-based neural semantic parsers (Xiao et al., 2016; Yin and Neubig, 2017) and constrained decoding algorithms (Krishnamurthy et al., 2017). For the semantic gap, previous studies have found that paraphrasing is an effective technique for resolving the diversity of natural expressions. Using paraphrasing, semantic parsers can handle different expressions of the same meaning and can therefore reduce the requirement for labeled data. For example, supervised methods (Berant and Liang, 2014; Su and Yan, 2017) use the paraphrasing scores between canonical utterances and sentences to rerank logical forms, and the two-stage approach rewrites utterances into canonical utterances which can be easily parsed. The main drawback of these studies is that they use constrained decoding and paraphrasing independently and separately, so they can only alleviate either the semantic gap or the structure gap.
In this paper, we propose an unsupervised semantic parsing method, Synchronous Semantic Decoding (SSD), which can simultaneously resolve the structure gap and the semantic gap by jointly leveraging paraphrasing and grammar-constrained decoding. Specifically, we model semantic parsing as a constrained paraphrasing task: given an utterance, we synchronously decode its canonical utterance and its logical form using a general paraphrase model, where the canonical utterance and the logical form share the same underlying structure. Based on synchronous decoding, canonical utterance generation is constrained by the structure of the logical form, and logical form generation is guided by the semantics of the canonical form. By modeling the interdependency between canonical utterance and logical form, and exploiting it through synchronous decoding, our method can perform effective unsupervised semantic parsing using only a pretrained general paraphrasing model; no annotated data for semantic parsing is needed.
We conduct experiments on GEO and OVERNIGHT.
Experimental results show that our method is promising: it achieves competitive unsupervised semantic parsing performance and can be further improved with external resources. The main contributions of this paper are:

• We propose an unsupervised semantic parsing method, Synchronous Semantic Decoding, which can simultaneously resolve the semantic gap and the structure gap by jointly leveraging paraphrasing and grammar-constrained semantic decoding.
• We design two effective synchronous semantic decoding algorithms -rule-level inference and word-level inference, which can generate paraphrases under the grammar constraints and synchronously decode meaning representations.
• Our model achieves competitive unsupervised semantic parsing performance on GEO and OVERNIGHT datasets.

Model Overview
We now present an overview of our synchronous semantic decoding algorithm, which jointly leverages paraphrasing and grammar-constrained decoding for unsupervised semantic parsing. Given an utterance, SSD reformulates semantic parsing as a constrained paraphrasing problem and synchronously generates the canonical utterance and the logical form. For example, in Fig. 2, given "How many rivers run through Texas", SSD generates "What is the number of river traverse State0" as its canonical form and Answer(Count(River(Traverse_2(State0)))) as its logical form. During synchronous decoding, utterance paraphrase generation is constrained by the grammar of logical forms, so the canonical utterance can be generated in a controlled way, and the logical form is generated synchronously with the canonical utterance via the synchronous grammar. Logical form generation is controlled by the semantic constraints from paraphrasing and the structure constraints from grammars and database schemas, and therefore the logical form can be generated without supervision.
To this end, SSD needs to address two challenges. First, we need to design paraphrasing-based decoding algorithms which can effectively impose grammar constraints during inference. Second, current paraphrasing models are trained on natural language sentences, which differ from the unnatural canonical utterances; SSD therefore needs to resolve this style bias for effective canonical utterance generation.
Specifically, we first propose two inference algorithms for constrained-paraphrasing-based synchronous semantic decoding: rule-level inference and word-level inference. Then we resolve the style bias of the paraphrase model via adaptive fine-tuning and utterance reranking, where adaptive fine-tuning adjusts the paraphrase model to generate canonical utterances, and utterance reranking resolves the style bias by focusing more on semantic coherence. In Sections 3-5, we provide the details of our implementation.

Figure 2: Overview of our approach. The sentence is paraphrased into a canonical utterance and parsed into a logical form synchronously. When decoding "traverse", the paraphrase model tends to generate words such as "run", "flow", and "traverse" to preserve semantics, while the synchronous grammar limits the next words of the canonical utterance to follow the candidate production rules. It is then easy to discard "run" and "flow" and select the most likely word "traverse", with its production rule, from the candidates. In this paper, we propose rule-level and word-level inference methods to decode words and production rules synchronously.

Synchronous Semantic Decoding
Given an utterance x, we turn semantic parsing into a constrained paraphrasing task. Concretely, we use a synchronous context-free grammar as our synchronous grammar, which provides a one-to-one mapping from a logical form y to its canonical utterance c_y. The parsing task ŷ = argmax_{y∈Y} p_parse(y|x) is then transformed into ŷ = argmax_{y∈Y} p_paraphrase(c_y|x). Instead of directly parsing the utterance into its logical form, SSD generates its canonical utterance and obtains the logical form from the one-to-one mapping. In the following, we first introduce the grammar constraints in decoding, and then present two inference algorithms for generating paraphrases under the grammar constraints.
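The reformulation above can be sketched in a few lines. In the toy example below, `paraphrase_score` is a crude word-overlap stand-in for a real pretrained paraphrase model, and `CANONICAL` is a hypothetical one-to-one mapping from logical forms to canonical utterances as would be given by the SCFG; both are illustrative assumptions, not the paper's actual models.

```python
def _norm(word):
    # crude normalization stand-in (strip a plural "s")
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def paraphrase_score(utterance, canonical):
    """Toy p_paraphrase(c_y | x): Jaccard overlap of normalized words."""
    u = {_norm(w) for w in utterance.lower().split()}
    c = {_norm(w) for w in canonical.lower().split()}
    return len(u & c) / max(len(u | c), 1)

CANONICAL = {  # hypothetical one-to-one mapping y -> c_y from the SCFG
    "Answer(Count(River(Traverse_2(State0))))":
        "what is the number of river traverse state0",
    "Answer(Count(City(Loc_2(State0))))":
        "what is the number of city located in state0",
}

def ssd_parse(utterance):
    """y_hat = argmax over y of p_paraphrase(c_y | x)."""
    return max(CANONICAL, key=lambda y: paraphrase_score(utterance, CANONICAL[y]))

print(ssd_parse("how many rivers traverse state0"))
# -> Answer(Count(River(Traverse_2(State0))))
```

In the real system the argmax is over the grammar's full (possibly infinite) derivation space, so it is computed by constrained beam search rather than enumeration.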

Grammar Constraints in Decoding
Synchronous context-free grammar (SCFG) is employed as our synchronous grammar; it is widely used to convert a meaning representation into a unique canonical utterance (Wang et al., 2015; Jia and Liang, 2016). An SCFG consists of a set of production rules ⟨N → α, β⟩, where N is a non-terminal, and α and β are sequences of terminal and non-terminal symbols. Each non-terminal symbol in α is aligned to the same non-terminal symbol in β, and vice versa. Therefore, an SCFG defines a set of joint derivations of aligned pairs of utterances and logical forms. SCFGs can provide useful constraints for semantic decoding by restricting the decoding space and exploiting semantic knowledge:

Grammar Constraints The grammar ensures that the generated utterances/logical forms are grammatical, so the search space can be greatly reduced. For example, when expanding the non-terminal $r in Fig. 2, we do not need to consider the words "run" and "flow", because they do not appear in the candidate grammar rules.
Semantic Constraints Like the type checking in Wang et al. (2015), the constraints of knowledge base schema can be integrated to further refine the grammar. The semantic constraints ensure the generated utterances/logical forms will be semantically valid.
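The joint derivations an SCFG defines can be illustrated with a tiny grammar. The fragment below is a hypothetical example in the spirit of the paper's rules (it is not the actual OVERNIGHT or GEO grammar); expanding aligned non-terminals with the same rule choices on both sides yields an aligned (canonical utterance, logical form) pair.

```python
import re

# Each rule pairs an utterance side with a logical-form side; aligned
# non-terminals are written $X. Illustrative fragment only.
SCFG = {
    "$e": [("what is $s", "Answer($s)")],
    "$s": [("the number of $r", "Count($r)"),
           ("state that $c located in", "State(Loc_1($c))")],
    "$r": [("river traverse $c", "River(Traverse_2($c))")],
    "$c": [("state0", "State0"), ("city0", "City0")],
}

def derive(symbol, choose):
    """Jointly expand both sides; choose(nt, rules) picks a rule index."""
    utt, lf = SCFG[symbol][choose(symbol, SCFG[symbol])]
    # expand non-terminals left to right, identically on both sides
    for nt in re.findall(r"\$\w+", utt):
        sub_utt, sub_lf = derive(nt, choose)
        utt = utt.replace(nt, sub_utt, 1)
        lf = lf.replace(nt, sub_lf, 1)
    return utt, lf

first = lambda nt, rules: 0  # always pick the first rule
print(derive("$e", first))
# -> ('what is the number of river traverse state0',
#     'Answer(Count(River(Traverse_2(State0))))')
```

Because both sides are expanded by the same rule sequence, the canonical utterance and the logical form are guaranteed to share one underlying structure, which is exactly what synchronous decoding exploits.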

Rule-Level Inference
One strategy for generating paraphrases under the grammar constraint is to take the grammar rule as the decoding unit. Grammar-based decoders have been proposed that output sequences of grammar rules instead of words (Yin and Neubig, 2017). Like them, our rule-level inference method takes the grammar rule as the decoding unit. Figure 3 (a) shows an example of our rule-level inference method.

Figure 3: From the utterance "which state is city0 in", the two inference methods generate its canonical utterance "what is state that city0 located in" and its logical form Answer(State(Loc_1(City0))). The ways they handle a non-terminal $c that is not at the end of the utterance-side production rule are represented by purple lines.

Algorithm 1: Rule-level inference
Input: input utterance x, paraphrasing model Para, beam size B, maximum output length L, SCFG rules R, maximum search depth K.

When the non-terminal in the utterance-side production rule is at the end of the rule (e.g., ⟨$e → state $s, State($s)⟩), denoting the utterance-side production rule as r_β = [w_1, w_2, ..., w_{L_r}, N], we can simply expand the non-terminal in the canonical utterance by this rule and generate the canonical utterance from left to right, with probabilities computed by p_paraphrase(w_i | x, c^y_{<t}, w_{<i}). Otherwise, we generate further production rules to expand this rule (i.e., the rules with purple lines in Figure 3), until no non-terminal remains to the left of a word or the expansion reaches the maximum depth K. We use beam search during inference; the details are described in Algorithm 1.
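A single rule-level step can be sketched as follows. The candidate rules and next-word probabilities below are made-up illustrations (not the paper's trained model): the paraphrase model may prefer off-grammar words like "run" and "flow", but only candidate production rules for the current non-terminal are admissible, so those words are discarded automatically.

```python
CANDIDATE_RULES = {  # utterance side -> logical-form side, for $r (illustrative)
    "river traverse $c": "River(Traverse_2($c))",
    "city located in $c": "City(Loc_2($c))",
}
# toy next-word probabilities from the paraphrase model
WORD_SCORE = {"run": 0.40, "flow": 0.30, "traverse": 0.15, "river": 0.10,
              "city": 0.03, "located": 0.01, "in": 0.01}

def rule_score(rule_utt_side):
    """Score a rule by the product of its terminal words' probabilities."""
    score = 1.0
    for w in rule_utt_side.split():
        if not w.startswith("$"):        # skip non-terminals
            score *= WORD_SCORE.get(w, 1e-6)
    return score

best = max(CANDIDATE_RULES, key=rule_score)
print(best, "->", CANDIDATE_RULES[best])
# -> river traverse $c -> River(Traverse_2($c))
```

In the full algorithm this scoring is interleaved with beam search over partial canonical utterances, so several high-scoring rules survive each step rather than a single greedy choice.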

Word-Level Inference
Besides rule-level inference, we also propose a word-level inference algorithm, which generates paraphrases word by word under the SCFG constraints. First, we construct a deterministic automaton from the utterance-side CFG using an LR(1) parser (Knuth, 1965). The automaton transits from one state to another in response to an input: the inputs are words, and the states are utterance/logical form segments. The LR(1) parser peeks ahead one lookahead input symbol, and the state transition table describes the acceptable inputs and the next states.
Then, in each decoding step we generate a word with a new state which is transited from previous state. An example is shown in Figure 3 (b). Only the acceptable words in the current state can be generated, and the end-of-sentence symbol can only be generated when reaching the final state. Beam search is also used in this inference.
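The decoding loop above can be sketched with a hand-built automaton standing in for the LR(1)-derived one; the transition table and word scores below are illustrative assumptions. At each step only words accepted from the current state may be generated, and the end-of-sentence symbol is accepted only in the final state.

```python
# toy transition table: state -> {acceptable word: next state}
AUTOMATON = {
    0: {"what": 1}, 1: {"is": 2}, 2: {"the": 3}, 3: {"number": 4},
    4: {"of": 5}, 5: {"river": 6, "city": 6}, 6: {"traverse": 7},
    7: {"state0": 8}, 8: {"</s>": None},
}
SCORE = {"river": 0.6, "city": 0.4}  # toy paraphrase-model preferences

def greedy_decode(start=0):
    """Greedily pick the best acceptable word until the final state."""
    state, words = start, []
    while state is not None:
        word = max(AUTOMATON[state], key=lambda w: SCORE.get(w, 1.0))
        words.append(word)
        state = AUTOMATON[state][word]
    return " ".join(words[:-1])  # drop "</s>"

print(greedy_decode())
# -> what is the number of river traverse state0
```

The actual system keeps a beam of (state, prefix) pairs instead of this greedy choice, but the constraint mechanism, restricting generation to the automaton's acceptable inputs, is the same.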

Adaptive Fine-tuning
The above decoding algorithms only rely on a paraphrase generation model, which generates the canonical utterance and the logical form synchronously for semantic parsing. We can directly use general paraphrase generation models such as GPT-2 (Radford et al., 2019) or T5 (Raffel et al., 2020) for SSD. However, as described above, there is a style bias between natural language sentences and canonical utterances, which hurts the performance of unsupervised semantic parsing. In this section, we describe how to alleviate this bias via adaptive fine-tuning: given a text generation model, after pretraining it on a paraphrase corpus, we fine-tune it on synthesized ⟨sentence, canonical utterance⟩ pairs.
Previous studies have shown that pretraining on synthesized data can significantly improve the performance of semantic parsing (Xu et al., 2020a; Marzoev et al., 2020; Xu et al., 2020b). Specifically, we design three data synthesis algorithms: 1) CUs: we sample canonical utterances from the SCFG and preserve the executable ones; as we do not have paired sentences, we only fine-tune the language model of the PLMs on the CUs. 2) Self Paras: we use the trained paraphrase model to obtain natural language paraphrases of the sampled canonical utterances, forming ⟨sentence, canonical utterance⟩ pairs. 3) External Paras: we use external paraphrase methods such as back translation to obtain the pairs.

Utterance Reranking
Adaptive fine-tuning resolves the style bias problem by fitting a better paraphrase model. In this section, we propose an utterance reranking algorithm to further alleviate the style bias by reranking and selecting the best canonical form.
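As a concrete illustration of such a reranker, the sketch below combines a paraphrase log-probability, a reconstruction score, and an IBM-Model-2-style association score. All probability tables and the uniform alignment here are illustrative stand-ins; in the real system they come from the trained paraphrase model and a trained aligner.

```python
import math

def association_score(x_words, c_words, p_para, align):
    """s_asso(x, c) = sum_i log sum_j p(c_i | x_j) * a(j | i)."""
    total = 0.0
    for i, c_word in enumerate(c_words):
        total += math.log(sum(p_para.get((c_word, x_word), 1e-6) * align(j, i)
                              for j, x_word in enumerate(x_words)))
    return total

def rerank_score(log_p_c_given_x, s_rec, s_asso):
    """score(x, c) = log p(c|x) + s_rec(x, c) + s_asso(x, c)."""
    return log_p_c_given_x + s_rec + s_asso

# hypothetical example values
x = "how many rivers run through state0".split()
c = "what is the number of river traverse state0".split()
p_para = {("number", "many"): 0.5, ("river", "rivers"): 0.8,
          ("traverse", "run"): 0.3, ("state0", "state0"): 0.9}
uniform = lambda j, i: 1.0 / len(x)  # stand-in alignment model
s = rerank_score(log_p_c_given_x=-1.2, s_rec=-0.8,
                 s_asso=association_score(x, c, p_para, uniform))
```

The candidate with the highest combined score is selected; because every component is computed from unlabeled models, no annotated semantic parsing data is needed for reranking.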
Given the utterance x and the top-N parsing results (y_n, c_n), n = 1, 2, ..., N, we rerank all candidates by focusing on the semantic similarity between x and c_n, so that canonical utterances can be effectively selected. Reranking for semantic parsing has been exploited in many previous studies (Berant and Liang, 2014; Yin and Neubig, 2019), which employ reranking for canonical utterance selection; in contrast, our reranker does not need labeled data. Formally, we measure two similarities between x and c_n, and the final reranking score is calculated by:

score(x, c) = log p(c|x) + s_rec(x, c) + s_asso(x, c)   (2)

Reconstruction Score The reconstruction score measures the coherence and adequacy of the canonical utterance, using the probability of reproducing the original input sentence x from c with the trained paraphrasing model: s_rec(x, c) = log p_pr(x|c).

Association Score The association score measures whether x and c contain words that are likely to be paraphrases. We calculate it as s_asso(x, c) = Σ_i log Σ_j p(c_i|x_j) a(j|i), in which p(c_i|x_j) is the paraphrase probability from x_j to c_i, and a(j|i) is the alignment probability. The paraphrase and alignment probabilities are trained and inferred as in the translation model of SMT IBM Model 2.

Datasets

OVERNIGHT This is a multi-domain dataset, which contains natural language paraphrases paired with lambda DCS logical forms across eight domains. We use the same train/test splits as Wang et al. (2015).
GEO(FunQL) This is a semantic parsing benchmark about U.S. geography (Zelle and Mooney, 1996) using the variable-free semantic representation FunQL (Kate et al., 2005). We extend the FunQL grammar to SCFG for this dataset. We follow the standard 600/280 train/test splits.
GEOGRANNO This is another version of GEO (Herzig and Berant, 2019), in which lambda DCS logical forms paired with canonical utterances are produced from an SCFG. Instead of paraphrasing sentences, crowd workers are required to select the correct canonical utterance from a candidate list. We follow the split (train/valid/test 487/59/278) of the original paper.

Paraphrase Model
We obtain the paraphrase model by training T5 and GPT-2 on the WikiAnswer Paraphrase corpus, training for 10 epochs with a learning rate of 1e-5. Following previous work, we sample 500K sentence pairs from the WikiAnswer corpus as the training set and 6K as the dev set. We generate adaptive fine-tuning datasets proportional in size to the labeled datasets, and back-translation (from English to Chinese and back) is used to obtain the external paraphrase data. On average, we sample 423 CUs per domain, and synthesize 847 instances per domain for Self Paras and 1252 for External Paras.
Unsupervised settings In the unsupervised setting, we do not use any annotated semantic parsing data. The paraphrase generation models are fixed after paraphrase pretraining and adaptive fine-tuning, and are employed to generate canonical utterances and MRs synchronously via rule-level or word-level inference. In rule-level inference, the leftmost non-terminals are eliminated by cyclic expansion, the maximum depth K is set to 5, and the beam size is set to 20. SSD uses T5 as the pretrained language model in all the proposed components, including adaptive fine-tuning, reranking, and the two decoding constraints. Ablation experiments are conducted over all components with rule-level inference.
Unsupervised settings (with external nonparallel data) Previous work has shown that external nonparallel data (including nonparallel natural language utterances and canonical utterances) can be used to build unsupervised semantic parsers. For a fair comparison, we also conduct unsupervised experiments with external nonparallel data. Specifically, we enhance the original SSD using the SAMPLES method: we label each input sentence with the most probable output in the nonparallel corpus and use these samples as pseudo training data; we denote this setting as SSD-SAMPLES.
Supervised settings Our SSD method can be further enhanced using annotated training instances. Specifically, given the annotated utterance, logical form instances, we first transform logical form to its canonical form, then use them to further fine-tune our paraphrase models after unsupervised pre-training.
Baselines We compare our method with the following unsupervised baselines: 1) Cross-domain Zero Shot (Herzig and Berant, 2018), which trains on other source domains and then generalizes to target domains in OVERNIGHT; 2) GENOVERNIGHT (Wang et al., 2015), in which models are trained on synthesized ⟨CU, MR⟩ pairs; 3) SYNTH-SEQ2SEQ, a SEQ2SEQ baseline we implement on the synthesized data; and 4) SYNTHPARA-SEQ2SEQ, which is trained on the synthesized data and ⟨CU paraphrase, MR⟩ pairs, where the paraphrases are obtained in the same way as in Section 4.

Overall Results
The overall results of different baselines and our method are shown in Table 1.

Table 1 (recoverable fragment; accuracies on OVERNIGHT / GEO):
Supervised:
  DEPHT (Jie and Lu, 2018): - / 89.3
  COPYNET (Herzig and Berant, 2019): 72.0 / -
  One-stage: 71.9 / -
  Two-stage: 71.6 / -
  SEQ2SEQ (Guo et al., 2020): - / 87.1
  SSD (Word-Level): 72.9 / 88.3
  SSD (Grammar-Level): 72.0 / 87.9
Unsupervised (with nonparallel data):
  Two-stage: 63.7 / -
  WMDSAMPLES: 35.

For our method, we report its performance under three settings. We can see that:

1. By synchronously decoding canonical utterances and meaning representations, SSD achieves competitive unsupervised semantic parsing performance. On all datasets, our method outperforms the other baselines in the unsupervised setting. These results demonstrate that unsupervised semantic parsers can be effectively built by simultaneously exploiting semantic and structural constraints, without the need for labeled data.

2. Our model achieves competitive performance on different datasets under different settings. In the supervised setting, our model is competitive with the state of the art. With nonparallel data, our model outperforms Two-stage. On GEO(FunQL) our model also obtains a significant improvement over the baselines, which verifies that our method is not limited to specific datasets (i.e., OVERNIGHT and GEOGRANNO, which are constructed with SCFG and paraphrasing).

3. Both rule-level inference and word-level inference can effectively generate paraphrases under the grammar constraints. Rule-level inference achieves better performance; we believe this is because its decoding is more compact than word-level inference, so it can search a wider space and benefits more from beam search.

Detailed Analysis
Effect of Decoding Constraints To analyze the effect of decoding constraints, we conduct ablation experiments with different constraint settings; the results are shown in Table 2. -SEMANTIC denotes removing the semantic constraints; -GRAMMAR denotes removing all constraints at the same time, i.e., the decoding is unrestricted. We can see that constrained decoding is critical for our paraphrasing-based semantic parsing, and that both the grammar constraints and the semantic constraints contribute to the improvement.
Effect of Adaptive Fine-tuning To analyze the effect of adaptive fine-tuning, we show the results under different settings, ablating one fine-tuning corpus at a time (see Table 2). We can see that adaptive fine-tuning significantly improves performance, and that the paraphrase generation model can be effectively fine-tuned using only CUs or Self Paras, which are easy to construct.

Effect of Reranking To analyze the effect of reranking, we compare the settings with and without reranking against the upper bound, Oracle, which always selects the correct logical form if it is within the beam. Experimental results show that reranking improves semantic parsing performance. Moreover, there is still a large margin between our method and Oracle, i.e., unsupervised semantic parsing could be significantly improved by designing better reranking algorithms.

Effect of Adding Labeled Data
To investigate the effect of adding labeled data, we test our method by varying the size of the labeled data on OVERNIGHT from 0% to 100%. In Fig. 4, we can see that our method outperforms the baselines given the same labeled data, and that a small amount of data already yields good performance with our method.

Effect of Pretrained Language Models
To analyze the effect of PLMs, we show the results with different PLM settings: instead of T5, we use GPT-2 or randomly initialized transformers to construct the paraphrasing models. Experimental results show that powerful PLMs improve performance. Since our method performs semantic parsing with language generation models, it can benefit from the rapid development of PLMs.
Related Work

Paraphrasing in Semantic Parsing. Paraphrase models have been widely used in semantic parsing. ParaSempre (Berant and Liang, 2014) uses a paraphrase model to rerank candidate logical forms. Wang et al. (2015) employ SCFG rules to produce ⟨MR, canonical utterance⟩ pairs, and construct the OVERNIGHT dataset by paraphrasing the utterances. Dong et al. (2017) use paraphrasing to expand the expressions of query sentences. Compared with these methods, we combine paraphrasing with grammar-constrained decoding, so SSD further reduces the requirement for labeled data and achieves unsupervised semantic parsing.

Conclusions
We propose an unsupervised semantic parsing method, Synchronous Semantic Decoding, which leverages paraphrasing and grammar-constrained decoding to simultaneously resolve the semantic gap and the structure gap. Specifically, we design two synchronous semantic decoding algorithms for paraphrasing under grammar constraints, and exploit adaptive fine-tuning and utterance reranking to alleviate the style bias in semantic parsing. Experimental results show that our approach achieves competitive performance in unsupervised settings.