Text Generation from Discourse Representation Structures

We propose neural models to generate text from formal meaning representations based on Discourse Representation Structures (DRSs). DRSs are document-level representations which encode rich semantic detail pertaining to rhetorical relations, presupposition, and co-reference within and across sentences. We formalize the task of neural DRS-to-text generation and provide modeling solutions for the problems of condition ordering and variable naming which render generation from DRSs non-trivial. Our generator relies on a novel sibling treeLSTM model which is able to accurately represent DRS structures and is more generally suited to trees with wide branches. We achieve competitive performance (59.48 BLEU) on the GMB benchmark against several strong baselines.

Although there has been considerable activity recently in developing models which analyze text in the style of DRT (van Noord et al., 2018; Liu et al., 2018, 2019a; Fancellu et al., 2019), attempts to generate text from DRSs have been few and far between (however, see Basile 2015 and Narayan and Gardent 2014 for notable exceptions). This is primarily due to two properties of DRS-based semantic representations which render generation from them challenging. Firstly, DRS conditions are unordered, representing a set (rather than a list). 2 A hypothetical generator would have to produce the same output text for any DRSs which convey the same meaning but appear different due to their conditions having a different order (see Figures 1 and 2a, which are otherwise identical but the order of conditions in boxes b 1 and b 4 varies). The second challenge concerns variables and their prominent status in DRSs. Variables identify objects in discourse (such as entities and predicates), and are commonly used to model semantic phenomena including coreference, control constructions, and scope. In Figure 1, variables x, e, s, t, p, and b denote entities, events, states, time, propositions, and boxes, respectively. Variable names themselves are arbitrary and meaningless, posing a challenge for learning.
Our generator must verbalize different variable names to the same surface form. The meaning representations in Figures 1 and 2b are identical and both correspond to the same discourse except that the variables have been given different names (b 5 in Figure 1 has been named b 1 in Figure 2b, b 1 is now b 2 , x 1 is x 2 , e 1 is e 9 , and so on).
These two problems are further compounded by the way DRSs are displayed, in a box-like format which is intuitive and easy to read but not convenient for modeling purposes. As a result, DRSs are often post-processed into a format that can be handled more easily by modern neural network models. For example, DRS variables and conditions are converted to clauses (van Noord et al., 2018) or DRSs are modified to trees where each box is a subtree and conditions within the box correspond to children of the subtree (Liu et al., 2018, 2019a). In this paper we propose novel solutions to condition ordering and variable naming. We argue that even though DRS conditions appear unordered, they have a latent order due to biases in the way the training data is created. To give a concrete example, the Groningen Meaning Bank (GMB; Bos et al. 2017) provides the largest collection to date of English texts annotated with DRSs. These annotations were generated with the aid of a CCG parser (Clark and Curran, 2007); atomic DRS conditions were associated with CCG supertags and then semantically combined following the syntactic CCG derivations. Even annotators creating DRSs manually would be prone to follow a canonical order (e.g., listing named entities first, then verbal predicates and their thematic roles, and finally temporal conditions). We propose a graph-based model which learns to recover the latent order of conditions without explicitly enumerating all possible orders, which can be prohibitive. We also handle variable names with a method which rewrites arbitrary indices to relative ones which are in turn determined by the order of conditions. Following previous work, we convert DRSs to a more amenable format. Specifically, we consider Discourse Representation Tree Structures (DRTSs; Liu et al. 2019b) as the semantic representation input to our document generation task, and generate a sequence of words autoregressively.
We adopt an encoder-decoder framework with a treeLSTM (Tai et al., 2015) encoder and a standard LSTM (Hochreiter and Schmidhuber, 1997) decoder. Problematically, DRS trees are wide and the number of children for a given node can be as many as 180. Assigning a forget gate to each child, as in the conventional (N-ary) treeLSTM (Tai et al., 2015), therefore becomes memory-consuming and leads to sparse parameter updates. We propose a variant which we call Sibling treeLSTM that replaces the N forget gates with a parent gate and a sibling gate. As a result, it reduces the memory usage for forget gates from O(N) to O(1) (exactly two gates), and is more suitable for modeling wide and flat trees.
Our contributions can be summarized as follows: (1) we formalize the task of neural DRS-to-text generation; (2) we provide solutions for the problems of condition ordering and variable naming, which render generation from DRS-based meaning representations non-trivial; and (3) we propose a novel sibling treeLSTM model that can also be used more generally to model wide tree structures. We make our code and datasets publicly available. 3

Problem Formulation
Let S denote a DRS-based meaning representation. The aim of DRS-to-text generation is to produce text T that verbalizes input meaning S:

T * = arg max T ∈T p(T | S; Θ)

where T is the set of all possible texts, S has an arbitrary order of conditions and indexing of variables, and Θ is the set of model parameters.
Our generation model is based on the encoder-decoder framework (Bahdanau et al., 2015) and operates over tree structures. Moreover, prior to training, variable names are rewritten so that their (arbitrary) indices denote relative order of appearance. We propose a novel sibling TreeLSTM for encoding tree structures. The decoder is a sequential LSTM equipped with an attention mechanism generating word sequence T = [t 0 , t 1 , ..., t m−1 ], where m is the length of the text. At test time, DRS conditions are normalized, i.e., they are reordered following a canonical order learned from data, and used as input to our generation model.
We first describe our DRS-to-tree conversion and variable renaming procedures (Sections 2.1 and 2.2). We next present our tree-to-sequence generation model (Section 2.3), and explain how DRS conditions are ordered (Section 2.4).

DRS-to-Tree Conversion
The algorithm of Liu et al. (2018) renders DRSs in a tree-style format. It constructs trees based on DRS conditions in the bottom box layers, without considering variables in the top layer. This results in oversimplified semantic representations and information loss (e.g., presuppositions cannot be handled). We improve upon their approach by merging variables in the top layer with variables in the bottom layer via special conditions. We collect variables in the top layers of DRS boxes to construct a dictionary d = {v : b}, where v denotes a variable and b is a presupposition box label (e.g., x 1 : b 1 ). We then move variables from the top to the bottom layer by expressing them as special conditions b : Ref(v) and placing them before conditions on variable v. For example, b 6 : x 1 in Figure 1 becomes special condition b 6 : Ref(x 1 ) and is placed before condition b 6 : Pred(x 1 , male.n.02) in Figure 3(a).

3 https://github.com/LeonCrashCode/Discourse-Representation-Tree-Structure/tree/main/gmb/DRS-to-text
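A minimal sketch of this rewriting step follows; the tuple formats and the function name are our own assumptions for illustration, not the released implementation.

```python
def rewrite_top_layer(top_layer, conditions):
    """Move top-layer variable declarations into the condition list as
    special Ref conditions, each placed before the first condition that
    mentions the variable.
    top_layer: list of (box, var) declarations, e.g. [("b6", "x1")].
    conditions: list of (box, name, args) tuples in document order."""
    result = list(conditions)
    for box, var in top_layer:
        ref = (box, "Ref", (var,))
        # find the first condition mentioning var; default to the end
        idx = next((i for i, (_, _, args) in enumerate(result) if var in args),
                   len(result))
        result.insert(idx, ref)
    return result
```

On the running example, the declaration ("b6", "x1") becomes the special condition ("b6", "Ref", ("x1",)), inserted before the first condition on x 1.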
Once top variables have been rewritten as special conditions, the resulting DRSs are converted into trees as shown in Figure 3(b). Box variables (e.g., b 1 , b 5 ) become parent nodes, while conditions, which are also subtrees, become children.

Relative Variables
We rename variables with regard to their relative position in a given DRS following a predefined traversal order.
We obtain the sequence of box variables by traversing DRSs in an outer-to-inner and left-to-right manner, e.g., Figure 1. For SDRSs, we replace variables in discourse relations with k i , where i denotes the ith box from left to right. For example, "CONTRAST(b 1 , b 4 )" in Figure 1 is rewritten to "CONTRAST(k 0 , k 1 )". Variables and conditions within presupposition boxes are rewritten to B i , where i ∈ Z denotes the distance of the current box to the presupposition box. For example, b 1 : Agent(e 1 , x 1 ) is rewritten to B 0 : Agent(e 1 , x 1 ) because it is in the current box b 1 , while b 1 : Pred(x 1 , "male.n.02") is rewritten to B −2 : Pred(x 1 , "male.n.02") because it is in box b 3 and two hops away from presupposition box b 1 . We use the special label O for presupposition boxes pertaining to semantic content outwith the current DRS. For example, b 6 : Ref(x 1 ) is rewritten to O : Ref(x 1 ) because it introduces a new presupposition box, and b 6 : Pred(x 1 , male.n.02) is rewritten to O 0 : Pred(x 1 , male.n.02) because the condition can only be interpreted in this new presupposition box (now O 0 and previously b 6 ).
We obtain a sequence of general variables by traversing conditions as they appear in the DRS. Variables introduced for the first time are denoted by their type (going from left-to-right), while subsequent mentions of the same variables are rewritten with relative indices denoting their distance from the position where they were first introduced. Take Figure 3(a) as an example. The sequence of general variables is [x 1 , x 1 , e 1 , e 1 , e 1 , x 1 , x 2 , e 1 , x 2 , x 2 , t 1 , t 1 , e 1 , t 1 , x 3 , x 3 , e 2 , e 2 , e 2 , x 3 , x 1 , e 2 , x 1 , e 2 , e 1 ], and is rewritten to [X, ...] accordingly. The DRS of Figure 3(a) is shown in Figure 4 with relative variables.
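The renaming scheme can be illustrated with a small sketch. Note that the exact distance metric (here, an offset measured in positions within the variable sequence) is our assumption; the paper does not spell out the metric in this excerpt.

```python
def relativize(variables):
    """Rename variables relative to their first mention.
    First mentions become the variable's type letter (upper-cased, e.g.
    "x1" -> "X"); later mentions become a negative offset back to the
    position of the first mention. The offset convention is an
    illustrative assumption, not the paper's definition."""
    first_seen = {}
    out = []
    for i, v in enumerate(variables):
        if v not in first_seen:
            first_seen[v] = i
            out.append(v[0].upper())             # new variable: type letter
        else:
            out.append(f"-{i - first_seen[v]}")  # offset to first mention
    return out
```

Under this convention, [x1, x1, e1, x1] becomes [X, -1, E, -3]: the renamed sequence no longer depends on the arbitrary indices 1, 2, 9, etc.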

Generation Model
Our generation model is based on the encoder-decoder framework, where an encoder is used to encode input DRS trees and a decoder outputs a sequence of words. A limitation of sequential encoders is that they only allow sequential information propagation without considering the structure of the input (Tai et al., 2015; Wang et al., 2019). In our case, DRS tree structures are additionally wide (the longer a document, the wider the tree) and relatively flat (see Figure 3(b)). To better model these aspects, we propose a treeLSTM encoder which takes sibling information into account. As shown in Figure 5, the hidden representations of the sibling TreeLSTM cells are updated from preceding sibling and child nodes. More formally, the hidden representation for node j is given by:

u j = tanh(g u (x j , h js , h jp )) (1)
i j = σ(g i (x j , h js , h jp )), o j = σ(g o (x j , h js , h jp )) (2)
f js = σ(g fs (x j , h js , h jp )) (3)
f jp = σ(g fp (x j , h js , h jp )) (4)
c j = i j ⊙ u j + f js ⊙ c js + f jp ⊙ c jp (5)
h j = o j ⊙ tanh(c j ) (6)

where x j is the token input representation, h js is the hidden representation of the sibling node preceding j, h jp is the hidden representation of the last child of node j, g * are linear functions, and σ is a sigmoid function. For each node j, we obtain its cell input representation u j (Equation (1)), its input gate i j and output gate o j (Equation (2)), and two forget gates f js (Equation (3)) and f jp (Equation (4)) for its neighbor cell and the last child cell, respectively.
The memory of the current cell c j (Equation (5)) is updated by the gated sum of its cell input representation and the memories of its neighbor and child cells. The hidden representation of current node h j is computed with its output gate o j (Equation (6)).
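As a concrete illustration, the cell update in Equations (1)-(6) can be sketched in numpy. The parameterization of the linear maps g * (here, a single weight matrix per gate over the concatenated inputs) and the dimensionalities are our assumptions, not the paper's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sibling_treelstm_cell(x, h_s, c_s, h_p, c_p, params):
    """One sibling-treeLSTM update for node j.
    x: input token representation; (h_s, c_s): state of the preceding
    sibling; (h_p, c_p): state of the node's last child."""
    z = np.concatenate([x, h_s, h_p])
    u = np.tanh(params["W_u"] @ z + params["b_u"])      # cell input (Eq. 1)
    i = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate (Eq. 2)
    o = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate (Eq. 2)
    f_s = sigmoid(params["W_fs"] @ z + params["b_fs"])  # sibling forget gate (Eq. 3)
    f_p = sigmoid(params["W_fp"] @ z + params["b_fp"])  # child forget gate (Eq. 4)
    c = i * u + f_s * c_s + f_p * c_p                   # memory update (Eq. 5)
    h = o * np.tanh(c)                                  # hidden state (Eq. 6)
    return h, c
```

Whatever the branching factor, only the two forget gates f_s and f_p are ever instantiated, which is the point of the variant: memory for forget gates no longer grows with the number of children.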
Finally, a DRS tree is represented by the hidden representations of its nodes [h 0 , h 1 , ..., h n −1 ] as computed by the sibling treeLSTM (n denotes the number of nodes). The decoder is a standard LSTM with global attention (Bahdanau et al., 2015).

Condition Ordering
As discussed previously, DRSs at test time may exhibit an arbitrary order of conditions, which our model should be able to handle. Our solution is to reorder conditions prior to generation by learning a latent canonical order from the training data (e.g., to recover boxes b 1 and b 3 in Figure 1 from boxes b 1 and b 3 in Figure 2). More formally, given a set of conditions R set , we obtain an optimal ordering R = [r 0 , r 1 , ..., r n−1 ] such that:

R * = arg max R∈π(R set ) SCORE K (R) (7)

where π(R set ) are all permutations of R set , and R * is the order with the highest likelihood according to SCORE K . Here, K parametrizes SCORE as "knowledge" we collect from our training data by observing canonical orders of conditions. Unfortunately, the time complexity of calculating Equation (7) is O(n!): we must enumerate all possible permutations for a set of conditions, with n as large as 180. Since this is prohibitive, we resort to graph ordering which allows us to recover the order of the conditions without enumeration.
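To make the complexity argument concrete, a literal reading of Equation (7) would score every permutation, which is only feasible for tiny condition sets; the `score` callback below stands in for SCORE K and is purely illustrative.

```python
import itertools
import math

def brute_force_order(r_set, score):
    """Literal reading of Equation (7): score every permutation of the
    condition set and keep the best. Only feasible for tiny sets, since
    the search space pi(R_set) contains n! orderings."""
    return max(itertools.permutations(r_set), key=score)

def num_orders(n):
    """Size of the search space Equation (7) ranges over."""
    return math.factorial(n)
```

Already at n = 20 the search space exceeds 10^18 orderings, so for DRSs with up to 180 conditions exhaustive scoring is out of the question; the pointer-network decoder described below instead emits one condition per step.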

Graph Construction
We construct a graph from the set of DRS conditions which we break down into graph nodes and edges. Conditions in DRSs can be simple or complex according to their type of arguments. A simple condition might have a relation name with two arguments (e.g., Named(x 3 , "tom") and Agent(e 1 , x 3 )), while a complex condition has a scoped name (e.g., possibility ◇) and takes one or more DRSs as arguments. Simple conditions are denoted by a 3-tuple (l s , a 0 , a 1 ), where l s is the condition name (e.g., Named and Agent) and a 0 and a 1 are its first and second arguments, respectively, which can be a variable or a constant (e.g., e 1 , x 3 and "piano.n.01"). Complex conditions are a 2-tuple (l c , V r ), where l c is the scope name, and V r the set of arguments scoped by the condition. For example, the set of arguments for the possibility scope (◇) in Figure 1 is {e 1 , e 2 , x 1 , x 3 , "tom", "stop.v.05", "male.n."}. Condition names become nodes in our graph. Simple conditions are further divided into constant and thematic nodes.

Figure 6(a) lists example conditions together with their graph nodes and edge labels:

Conditions | Nodes | Edges
Pred(x 1 , "male.n.02") | Pred "male.n.02" | a 0 = x 1
Pred(e 1 , "play.v.03") | Pred "play.v.03" | a 0 = e 1
Agent(e 1 , x 1 ) | Agent | a 0 = e 1 , a 1 = x 1
Theme(e 1 , x 2 ) | Theme | a 0 = e 1 , a 1 = x 2
Pred(x 2 , "piano.n.01") | Pred "piano.n.01" | a 0 = x 2
Pred(t 1 , "now.n.01") | Pred "now.n.01" | a 0 = t 1
temp after(e 1 , t 1 ) | temp after | a 0 = e 1 , a 1 = t 1

Constant nodes are constructed by concatenating the relation name in the condition with the constant argument (e.g., condition Pred(x 1 , "male.n.02") becomes node Pred "male.n.02"). Thematic nodes correspond to the relation name of the thematic condition (e.g., Agent(e 1 , x 1 ) becomes the node "Agent"). Complex nodes correspond to the name of complex conditions (e.g., possibility ◇). We insert edges between graph nodes if these share arguments. For example, in Figure 6(b), there is an edge connecting node Pred "male.n.02" with Agent as they share argument x 1 . We label this edge with a 1 to denote the fact that it is the second argument of Agent. Another edge is drawn between Pred "play.v.03" and Agent (as they share argument e 1 ) with label a 0 denoting that this is the first argument of Agent. Edges between nodes are bidirectional, with inverse edges bearing the suffix "-of". Edges drawn between constant and complex nodes bear the label "Related", while edges between two constant nodes (with the same variables) bear the label "Equal" (we provide a more formal description in the Appendix).
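A rough sketch of this graph construction for simple conditions follows. The tuple formats, the quote-based test for constant arguments, and the edge-label convention are our assumptions; bidirectional "-of", "Related", and "Equal" edges are omitted for brevity.

```python
def build_graph(conditions):
    """Turn simple conditions into ordering-graph nodes and edges.
    Constant conditions (name, var, '"sense"') become nodes of the form
    'name "sense"'; thematic conditions (name, a0, a1) become nodes
    named after the relation. An edge (i, j, label) links node i to
    node j when they share an argument, labeled with the argument slot
    the shared variable fills in node j."""
    nodes, args = [], []
    for name, a0, a1 in conditions:
        if a1.startswith('"'):                 # constant argument
            nodes.append(f"{name} {a1}")
            args.append({a0: "a0"})
        else:                                  # thematic condition
            nodes.append(name)
            args.append({a0: "a0", a1: "a1"})
    edges = []
    for i, ai in enumerate(args):
        for j, aj in enumerate(args):
            if i == j:
                continue
            for v in set(ai) & set(aj):        # shared arguments
                edges.append((i, j, aj[v]))
    return nodes, edges
```

On the Figure 6 fragment, the sketch reproduces the edges discussed above: Pred "male.n.02" connects to Agent with label a 1 (shared x 1), and Pred "play.v.03" connects to Agent with label a 0 (shared e 1).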
Ordering Model Given graph G = (R set , E), where R set = {r 0 , r 1 , ..., r n−1 } is the set of nodes and E is the set of edges in G, our model outputs R * as the optimal order of R set .
As shown in Figure 6(a), each node is a sequence of words. A BiLSTM is applied to obtain representation x i = BiLSTM([w i 0 , ..., w i m−1 ]) of each node r i . We encode the graph with a Graph Convolutional Recurrent network (GCRN; Seo et al. 2018). For each node r i , we collect information from neighbor hidden representations with a gate controlling the information flow from neighbors to the current node, where e ji is the embedding of edges from node r j to r i , and k is the recurrent step in the GRU. The node hidden representations are then updated with a gated recurrent cell, where g G represents the hidden representation of the graph as the average of (hidden) node representations, and GRUCell denotes the gated recurrent cell function. We obtain the hidden representations of nodes in the final recurrent step (K). Our decoder obtains the order with the highest probability. We avoid enumerating all possible permutations for a set of nodes by generating their order autoregressively with an LSTM-based Pointer Network (PN; Vinyals et al. 2015), whose parameters θ include the vector v and matrix W used in the attention that points to the next node; h d i , the ith step hidden representation of the Pointer Network, is updated with the input representation of the (i − 1)th ordered node. All parameters are optimized with standard back-propagation.

Experiments
Our experiments were carried out on the Groningen Meaning Bank (GMB; Bos et al. 2017) which provides a large collection of English documents annotated with DRSs. We used the standard training, development, and test splits that come with the distribution of the corpus. All DRSs in the GMB were preprocessed into the tree-based format discussed in Section 2.1. We also extracted from the training data conditions and their order for training our graph ordering model. Dataset statistics are shown in Table 1.

Condition Ordering
Models and Settings Before evaluating our generator per se, we assess the effectiveness of the proposed condition ordering model (see Section 2.4). Specifically, we compare four kinds of graphs: NoEdges, a graph without edges; FullEdges, a complete graph where each pair of nodes is connected by an edge; SiGraph, the proposed graph without bidirectional edges; and BiGraph, the proposed graph with bidirectional edges (see Figure 6). We also consider Counting, a baseline model which greedily orders pairs of conditions according to their frequency of appearance in the training data (see the Appendix for details).
For all neural models the embedding dimension was 50 and the hidden dimension 300. The bidirectional LSTM used for representing the graph nodes has a single layer, and the recurrent step in the GCRN is 2 (K = 2). We applied the Adam optimizer (Kingma and Ba, 2014). We use accuracy to measure the percentage of absolute orders which are predicted correctly and Kendall's τ coefficient to measure the relationship between two lists of ordered items; τ ranges from −1 to 1, where −1 means perfect inversion and 1 means perfect agreement.
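For reference, Kendall's τ between two orderings of the same items can be computed with the standard O(n²) formulation below (our own utility, not code from the paper).

```python
def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items:
    +1 for perfect agreement, -1 for perfect inversion."""
    pos = {item: i for i, item in enumerate(order_b)}
    n = len(order_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if pos[order_a[i]] < pos[order_a[j]]:
                concordant += 1   # pair kept in the same relative order
            else:
                discordant += 1   # pair inverted between the two lists
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Unlike exact-match accuracy, τ gives partial credit when a predicted order is close to (but not identical with) the gold order.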
Results Table 2 summarizes our results. SiGraph performs better than NoEdges (+14.83% accuracy), showing that edge information is helpful for the representation of nodes which are used to order conditions. FullEdges performs worse than SiGraph (−13.68% accuracy), underlining the fact that graph structure matters (i.e., edges are helpful when connecting certain pairs of nodes). BiGraph achieves the best ordering performance by a large margin compared to SiGraph (+9.63% accuracy). One possible reason is that bidirectionality ensures all nodes have incoming edges, which can be used to update the node representations.


Ideal-World Generation
Models and Settings We first examine generation performance in an ideal setting where (gold standard) condition orders are given and the indices of variables are fixed. We compared the proposed treeLSTM against Seq, a baseline sequence-to-sequence model which adopts a bidirectional LSTM as its encoder. 4 Trees were linearized in a top-down and left-to-right fashion, X = [x 0 , x 1 , ..., x n−1 ], where n is the tree length, and hidden representations H = [h 0 , h 1 , ..., h n−1 ] of the input were obtained with the bidirectional LSTM. In addition, we included various models with tree-based encoders: ChildSum, the bidirectional childsum-treeLSTM encoder of Tai et al. (2015), which operates over right-branch binarized trees; Nary, the bidirectional Nary-TreeLSTM of Tai et al. (2015), again over right-branch binarized trees; 5 and Sibling, our bidirectional sibling-TreeLSTM. All models were equipped with the same LSTM decoder, global attention (Bahdanau et al., 2015), and the copy strategy of See et al. (2017).
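The top-down, left-to-right linearization used by the Seq baseline can be sketched as follows; the bracket tokens marking subtree boundaries are our own convention for illustration.

```python
def linearize(tree):
    """Top-down, left-to-right linearization of a tree given as
    (label, [children]) pairs, producing the token sequence consumed
    by a sequential encoder."""
    label, children = tree
    if not children:
        return [label]
    out = [label, "("]            # open the subtree rooted at label
    for child in children:
        out.extend(linearize(child))
    out.append(")")               # close the subtree
    return out
```

Since DRS trees are wide and flat, these linearizations grow long quickly, which is one reason the sequential baseline struggles on large documents.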
The embedding dimension was 300 and the hidden dimension 512. All encoders and decoders have 2 layers. The detailed settings are shown in the Appendix. We measure generation quality with case-insensitive BLEU (Papineni et al., 2002).

4 The length of the input tokens can be around 4,000.

5 We experimented with n-ary (n > 2) trees, but found that binary trees perform best. Right-branch binary trees are also empirically better than left-branch ones.
Results Table 3 shows our results on the development dataset. Overall, treeLSTM models perform better (average +1.69 BLEU) than sequence models. Nary performs better (+0.26 BLEU) than ChildSum because the latter cannot model the order of children. Sibling performs best (74.22 BLEU), because it not only encodes the tree structure but also keeps track of sequential information.

Real-World Generation
Models and Settings Finally, we present our results in a more realistic setting where both problems of condition ordering and variable naming must be addressed. We recover condition order using four approaches: Naive, a method with no special-purpose ordering mechanism (the order of conditions is random in the development/test sets and fixed in the training set); Random, where the order of conditions is random in the training, development, and test sets; Counting, where the order of conditions is recovered by the Counting method; and GraphOrder, which recovers the order of conditions with BiGraph. All comparison systems employ variable renaming as introduced in Section 2.2. We report experiments with a sequence-to-sequence generator and our sibling-TreeLSTM.

Results Table 4 summarizes our results on the development set. Naive performs poorly, indicating that both Seq and Sibling models are sensitive to the order of conditions. Random has higher variance with Seq (+16.51) compared to Sibling. Hidden representations for each timestep in Seq are heavily influenced by all previous steps, which are sequentially encoded; subtrees are encoded as a unit in Sibling, which yields a more global representation for capturing patterns. Overall, we observe that the order of conditions plays a key role in generation: both Seq and Sibling models improve when ordering of conditions is explicitly incorporated (either with Counting or GraphOrder). The combination of Sibling with GraphOrder achieves the best results (58.73 BLEU). Table 5 presents our results on the test set. We compare our Sibling encoder against a sequential one. Both models are interfaced with GraphOrder. We also compare to a previous graph-to-text model (Song et al., 2018; Damonte and Cohen, 2019) which has been used for generating from AMRs. We converted DRSs to graphs following the method of Liu et al. (2020); graphs were encoded with a GCRN (Seo et al., 2018) and decoded with an LSTM.
As can be seen, Sibling+GraphOrder outperforms all comparison systems, achieving a BLEU of 59.26. However, compared to ideal-world generation (see Table 3) there is still considerable room for improvement. Figure 7 shows model performance on the test set against DRS size (i.e., the number of nodes in a DRS tree). Perhaps unsurprisingly, we see that generation quality deteriorates for bigger DRSs (i.e., with >1,600 nodes).

Analysis
While BLEU is frequently adopted as an automatic evaluation metric for generation tasks, it is somewhat problematic in our case as it merely calculates word overlap between generated and gold-standard text without assessing whether model output is faithful to the semantics of the input (i.e., the DRS meaning representations). To this effect, we present examples of text generated by our model, demonstrating how the DRS input constrains and affects the output text. Figure 8 shows examples of text generation from the test set. In the first example, the model generates the word because from the rhetorical relation BECAUSE(b 10 , b 12 ). Temporal information (highlighted in blue in the figure) is also accurately reflected in the generated text (sell is inflected to its present tense form). On the other hand, the model tends to over-generate (e.g., the word dollar is mentioned twice) and sometimes misses out on important determiners (e.g., some). In the second example, the model generates the word themselves referring to the entities mentioned before, e.g., x 29 equals x 27 which refers to inmates, resolving the coreference. In the third example, the model generates the modal verb must in accordance with the scope operator NEC (a shorthand for necessity, □). Also, the model generates all for food and goods, corresponding to the implication (IMP) condition (i.e., ∀x(P (x) → Q(x))).

Related Work
Much previous work has focused on text generation from formal representations of meaning, concentrating exclusively on isolated sentences or queries. The literature offers a collection of approaches to generating from AMRs, most of which employ neural models and structured encoders (Song et al., 2018; Beck et al., 2018; Damonte and Cohen, 2019; Ribeiro et al., 2019; Zhu et al., 2019; Cai and Lam, 2020; Wang et al., 2020). Other work generates text from structured query language (SQL), adopting either sequence-to-sequence (Iyer et al., 2016) or graph-to-sequence models (Xu et al., 2018). Basile (2015) was the first to attempt generation from DRT-based meaning representations. He proposes a pipeline system which operates over graphs and consists of three components: an alignment module learns the correspondence between surface text and DRS structure, an ordering module determines the relative position of words and phrases in the surface form, and a realizer generates the final text. Narayan and Gardent (2014) simplify complex sentences with a two-stage model which first performs sentence splitting and deletion operations over DRSs and then uses a phrase-based machine translation model for surface realization.

[Figure 8 (excerpt): an input DRS with conditions such as Pred( b 10 e 8 "sell.v.01" ), Agent( b 10 e 8 x 24 ), and BECAUSE( b 10 b 12 ); Gold: "... some speculators are selling dollars because they expect ..."; Ours: "... , the dollar . speculators are selling dollars because they expect ..."]
Our work is closest to Basile (2015); we share the same goal of generating from DRSs; however, our model is trained end-to-end and can perform long-form generation for documents and sentences alike. We also adopt an ordering component, but we order DRS conditions rather than lexical items, and propose a model capable of inferring a global order. There has been long-standing interest in information ordering within NLP (Lapata, 2003; Abend et al., 2015; Chen et al., 2016; Gong et al., 2016; Logeswaran et al., 2018; Cui et al., 2018; Yin et al., 2019; Honovich et al., 2020). Our innovation lies in conceptualizing ordering as a graph scoring task which can be further realized with graph neural network models (Wu et al., 2020).

Conclusions
In this paper, we have focused on document-level generation from formal meaning representations. We have adopted DRT as our formalism of choice and highlighted various challenges associated with the generation task. We have introduced a novel sibling treeLSTM for encoding DRSs rendered as trees and shown it is particularly suited to trees with wide branches. We have experimentally demonstrated that our encoder coupled with a graph-based condition ordering model outperforms strong comparison systems. In the future, we would like to embed our generator in practical applications such as summarization and question answering.

A Counting Method
We count how frequently condition r i appears before r j with type t. Type t is identified according to the overlap between the arguments of the two conditions. 6 For example, r i = (Named, x 3 , "tom") and r j = (Agent, e 1 , x 3 ) have the type "a 0 → a 1 ", showing that the first argument in r i equals the second argument in r j . We score the order of two conditions using the following function:

SCORE K (r i , r j , t) = COUNT K (r i ≺ r j , t) − COUNT K (r j ≺ r i , t)

where COUNT returns the frequency of a pair of conditions subject to a dataset or corpus K; the score increases with r j following r i more frequently than preceding it.
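A hedged sketch of this pairwise score follows; the normalization by the total pair count is our assumption, since the text only states that the score grows with the frequency of r i preceding r j.

```python
def pair_score(counts, ri, rj, t):
    """Score for placing condition ri before rj under type t, computed
    from co-occurrence counts gathered on the training data K.
    counts[(a, b, t)] holds how often a precedes b with type t; the
    result lies in [-1, 1] and is positive when ri usually comes first."""
    before = counts.get((ri, rj, t), 0)
    after = counts.get((rj, ri, t), 0)
    if before + after == 0:
        return 0.0
    return (before - after) / (before + after)
```

For instance, if Named precedes Agent nine times out of ten under type "a0 -> a1", the score for that order is 0.8.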

A.1 Types
We define different types of relations between two conditions based on argument overlap (i.e., two simple conditions, a simple and a complex condition, and two complex conditions).
Simple and Simple Given two simple conditions, r i = (l i , a 0i , a 1i ) and r j = (l j , a 0j , a 1j ) (with different arguments), we define their types as:

• t = a 0 → a 0 if a 0i = a 0j and a 1i ≠ a 1j .

• t = a 0 → a 1 if a 0i = a 1j and a 1i ≠ a 0j .

• t = a 1 → a 0 if a 1i = a 0j and a 0i ≠ a 1j .

• t = a 1 → a 1 if a 1i = a 1j and a 0i ≠ a 0j .

• t = None otherwise.
Simple and Complex Given a simple condition r i = (l i , a 0i , a 1i ) and a complex condition r j = (l j , V j ), the types are defined as:

• t = 1 if a 0i ∈ V j and a 1i ∉ V j .

• t = 1 if a 0i ∉ V j and a 1i ∈ V j .

• t = 2 if a 0i ∈ V j and a 1i ∈ V j .

• t = None otherwise.

6 We only consider (and count) conditions with overlapping arguments, i.e., their type t is not None.

Algorithm 1 Greedy Ordering Algorithm
Input: R set , set of conditions
Output: R * , list of ordered conditions
1: R * = []
2: while R set is not empty do
3:     r * = arg max r∈R set PARTIAL(r)
4:     remove r * from R set
5:     append r * to R *
6: end while

Complex and Complex Given two complex conditions, r i = (l i , V i ) and r j = (l j , V j ), the types are defined analogously, according to the overlap between V i and V j .

A.2 Greedy Algorithm
We generate an ordering of conditions following greedy Algorithm 1, which repeatedly selects the highest scoring condition in R set , appends it to the partial ordering R * , and keeps going until R set is empty. The score of a candidate is given by PARTIAL(r), which aggregates SCORE over r paired with the conditions remaining in R set .
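Algorithm 1 can be sketched directly; PARTIAL is passed in as a scoring callback, and the toy score in the usage example is illustrative only.

```python
def greedy_order(r_set, partial_score):
    """Greedy ordering (Algorithm 1): repeatedly pop the condition with
    the highest PARTIAL score against the remaining set.
    partial_score(r, remaining) -> float."""
    remaining = list(r_set)
    ordered = []
    while remaining:
        best = max(remaining, key=lambda r: partial_score(r, remaining))
        remaining.remove(best)
        ordered.append(best)
    return ordered
```

With a score that simply favors alphabetically earlier conditions, the algorithm returns the conditions sorted; with the counting-based PARTIAL, it returns the canonical order observed in training.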

B Edge Construction Algorithm
Algorithm 2 shows how edges are created for graph ordering.

C Model Settings
In the following, we report the best experimental settings for our condition ordering and text generation models.

C.1 Condition Ordering Models
Model hyperparameters are shown in Table 6. The size of the token embeddings (in the graph nodes) and edge embeddings is 100. The node hidden dimension is the same as the hidden dimension of the BiLSTM, which is used to encode the sequence of input words for each node, and the hidden dimension of the Pointer Network. The BiLSTM and the Pointer Network have one layer. Training hyperparameters are shown in Table 7.

C.2 Text Generation Models
Model hyperparameters are shown in Table 8. The size of the input embeddings in the encoder and the decoder is 300. The hidden dimensions of the encoder and decoder are 512. Both the encoder and decoder have two layers. The hyperparameters of the training are shown in Table 9.

D Hyperparameter Tuning
We show below model performance with various hyperparameters. Best hyperparameters were manually chosen after monitoring model accuracy on the development set.
We take BiGraph as our final condition ordering model (see Table 10) and Sibling with BiGraph as the DRS-to-text generation model (see Table 11).

E Examples
We provide example output of our final model (Sibling+GraphOrder) on the GMB test dataset. The DRS in tree format with condition ordering given by GraphOrder is shown in Figures 9-11. Figure 10 expands nonterminal b 33 in Figure 9, and Figure 11 expands nonterminal b 28 in Figure 10. The corresponding document generated by Sibling is: the u.s. dollar hit a record low against the euro tuesday . it took a dollar , but 48 cents to buy one euro , and a series of problems including the key of the u.s. housing sector in which has been battered by the slowing n.01 , including continuing the u.s. economy , the dollar . speculator are selling dollars because they expect that the u.s. central bank will try to stimulate the economy by cutting interest rates soon . u.s. lower interest rates can cut the return on investments . the falling dollar is prompting oil-rich nations around the persian gulf to consider ending the practice of linking the value of its currency to those of the dollar and instead supplement the u.s. currency . such a move would reduce demand for dollars and weaken the u.s. currency .

Figure 11: A partial DRS in tree format with condition ordering recovered by GraphOrder.