On maximum spanning DAG algorithms for semantic DAG parsing

Consideration of the decoding problem in semantic parsing as finding a maximum spanning DAG of a weighted directed graph carries many complexities that haven’t been fully addressed in the literature to date, among which are its actual appropriateness for the decoding task in semantic parsing, not to mention an explicit proof of its complexity (and its approximability). In this paper, we consider the objective function for the maximum spanning DAG problem, and what it means in terms of decoding for semantic parsing. In doing so, we give anecdotal evidence against its use in this task. In addition, we consider the only graph-based maximum spanning DAG approximation algorithm presented in the literature (without any approximation guarantee) to date and finally provide an approximation guarantee for it, showing that it is an $O(1/n)$ factor approximation algorithm, where $n$ is the size of the digraph’s vertex set.


Introduction
Recent research in semantic parsing has moved attention towards recovering labeled digraph representations of the semantic relations corresponding to the linguistic agendas across a number of theories where simple tree representations are claimed not to be expressive enough to capture sentential meaning. As digraph structures presented in predominant semantic graph databases are mainly acyclic, the semantic parsing problem has sometimes become associated with a maximum spanning directed acyclic graph (MSDAG) decoding problem (McDonald and Pereira, 2006; Sagae and Tsujii, 2008; Titov et al., 2009), in analogy with, and perhaps as a generalisation of, the maximum spanning tree decoding problem for syntactic dependency parsing.
The appropriateness of finding the MSDAG in decoding for semantic parsing has, however, never been fully motivated, and in fact carries more complexities than that of maximum spanning tree (MST) decoding for syntactic parsing. In this paper, we discuss the appropriateness of MSDAG decoding in semantic parsing, considering the possible objective functions and whether they match our linguistic goals for the decoding process. Our view is that they probably do not, in general.
In addition to the problem of not being sufficiently synchronised with our linguistic intuitions for the semantic parsing decoding problem, the MSDAG problem itself carries with it its own complexities, which are still in the process of becoming better understood in the algorithms research community. McDonald and Pereira (2006) claim that the MSDAG problem is NP-hard, citing Heckerman et al. (1995); however, there is no MSDAG problem in this latter work, and no explicit reduction to any problem presented in Heckerman et al. (1995) has been published to date. We point out that Schluter (submitted) explicitly provides a linear reduction to MSDAG from the problem of finding a minimum weighted directed multicut (IDMC), showing MSDAG's NP-hardness; this reduction also yields a result on the approximability of MSDAG, namely that it is APX-hard. We show in this paper that the approximation algorithm presented without any approximation guarantee in (McDonald and Pereira, 2006) is, in fact, an $O(1/n)$ factor approximation algorithm, where $n$ is the size of the graph's vertex set. This is not particularly surprising given the problem's APX-hardness.
Following some preliminaries on weighted digraphs (Section 2), we make the MSDAG problem precise through a discussion of the objective function in question and briefly question this objective function with respect to decoding in semantic parsing (Section 3). Finally, we discuss the only other graph-based (approximation) algorithm in the literature and prove its approximation guarantee (Section 4), followed by some brief conclusions (Section 5).

Preliminaries
A directed graph (or digraph) $G$ is a pair $(V, E)$, where $V$ is the set of nodes and $E$ is the set of (directed) edges. $E \subset V \times V$ is a set of ordered pairs of vertices. For $u, v \in V$, if $(u, v) \in E$, then we say there is an "edge from $u$ to $v$". If there is any confusion over the digraph we are referring to, then we disambiguate by using the notation $G := (V(G), E(G))$.
If all edges e ∈ E(G) are associated with a real number, a weight w : E(G) → R, then we call the digraph weighted. In this paper, all digraphs are weighted and weights are positive.
For a subset of edges $U$ of a weighted digraph, we set $w(U) := \sum_{e \in U} w(e)$. Similarly, for a weighted digraph $G$, we set $w(G) := \sum_{e \in E(G)} w(e)$. We denote the size of a set $S$ by $|S|$.
For nodes $u_1, u_2, \ldots, u_k \in V(G)$, if $(u_i, u_{i+1}) \in E(G)$ for all $i \in \{1, \ldots, k-1\}$, then we say that there is a path (also directed path, or dipath) of length $(k - 1)$ from $u_1$ to $u_k$ in $G$. If also $(u_k, u_1) \in E$, then we say that $u_1, u_2, \ldots, u_k, u_1$ is a cycle of length $k$ in $G$.
A directed acyclic graph (DAG) is a directed graph with no cycles. There is a special kind of DAG, which has a special node called a root with no incoming edges and in which there is a unique path from the root to all nodes; this is called a tree.
Finally, a tournament is a digraph in which all pairs of vertices are connected by exactly one edge. If, in addition, the edges are weighted, then we call the digraph a weighted tournament.
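The definitions above can be made concrete with a minimal sketch. The representation (a dict from ordered pairs to positive weights) and the function names `weight`, `is_dag` and `is_tournament` are illustrative assumptions, not notation from the text.

```python
from collections import defaultdict, deque

def weight(edges):
    """w(U): sum of the weights of the edges in U (a dict (u, v) -> weight)."""
    return sum(edges.values())

def is_dag(nodes, edges):
    """True if the digraph (nodes, edges) has no directed cycle (Kahn's algorithm)."""
    indegree = {v: 0 for v in nodes}
    successors = defaultdict(list)
    for (u, v) in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Every node can be removed exactly when there is no cycle.
    return removed == len(nodes)

def is_tournament(nodes, edges):
    """True if every pair of distinct nodes is joined by exactly one directed edge."""
    nodes = list(nodes)
    return all(((u, v) in edges) != ((v, u) in edges)
               for i, u in enumerate(nodes) for v in nodes[i + 1:])

# Toy weighted digraph (hypothetical weights).
nodes = ["a", "b", "c"]
edges = {("a", "b"): 2.0, ("b", "c"): 1.5, ("a", "c"): 0.5}
cyclic = dict(edges)
cyclic[("c", "a")] = 1.0  # closes the cycle a -> b -> c -> a

print(weight(edges))                # 4.0
print(is_dag(nodes, edges))         # True
print(is_dag(nodes, cyclic))        # False
print(is_tournament(nodes, edges))  # True
```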

Objective functions for finding an MSDAG of a digraph
We first make precise the objective function for the MSDAG problem, by considering two separate objective functions, one additive and one multiplicative, over the weights of edges in the optimal solution:

$$D^* := \arg\max_{D} \sum_{e \in E(D)} w(e), \text{ and} \tag{1}$$
$$D^* := \arg\max_{D} \prod_{e \in E(D)} w(e), \tag{2}$$

where $D$ ranges over the spanning DAGs of $G$.
Maximising Equation (2) amounts to concurrently minimising the number of edges of weight less than 1 in the optimal solution and maximising the number of edges of weight at least 1. In fact, if all edge weights are less than 1, then this problem reduces to finding the MST. However, the objective in semantic parsing in adopting the MSDAG problem for decoding is to increase power from finding only MSTs. Therefore, this version of the problem is not the subject of this paper. If a graph has even one edge of weight greater than 1, then all edges of lesser weights should be discarded, and for the remaining subgraph, maximising Equations (1) or (2) is equivalent.
Maximising Equation (1) is complex and under certain restrictions on edge weights may optimise for simply the number of edges (subject to being a DAG). For example, if the difference between any two edge weights is less than $\frac{1}{|E(G)|} \times w(e)$ for the smallest weighted $e$ in $E(G)$, then the problem reduces to finding the spanning DAG with the greatest number of edges, as shown by Proposition 1.
Proposition 1. Let $G$ be a weighted digraph, with minimum edge weight $M$. Suppose the difference in weight between any two edges of $G$ is at most $\frac{1}{|E(G)|} \times M$. Then an MSDAG for $G$ maximises the number of edges of any spanning DAG for $G$.
Proof. Suppose $D_1$, $D_2$ are spanning DAGs for $G$, such that (without loss of generality) $|E(D_1)| = |E(D_2)| + 1$, but that $D_2$ is an MSDAG and that $D_1$ is not. We derive the following contradiction.
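One way to obtain the contradiction is sketched here, using only the hypotheses of Proposition 1: every edge weight lies in $[M, M + \frac{M}{|E(G)|}]$, and $|E(D_2)| < |E(D_1)| \le |E(G)|$.

```latex
\begin{align*}
w(D_2) &\le |E(D_2)| \left( M + \frac{M}{|E(G)|} \right)
        = |E(D_2)|\, M + \frac{|E(D_2)|}{|E(G)|}\, M \\
       &< |E(D_2)|\, M + M
        = \bigl(|E(D_2)| + 1\bigr) M
        = |E(D_1)|\, M
        \le w(D_1).
\end{align*}
```

So $w(D_2) < w(D_1)$, which contradicts the assumption that $D_2$ is an MSDAG while $D_1$ is not.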
Admittedly, for arbitrary edge weights, the relation between the sum of edge weights and number of edges is more intricate, and it is this problem that we refer to as the MSDAG problem in this paper. However, the maximisation of the number of edges in the MSDAG does play a role when using Equation (1) as the objective function, and this may be inappropriate for decoding in semantic parsing.

Two linguistic issues in MSDAG decoding for semantic parsing
We can identify two important related issues that count against the linguistic motivation for using MSDAG algorithms in decoding for semantic parsing. The first problem is inherited from that of the arc-factored model in syntactic parsing, and the second problem is due to MSDAG's constrained maximisation of edges discussed above.
In the arc-factored syntactic parsing paradigm, it was shown that the MST algorithm could be used for exact inference (McDonald et al., 2005). However, one problem with this paradigm was that edges of the inferred solution did not linguistically constrain each other. So, for example, a verb can be assigned two separate subject dependents, which is linguistically absurd. Use of the MSDAG algorithm in semantic parsing corresponds, in fact, to a generalisation of the arc-factored syntactic parsing paradigm to semantic parsing. As such, the problem of a lack of linguistic constraints among edges is inherited by the arc-factored semantic parsing paradigm.
However, MSDAG decoding for semantic parsing suffers from a further problem. In MST decoding, the only constraint is really that the output should have a tree structure; this places a precise restriction on the number of edges in the output (i.e., n − 1), unlike for MSDAGs. From our discussion above, we know that the MSDAG problem is closely related to a constrained maximisation of edges. In particular, a candidate solution s to the problem that is not optimal in terms of total weight may, however, be linguistically optimal; adding further edges to s would increase weight, but may be linguistically undesirable.
Consider the tree at the top of Figure 1, for the example John loves Mary. In decoding, this tree could be our linguistic optimum. However, according to our additive objective function, we are more likely to obtain either of the bottom DAGs, which is clearly not what is wanted in semantic parsing.
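The effect can be illustrated with a small sketch. The arcs and scores below are hypothetical and are not taken from Figure 1; the only assumption carried over from the text is that all weights are positive, so any extra arc that keeps the graph acyclic raises the additive score.

```python
# Hypothetical arc scores for "John loves Mary" (all positive, as assumed in the text).
scores = {
    ("loves", "John"): 0.9,
    ("loves", "Mary"): 0.8,
    ("John", "Mary"): 0.1,  # low-scoring, linguistically implausible arc
    ("Mary", "John"): 0.1,  # low-scoring, linguistically implausible arc
}

def additive(arcs):
    """Additive objective of Equation (1): the sum of the arc scores."""
    return sum(scores[a] for a in arcs)

tree = [("loves", "John"), ("loves", "Mary")]
dag_a = tree + [("John", "Mary")]  # still acyclic
dag_b = tree + [("Mary", "John")]  # still acyclic

# Both DAGs outscore the tree, although each only adds an implausible arc.
print(additive(tree) < additive(dag_a))  # True
print(additive(tree) < additive(dag_b))  # True
```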

Related Research and an Approximation Guarantee
[Figure 1: a tree (top) and two DAGs (bottom) over the example sentence John loves Mary.]

The only algorithm presented to date for MSDAG is an approximation algorithm proposed by McDonald and Pereira (2006), given without any approximation guarantee. The algorithm first constructs an MST of the weighted digraph, and then greedily attempts to add remaining edges to the MST in order of descending weight, so long as no cycle is introduced. Only part of this algorithm is greedy, so we will refer to it as semi-greedy-MSDAG. Given the fact that MSDAG is APX-hard (Schluter, submitted), the following approximation guarantee is not surprising.
Theorem 2. semi-greedy-MSDAG is an $O(1/n)$ factor approximation algorithm for MSDAG.
Proof. We separate the proof into two parts. In Part 1, we first consider an easy worst case scenario for an upper bound on the error for semi-greedy-MSDAG, without any consideration for whether such a graph actually exists. Following this in Part 2, we construct a family of graphs to show that this bound is tight (i.e., that the algorithm exhibits worst imaginable behaviour for this family of graphs).
Part 1. For $G$ a digraph, let $D$ be the output of semi-greedy-MSDAG on $G$, and $D^*$ be an MSDAG for $G$. The worst case is (bounded by the case) where the algorithm finds an MST $T^*$ for $G$ but then cannot introduce any extra edges to obtain a spanning DAG of higher weight, because the addition of any extra edge would induce a cycle. For $G$'s nodes, we suppose that $|V(G)| > 3$. For edges, we suppose that all the edges in $T^*$ have equally the largest weight, say $w_{max}$, of any edge in $E(G)$, and that all other edges in $E(G)$ have weight $O(w_{max})$. We can do this, because it gives an advantage to $T^*$.
We suppose also that the underlying undirected graph of $G$ is complete and that the true MSDAG for $G$ is $D^* := (V(G), E(G) - E(T^*))$.
This clearly must be the worst imaginable case: that $T^*$ shares no edges with $D^*$, but that $D^*$ contains every other possible edge in the graph, with the weight of every edge in $D^*$ being at most the weight of the maximum weighted edge of those of $T^*$ (remember we are trying to favour $T^*$). No other imaginable case could introduce more edges to $D^*$ without inducing a cycle. So, for all $G$, $w(D^*) = O(n^2 \cdot w_{max})$, since a DAG on $n$ nodes has at most $n(n-1)/2$ edges, and we had that $w(T^*) = w(D) = O(n \cdot w_{max})$. So at very worst, semi-greedy-MSDAG finds a spanning DAG $D$ of weight within $O(1/n)$ of the optimal.
Part 2. We now show that this bound is tight. We construct a family of graphs $G_n = (V_n, E_n)$ as follows. $V_n := \{v_0, v_1, v_2, \ldots, v_n\}$, with $n > 3$, and we suppose that $n$ is even. Let $c \in \mathbb{R}$ be some constant such that $c > 3$. We place the following edges in $E_n$:
(E1) $(v_i, v_{i+1})$ for all $i \in \{0, \ldots, n-1\}$ into $E_n$, creating a path from $v_0$ to $v_n$ of length $n$ where every edge has weight $c$, and
(E2) $(v_i, v_j)$ for all $1 \le j < i \le n$ into $E_n$, where every edge has weight $c - 1$.
So, in addition to the path defined by the edges in (E1), $G_n - \{v_0\}$ contains a weighted tournament on the $n$ nodes $v_1, \ldots, v_n$, such that if $j < i$, then there is an edge from $v_i$ to $v_j$. Let us denote the MST of $G_n$ by $T^*_n$ and the spanning DAG obtained by semi-greedy-MSDAG by $D_n$. We will show that the (unique) MSDAG of $G_n$ is the graph $D^*_n$ that we construct below. It is easy to see that the MST $T^*_n$ of $G_n$ consists of the edges in (E1), and that no further edges can be added to $T^*_n$ without creating a cycle. So, $D_n = T^*_n$.
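The claim that the greedy phase adds nothing to $T^*_n$ can be checked mechanically for small $n$. The sketch below assumes that (E1) is the path $v_0, \ldots, v_n$ with edge weight $c$ and that (E2) consists of the edges $(v_i, v_j)$, $1 \le j < i \le n$, with weight $c - 1$ (the weights used in the base case of the induction). It takes $T^*_n$ to be the path (E1), as stated in the text, and does not compute an MST itself. Node $v_i$ is represented by the integer `i`, and all function names are illustrative.

```python
def build_gn(n, c):
    """Return the edge sets (E1) and (E2) of G_n as dicts (i, j) -> weight."""
    e1 = {(i, i + 1): c for i in range(n)}                               # path v_0 -> ... -> v_n
    e2 = {(i, j): c - 1 for i in range(1, n + 1) for j in range(1, i)}   # v_i -> v_j for j < i
    return e1, e2

def creates_cycle(edges, u, v):
    """True if adding (u, v) to the acyclic edge set closes a cycle, i.e. v already reaches u."""
    successors = {}
    for (a, b) in edges:
        successors.setdefault(a, []).append(b)
    stack, seen = [v], {v}
    while stack:
        x = stack.pop()
        if x == u:
            return True
        for y in successors.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False

def greedy_step(start_edges, candidates):
    """Greedy phase of semi-greedy-MSDAG: add candidate edges by descending weight
    whenever they do not introduce a cycle."""
    dag = dict(start_edges)
    for (u, v), w in sorted(candidates.items(), key=lambda item: -item[1]):
        if not creates_cycle(dag, u, v):
            dag[(u, v)] = w
    return dag

for n in (4, 6, 8):
    e1, e2 = build_gn(n, c=4)
    d_n = greedy_step(e1, e2)  # start from T*_n, taken to be the path (E1)
    # No edge of (E2) is added, so D_n equals T*_n and w(D_n) = n * c.
    print(n, d_n == e1, sum(d_n.values()))
```

Every edge $(v_i, v_j)$ of (E2) points backwards along the path, so adding it closes a cycle with the path segment from $v_j$ to $v_i$; this is what the reachability test detects.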
On the other hand, we claim that there is a unique MSDAG $D^*_n$ of $G_n$, consisting of:
1. the edge $(v_0, v_1)$,
2. the edges $(v_{2i-1}, v_{2i})$ for all $i \in \{1, \ldots, n/2\}$, that is, every second edge in the path described in (E1), and
3. all the edges from (E2) except for $(v_{2i}, v_{2i-1})$ for all $i \in \{1, \ldots, n/2\}$.
We can easily see that $D^*_n$ is at least maximal. The only edges not in $D^*_n$ are ones that are parallel to other edges. So, introducing any other edge from (E2) would mean removing an edge from (E1), which would decrease $D^*_n$'s overall weight. Moreover, notice that introducing any other edge from (E1), say $(v_{k-1}, v_k)$, would require removing two edges (from (E2)), either $(v_k, v_{k-1})$ and $(v_{k+1}, v_{k-1})$ or $(v_k, v_{k-1})$ and $(v_k, v_{k+1})$, to avoid cycles in $D^*_n$, but this also decreases overall weight. We extend these two simple facts in the remainder of the proof, showing that $D^*_n$, in addition to being maximal, is also a global maximum.
We prove the result by induction on $n$ (with $n$ even), that $D^*_n$ is the MSDAG for $G_n$. We take $n = 4$ as our base case.
For $G_4$ (see Figure 2), $E(G_4) - E(D^*_4)$ contains only three edges: the edge $(v_2, v_3)$ of weight $c$ and the edges $(v_4, v_3)$ and $(v_2, v_1)$ of weight $(c - 1)$. Following the same principles as above, adding the edge $(v_2, v_1)$ would entail removing an edge of higher weight; the same is true of adding the edge $(v_4, v_3)$. No further edges in either case could be added to make up this difference and achieve greater weight than $D^*_4$. So the only option is to add the edge $(v_2, v_3)$. However, this would entail removing either the two edges $(v_3, v_2)$ and $(v_4, v_2)$ or $(v_3, v_2)$ and $(v_3, v_4)$ from $D^*_4$, both of which actions result in lower overall weight.
Now suppose $D^*_{n-2}$ is optimal (for $n \ge 6$, $n$ even). We show that this implies $D^*_n$ is optimal (with $n$ even). Consider the two subgraphs $G_{n-2}$ and $H$ of $G_n$ induced by the subsets of $V(G_n)$, $V(G_{n-2}) = \{v_0, \ldots, v_{n-2}\}$ and $V(H) := \{v_{n-1}, v_n\}$ respectively (so $H = (V(H), \{(v_{n-1}, v_n), (v_n, v_{n-1})\})$). We are assuming that the MSDAG of $G_{n-2}$ is $D^*_{n-2}$. Moreover, the MSDAG of $H$ is a single edge, $D_H := (V(H), \{(v_{n-1}, v_n)\})$. $D^*_n$ includes the MSDAGs ($D^*_{n-2}$ and $D_H$) of these two digraphs, so for these parts of $D^*_n$, we have reached an upper bound for optimality. Now we consider the optimal way to connect $D^*_{n-2}$ and $D_H$ to create $D^*_n$. Let $C$ denote the set of edges in $G_n$ which connect $D_H$ to $G_{n-2}$ and vice versa.
Note that the only edge from $C$ not included in $D^*_n$ is $e_C := (v_{n-2}, v_{n-1})$. By the discussion above, we know that including $e_C$ would mean excluding two other edges from $C$ of weight at least $(c - 1)$, which cannot be optimal. Therefore $D^*_n$ must be optimal. So we have constructed a family of graphs $G_n$ where $w(D_n) = w(T^*_n) = nc$ and

$$w(D^*_n) = \left(\frac{n}{2} + 1\right) c + \left(\frac{n(n-1)}{2} - \frac{n}{2}\right)(c - 1) \ge \frac{n(n-1)}{2}(c - 1) - \frac{n}{2} \cdot c.$$

This completes the proof that semi-greedy-MSDAG is an $O\left(\frac{nc}{\frac{n(n-1)}{2}(c-1) - \frac{n}{2} \cdot c}\right) = O(1/n)$ factor approximation algorithm for MSDAG.
Now let us consider the version of the statement of the MSDAG problem that, rather than maximising the weight of a spanning DAG $D^*$ of a weighted digraph $G$, looks to minimise the weight of the set $C^*$ of edges that must be removed from $G$ in order for $G - C^*$ to be an MSDAG for $G$. Clearly these problems are identical. We refer to the minimisation version of the statement as MSDAG$_C$, and to $C^*$ as the complement (in $G$) of the MSDAG $D^* := G - C^*$. Also, let semi-greedy-MSDAG$_C$ be the same algorithm as semi-greedy-MSDAG except that it outputs $C^*$ rather than $D^*$.
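The weights appearing in the proof of Theorem 2 can be checked numerically with a short sketch. It assumes the same reading of the construction as above: (E1) is the path with edge weight $c$, (E2) consists of the edges $(v_i, v_j)$, $1 \le j < i \le n$, with weight $c - 1$, and node $v_i$ is the integer `i`. The exhaustive search is only feasible for the base case $n = 4$, and all function names are illustrative.

```python
def build_gn(n, c):
    """Edge sets (E1) and (E2) of G_n as dicts (i, j) -> weight."""
    e1 = {(i, i + 1): c for i in range(n)}
    e2 = {(i, j): c - 1 for i in range(1, n + 1) for j in range(1, i)}
    return e1, e2

def build_d_star(n, c):
    """Edges of D*_n: (v_0, v_1), every edge (v_{2i-1}, v_{2i}),
    and all of (E2) except the edges (v_{2i}, v_{2i-1})."""
    _, e2 = build_gn(n, c)
    d = {(0, 1): c}
    d.update({(2 * i - 1, 2 * i): c for i in range(1, n // 2 + 1)})
    d.update({e: w for e, w in e2.items()
              if not (e[0] % 2 == 0 and e[1] == e[0] - 1)})
    return d

def is_acyclic(n, edges):
    """Repeatedly delete nodes with no incoming edge; a cycle blocks the process."""
    remaining, nodes = set(edges), set(range(n + 1))
    while nodes:
        sources = nodes - {v for (_, v) in remaining}
        if not sources:
            return False
        nodes -= sources
        remaining = {(u, v) for (u, v) in remaining if u not in sources}
    return True

def brute_force_msdag_weight(n, c):
    """Maximum weight of an acyclic edge subset of G_n, by exhaustive search (small n only)."""
    e1, e2 = build_gn(n, c)
    items = list({**e1, **e2}.items())
    best = 0
    for mask in range(1 << len(items)):
        chosen = {e: w for k, (e, w) in enumerate(items) if mask >> k & 1}
        total = sum(chosen.values())
        if total > best and is_acyclic(n, chosen):
            best = total
    return best

c = 4
# Base case: the exhaustive optimum on G_4 equals w(D*_4).
print(brute_force_msdag_weight(4, c) == sum(build_d_star(4, c).values()))  # True

for n in (4, 8, 16, 32, 64):
    w_greedy = n * c                           # w(D_n) = w(T*_n) = nc
    w_star = sum(build_d_star(n, c).values())  # w(D*_n)
    # n * (w_greedy / w_star) staying bounded is consistent with a Theta(1/n) ratio.
    print(n, w_greedy, w_star, round(n * w_greedy / w_star, 3))
```

Because all weights are positive, a maximum-weight acyclic edge subset is a maximal one, so the search over arbitrary acyclic subsets yields the same optimum as a search restricted to spanning DAGs.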
Using the same graphs and proof structure as in the proof of Theorem 2, the following theorem can be shown.
Theorem 3. semi-greedy-MSDAG$_C$ is an $O(n)$ factor approximation algorithm for MSDAG$_C$.
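As a sketch of the tightness side only, the complements on the family $G_n$ can be worked out directly, assuming as above that the edges of (E1) have weight $c$ and those of (E2) have weight $c - 1$. The complement $C_n$ of $D_n = T^*_n$ is all of (E2), while the complement $C^*_n$ of $D^*_n$ consists of the $\frac{n}{2} - 1$ edges of (E1) and the $\frac{n}{2}$ edges of (E2) that $D^*_n$ omits.

```latex
\begin{align*}
w(C_n)   &= \frac{n(n-1)}{2}\,(c-1), \\
w(C^*_n) &= \left(\frac{n}{2} - 1\right) c + \frac{n}{2}\,(c-1), \\
\frac{w(C_n)}{w(C^*_n)}
         &= \frac{\frac{n(n-1)}{2}\,(c-1)}{\left(\frac{n}{2} - 1\right) c + \frac{n}{2}\,(c-1)}
          = \Theta(n) \quad \text{for fixed } c.
\end{align*}
```

For example, with $n = 4$ and $c = 4$ this gives $w(C_4) = 18$ and $w(C^*_4) = 10$.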

Conclusions and Open Questions
This paper provides some philosophical and mathematical foundations for the MSDAG problem as decoding in semantic parsing. We have put forward the view that the objective in semantic parsing is not in fact to find the MSDAG; however, it remains open as to whether this mismatch can be tolerated, given empirical evidence of MSDAG decoding's utility in semantic parsing. We have also pointed to an explicit proof of the APX-hardness of MSDAG (that of Schluter (submitted)) and given an approximation guarantee for the only published approximation algorithm for this problem.
In particular, Schluter (submitted) provides an approximation preserving reduction from MSDAG$_C$ to IDMC. Moreover, the best known approximation ratio for IDMC is $O(n^{11/23})$ (Agarwal et al., 2007), which yields a better (in terms of worst case error) approximation algorithm for MSDAG$_C$. An interesting open problem would be to compare these two decoding approximation algorithms empirically for semantic parsing decoding and in terms of expected performance (or error), both in general and specifically for semantic parsing decoding.