A New Representation for Span-based CCG Parsing

This paper proposes a new representation for CCG derivations. A CCG derivation is a tree whose node labels are categories strictly restricted by the CCG rule schemata. This property makes CCG unsuitable for span-based parsing models, which predict node labels independently: a span-based model may generate invalid CCG derivations that violate the rule schemata. Our proposed representation decomposes CCG derivations into several independent pieces and prevents span-based parsing models from violating the schemata. Our experimental results show that an off-the-shelf span-based parser with our representation is comparable with previous CCG parsers.


Introduction
Combinatory Categorial Grammar (CCG) (Steedman, 2000) is a mildly context-sensitive grammar formalism. Several neural CCG parsing methods have been proposed so far (Lewis and Steedman, 2014; Xu et al., 2015; Vaswani et al., 2016; Xu, 2016; Yoshikawa et al., 2017; Steedman, 2019, 2020; Bhargava and Penn, 2020; Tian et al., 2020; Prange et al., 2021; Liu et al., 2021). Currently, neural span-based models (Cross and Huang, 2016; Stern et al., 2017; Gaddy et al., 2018; Kitaev and Klein, 2018) have been successful in the field of constituency parsing. However, we cannot directly apply this technique to CCG parsing. Span-based models assume that each node label in parse trees can be predicted independently, while, in CCG, each node label (category) is strictly restricted by the CCG rule schemata. The independence assumption of span-based models implies that the models are not guaranteed to generate valid CCG derivations.
To solve this problem, we propose a method of representing CCG derivations in a way suitable for span-based parsing models. Our proposed representation decomposes CCG derivations into several independent pieces and can prevent the span-based parsing models from violating the CCG rule schemata. Furthermore, as a by-product of our representation, the parsing models can assign out-of-vocabulary (OOV) categories, which have not appeared in training data. This characteristic has been attracting attention in CCG parsing research (Bhargava and Penn, 2020; Prange et al., 2021; Liu et al., 2021). Our experimental result shows that an off-the-shelf span-based parser with our representation is comparable with previous CCG parsers and can generate correct OOV categories.

CCG and Span-based Parsing
This section gives an overview of Combinatory Categorial Grammar (CCG) (Steedman, 2000) and explains why we cannot directly apply the span-based approach to CCG parsing.

Combinatory Categorial Grammar
CCG represents syntactic information by basic categories (e.g., S, NP) and complex categories. Complex categories have the form X/Y or X\Y, where X and Y are categories. Intuitively, a category X/Y receives a category Y from its right and returns a category X; in the case of X\Y, the argument comes from its left. Formally, categories are combined using CCG rule schemata. Figure 1 shows the CCG rule schemata.
Here, X, Y, and Z_i (1 ≤ i ≤ d) are categories, and |_i ∈ {/, \}. The sequence |_1 Z_1 ⋯ |_d Z_d is called an argument stack (Kuhlmann and Satta, 2014), and we use a Greek letter to represent an argument stack. For example, we use the following notation for the first rule schema:

    X/Y  Yα ⇒ Xα    (1)

We define |α| = d, and the arity of a category Y = Xα, where X is a basic category, is defined as arity(Y) = |α|.
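As a concrete (hypothetical) illustration of categories, argument stacks, and the arity function, here is a minimal sketch in Python; the class and function names are ours, not the paper's.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Basic:
    """A basic category such as S or NP."""
    name: str
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Complex:
    """A complex category X/Y (slash='/') or X\\Y (slash='\\')."""
    result: "Category"
    slash: str
    arg: "Category"
    def __str__(self):
        return f"({self.result}{self.slash}{self.arg})"

Category = Union[Basic, Complex]

def arity(cat: Category) -> int:
    """arity(Xα) = |α|: the number of arguments stacked on a basic category."""
    d = 0
    while isinstance(cat, Complex):
        d += 1
        cat = cat.result
    return d

# Example: (S\NP)/NP, the transitive-verb category.
# Its argument stack is \NP /NP, so arity is 2.
S, NP = Basic("S"), Basic("NP")
tv = Complex(Complex(S, "\\", NP), "/", NP)
```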

Span-based Parsing
A span-based parsing model (Stern et al., 2017; Gaddy et al., 2018; Kitaev and Klein, 2018) has a single scoring function s(i, j, l) that scores each label l for each span (i, j). The score of a tree T is defined as follows:

    s(T) = Σ_{(i,j,l) ∈ T} s(i, j, l)

The parsing problem is formulated as finding the tree T* with the highest score:

    T* = argmax_T s(T)

and can be solved using an efficient CKY-like parsing algorithm because of the following characteristic:¹
• The model can determine each label l for a span (i, j) independently of the other spans.
Unfortunately, CCG parsing cannot take this approach directly because each label (category) is strictly restricted by the CCG rule schemata. If we forcibly apply the span-based approach to CCG parsing, the following problem occurs:
• The parsing model may generate invalid CCG derivations that violate the CCG rule schemata.
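To make the independence property concrete, here is a hypothetical sketch of the CKY-style search over span scores; `score` and `labels` stand in for the model's scoring function and label set and are not the paper's code.

```python
def best_tree_score(n, score, labels):
    """Chart-based search for the highest-scoring binary tree over n words.

    score(i, j, l) -> float scores label l for span (i, j). Because the tree
    score is a sum of independent span scores, the best label for each span
    is chosen without looking at the rest of the tree.
    """
    best = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Best label for (i, j), chosen independently of other spans.
            label_score = max(score(i, j, l) for l in labels)
            if length == 1:
                best[(i, j)] = label_score
            else:
                # Best split point, reusing optimal sub-spans.
                split = max(best[(i, k)] + best[(k, j)] for k in range(i + 1, j))
                best[(i, j)] = label_score + split
    return best[(0, n)]
```

With a constant score of 1.0 per span, a 3-word sentence yields 5.0, since any binary tree over 3 words contains exactly 5 spans.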

Span-based representation
To overcome the problem described in the previous section, we propose a new representation for CCG derivations. We call this new representation a span-based representation (SBR for short); it decomposes CCG derivations into several independent pieces to prevent the span-based parsing model from violating the CCG rule schemata. Figure 2 shows an example of a CCG derivation and its SBR version. We realize span-based CCG parsing as follows: 1. Convert CCG derivations into SBRs (Section 3.2).
2. Train a span-based parsing model using SBRs and parse sentences to generate SBRs.
The basic idea behind our method is that each node label in an SBR represents a constraint on the categories of nodes in the corresponding CCG derivation. Our method recovers a CCG derivation from its SBR version by satisfying these constraints. Because the constraints encoded in SBR labels are independent of one another, a span-based model trained on SBRs cannot violate the CCG rule schemata.

SBR's label
An SBR's label consists of the following information:
• a CCG rule schema
• a mapping from variables that occur only in the left-hand side of the rule to categories
For each node n (except leaf nodes) in a CCG derivation, its SBR version has a corresponding node. The SBR's label means that the category of n is created by the specified rule schema and that the categories of n's children satisfy the constraint represented by the mapping. For example, the label (>0, Y := NP) means that the left and right children's categories are of the form X/NP and NP, and X is inherited from its parent's category.
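As a hypothetical sketch of how such a label constrains the children, here is a toy function covering only the forward-application case (>0), with plain strings standing in for categories; the names are ours, not the paper's.

```python
def recover_children(parent, rule, mapping):
    """Recover the two children's categories from a parent category and an
    SBR label, for forward application (>0) only.

    For (>0, Y := C): the left child is parent/C and the right child is C.
    The X in X/Y is inherited from the parent, so the label never mentions
    it -- this is what makes SBR labels independent across spans.
    """
    if rule == ">0":
        y = mapping["Y"]
        # Parenthesize a complex parent category before attaching /C.
        if "/" in parent or "\\" in parent:
            left = f"({parent})/{y}"
        else:
            left = f"{parent}/{y}"
        return left, y
    raise NotImplementedError("this sketch covers only >0")
```

For a verb-phrase node S\NP with label (>0, Y := NP), this yields the children (S\NP)/NP and NP, as in the example above.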

Additional information
An SBR's label cannot encode the root categories of CCG derivations or unary rules. To encode this information, we introduce three types of additional information:
• RT:X means that the category of the node n is X, if n is the root node.
• UL :X means that the left child l is unary branching and the category of l's child is X.
• UR :X means that the right child r is unary branching and the category of r's child is X.
We call these pieces of information tags.

Converting CCG derivations into SBRs
Algorithm 1 obtains an SBR from a CCG derivation. Table 1 summarizes the conversion from categories into SBR labels. Algorithm 1 uses this table in the function SBRlabel, which returns an SBR's label. Here, we introduce two additional patterns for adjuncts and type-raised categories (shown in the last two rows).² Introducing these patterns reduces the number of SBR labels.

Converting SBRs into CCG derivations
Algorithm 2 recovers a CCG derivation from an SBR. The recovery process proceeds in a top-down fashion. First, the root label is recovered from the additional tag RT.³ That is, we call recover(n, RT(label(n))) for an SBR n. Then, the categories of the children are recovered using Table 1 in reverse (the function recoverCAT(S, P) returns the categories). This process is repeated recursively until the leaf nodes are reached. When the SBR's label is of the form (>d, ⋯) or (<d, ⋯) and arity(P) < d, L and R cannot be defined. In this case, recoverCAT(S, P) replaces d with arity(P).
² These are special cases of the CCG rule schemata shown in Figure 1.
³ If the parsing model fails to assign an RT tag, we use RT:S[dcl] as a default.

Algorithm 1 convert(n)
n is a CCG derivation node.
label(n) is the label of n.
par(n) is the parent of n.
chiL(n), chiR(n) and chiU(n) are the left, right and unary child of n.
node(l, C) makes a node with a label l and children C.

if n is a preterminal node then
    n ← chiU(n)
else if n is binary branching then
    l, r ← chiL(n), chiR(n)
    L, R, P ← label(l), label(r), label(n)
    S ← SBRlabel(L, R, P)
    if n is a root node then
        add RT:P to S
    end if
    if l is unary branching then
        l ← chiU(l)
        add a tag UL:label(l) to S
    end if
    if r is unary branching then
        r ← chiU(r)
        add a tag UR:label(r) to S
    end if
    n ← node(S, ⟨convert(l), convert(r)⟩)
end if
return n
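A rough Python transcription of Algorithm 1 may help; this is a sketch under our own assumptions (a simple Node class; SBRlabel is supplied by the caller as a function over Table 1; preterminals are not treated as unary rule applications), not the paper's implementation.

```python
class Node:
    """A derivation/SBR node: a label plus 0, 1, or 2 children."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
    def is_preterminal(self):
        # A category node directly above a word (a leaf with no children).
        return len(self.children) == 1 and not self.children[0].children
    def is_unary(self):
        return len(self.children) == 1

def convert(n, sbr_label, is_root=True):
    """Convert a CCG derivation rooted at n into its SBR (Algorithm 1 sketch).

    sbr_label(L, R, P) stands in for the paper's SBRlabel function.
    """
    if n.is_preterminal():
        return n.children[0]          # drop the lexical category, keep the word
    l, r = n.children
    tags = [sbr_label(l.label, r.label, n.label)]
    if is_root:
        tags.append(("RT", n.label))  # record the root category
    if l.is_unary() and not l.is_preterminal():
        l = l.children[0]             # descend through the unary rule
        tags.append(("UL", l.label))
    if r.is_unary() and not r.is_preterminal():
        r = r.children[0]
        tags.append(("UR", r.label))
    return Node(tags, [convert(l, sbr_label, False),
                       convert(r, sbr_label, False)])
```

For the derivation [S [NP John] [S\NP sleeps]], the resulting SBR root carries the rule label and the RT:S tag, and its leaves are the bare words.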

Generating OOV categories
In our proposed representation, lexical categories are not directly assigned to words; they are decomposed into several node labels. As a result, the set of producible lexical categories is not fixed in advance, and a span-based parsing model trained on SBRs may generate OOV lexical categories that do not appear in the training data.

Experiment
We conducted an experiment using the CCGBank (Hockenmaier and Steedman, 2007)⁴ to evaluate the performance of our method.⁵ We used the Berkeley Neural Parser (Kitaev and Klein, 2018) with BERT (Devlin et al., 2019) as the span-based parser. We converted the training (Sections 02-21) and development (Section 00) data into SBRs and trained the model on them. The number of SBR labels in the training data was 486.⁶ The hyperparameters for training were identical to those of Kitaev et al. (2019). We evaluated parsing performance by labeled F1 on the test data (Section 23). We obtained labeled dependencies using the C&C parser's generate program (Clark and Curran, 2007). As a baseline, we trained a model directly on the CCG derivations. Table 2 shows parsing performance on the test data. Both our proposed method and the baseline have high precision (92.8% and 94.0%) but low recall (82.2% and 76.3%). One of the reasons for the low
⁴ In the CCGBank, adjuncts and type-raised categories take an argument category using feature unification. Our method treats this feature unification in the last two rows of Table 1. SBRlabel does not allow the X occurring in X/(Xβ) and X\(Xβ) to have any feature, and recoverCAT removes features from the X.
⁵ The code is available at https://github.com/yosihide/span-based-ccg-derivation.
⁶ The training data has 1,639 categories, including 1,285 lexical categories (supertags).

[Table 2: Parsing performance on the test data (Pre. / Rec. / F1). Recoverable entries: Lewis and Steedman (2014): - / - / 86.1; Xu et al. (2015): 87.7 / 86.4 / 87.0; …]

recall was that the C&C parser's generate program failed to obtain dependencies from the output CCG derivations. Our proposed method and the baseline failed to obtain dependencies for 206 and 371 of the 2,407 test-set sentences, respectively. The generate program cannot work when a CCG derivation is invalid or contains a lexical category that is not listed in its markedup file.
To mitigate this problem, we added such lexical categories to the markedup file.⁷ Adding lexical categories significantly increased the recall of our method (to 87.6%). On the other hand, the recall of the baseline method remained low (76.8%) due to invalid CCG derivations. This result shows that span-based parsing directly on CCG derivations does not work well and that our proposed method improves parsing performance. The final result of our method is comparable with previous CCG parsers.

OOV categories
Another interesting aspect of our method is its ability to generate OOV categories. Table 3 shows the recall for OOV lexical categories. We obtained results similar to previous research. Our method correctly assigned OOV categories to 4 words.⁸ We conclude that our proposed approach can handle OOV categories.

Conclusion
This paper proposed a new representation for CCG derivations. Our proposed representation realizes a span-based CCG parser that follows the CCG binary rule schemata. Furthermore, the parser can generate OOV categories. One remaining problem is the treatment of the CCG unary rule schemata: our method encodes unary rules using the additional information described in Section 3.1.1, but this approach may still violate the unary rule schemata. In future work, we will extend the method to handle CCG unary rules validly.