In parsing phrase structures, supertagging achieves a symbiosis between the interpretability of formal grammars and the accuracy and speed of more recent neural models.The approach was only recently transferred to parsing discontinuous constituency structures with linear context-free rewriting systems (LCFRS).We reformulate and parameterize the previously fixed extraction process for LCFRS supertags with the aim to improve the overall parsing quality.These parameters are set in the context of several steps in the extraction process and are used to control the granularity of extracted grammar rules as well as the association of lexical symbols with each supertag.We evaluate the influence of the parameters on the sets of extracted supertags and the parsing quality using three treebanks in the English and German language, and we compare the best-performing configurations to recent state-of-the-art parsers in the area.Our results show that some of our configurations and the slightly modified parsing process improve the quality and speed of parsing with our supertags over the previous approach.Moreover, we achieve parsing scores that either surpass or are among the state-of-the-art in discontinuous constituent parsing.
We present the first supertagging-based parser for linear context-free rewriting systems (LCFRS). It utilizes neural classifiers and outperforms previous LCFRS-based parsers in both accuracy and parsing speed by a wide margin. Our results keep up with the best (general) discontinuous parsers, particularly the scores for discontinuous constituents establish a new state of the art. The heart of our approach is an efficient lexicalization procedure which induces a lexical LCFRS from any discontinuous treebank. We describe a modification to usual chart-based LCFRS parsing that accounts for supertagging and introduce a procedure that transforms lexical LCFRS derivations into equivalent parse trees of the original treebank. Our approach is evaluated on the English Discontinuous Penn Treebank and the German treebanks Negra and Tiger.
In the field of constituent parsing, probabilistic grammar formalisms have been studied to model the syntactic structure of natural language. More recently, approaches utilizing neural models gained lots of traction in this field, as they achieved accurate results at high speed. We aim for a symbiosis between probabilistic linear context-free rewriting systems (PLCFRS) as a probabilistic grammar formalism and neural models to get the best of both worlds: the interpretability of grammars, and the speed and accuracy of neural models. To combine these two, we consider the approach of supertagging that requires lexicalized grammar formalisms. Here, we present a procedure which turns any PLCFRS G into an equivalent lexicalized PLCFRS G’. The derivation trees in G’ are then mapped to equivalent derivations in G. Our construction for G’ preserves the probability assignment and does not increase parsing complexity compared to G.
Constituent parsing has been studied extensively in the last decades. Chomsky-Schützenberger parsing as an approach to constituent parsing has only been investigated theoretically, yet. It uses the decomposition of a language into a regular language, a homomorphism, and a bracket language to divide the parsing problem into simpler subproblems. We provide the first implementation of Chomsky-Schützenberger parsing. It employs multiple context-free grammars and incorporates many refinements to achieve feasibility. We compare its performance to state-of-the-art grammar-based parsers.