Thomas Ruprecht


2022

Improving the Extraction of Supertags for Constituency Parsing with Linear Context-Free Rewriting Systems
Thomas Ruprecht
Findings of the Association for Computational Linguistics: EMNLP 2022

In parsing phrase structures, supertagging achieves a symbiosis between the interpretability of formal grammars and the accuracy and speed of more recent neural models. The approach was only recently transferred to parsing discontinuous constituency structures with linear context-free rewriting systems (LCFRS). We reformulate and parameterize the previously fixed extraction process for LCFRS supertags with the aim of improving the overall parsing quality. These parameters apply to several steps of the extraction process and control the granularity of the extracted grammar rules as well as the association of lexical symbols with each supertag. We evaluate the influence of the parameters on the sets of extracted supertags and on parsing quality using three English and German treebanks, and we compare the best-performing configurations to recent state-of-the-art parsers in the area. Our results show that some of our configurations, together with a slightly modified parsing process, improve the quality and speed of parsing with our supertags over the previous approach. Moreover, we achieve parsing scores that either surpass or are among the state of the art in discontinuous constituent parsing.
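
For illustration (a hypothetical example, not taken from the paper): a supertag in this setting is a lexical LCFRS rule, i.e. a rule that introduces exactly one lexical symbol, and the fanout of its left-hand side records discontinuity. A fanout-2 rule for a discontinuous verb phrase could look like

    VP(x_1, \text{gesehen}) \to NP(x_1)

where the VP consists of two separate spans, the argument span x_1 and the single word "gesehen". The extraction parameters mentioned above control, among other things, how fine-grained such rules are and which lexical symbol is associated with each tag.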

2021

Supertagging-based Parsing with Linear Context-free Rewriting Systems
Thomas Ruprecht | Richard Mörbitz
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present the first supertagging-based parser for linear context-free rewriting systems (LCFRS). It utilizes neural classifiers and outperforms previous LCFRS-based parsers in both accuracy and parsing speed by a wide margin. Our results keep up with the best (general) discontinuous parsers; in particular, the scores for discontinuous constituents establish a new state of the art. The heart of our approach is an efficient lexicalization procedure which induces a lexical LCFRS from any discontinuous treebank. We describe a modification to usual chart-based LCFRS parsing that accounts for supertagging and introduce a procedure that transforms lexical LCFRS derivations into equivalent parse trees of the original treebank. Our approach is evaluated on the English Discontinuous Penn Treebank and the German treebanks Negra and Tiger.
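
As a rough sketch of the general idea (generic Python, not the authors' implementation; all names and the toy rule inventory are hypothetical): a supertagger scores a fixed inventory of lexical LCFRS rules for every token, and only the few best-scoring rules per token are handed on to the chart parser.

from math import exp

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def k_best_supertags(token_scores, supertags, k=3):
    # For each token, keep the k supertags with the highest probability.
    result = []
    for scores in token_scores:
        probs = softmax(scores)
        ranked = sorted(zip(supertags, probs), key=lambda pair: pair[1], reverse=True)
        result.append(ranked[:k])
    return result

# Hypothetical toy inventory of supertags (lexical LCFRS rules) and raw scores
# for a two-token sentence; a real system would obtain these scores from a neural classifier.
supertags = ["NP(w)", "VP(x_1, w) -> NP(x_1)", "S(x_1 w x_2) -> NP(x_1) VP(x_2)", "PP(w x_1) -> NP(x_1)"]
token_scores = [[1.8, 0.2, 0.0, -0.5], [0.1, 2.0, -1.0, 0.3]]
print(k_best_supertags(token_scores, supertags, k=2))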

2020

Lexicalization of Probabilistic Linear Context-free Rewriting Systems
Richard Mörbitz | Thomas Ruprecht
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

In the field of constituent parsing, probabilistic grammar formalisms have been studied to model the syntactic structure of natural language. More recently, approaches utilizing neural models have gained a lot of traction in this field, as they achieve accurate results at high speed. We aim for a symbiosis between probabilistic linear context-free rewriting systems (PLCFRS) as a probabilistic grammar formalism and neural models to get the best of both worlds: the interpretability of grammars, and the speed and accuracy of neural models. To combine the two, we consider the approach of supertagging, which requires lexicalized grammar formalisms. Here, we present a procedure which turns any PLCFRS G into an equivalent lexicalized PLCFRS G’. The derivation trees in G’ are then mapped to equivalent derivations in G. Our construction for G’ preserves the probability assignment and does not increase parsing complexity compared to G.
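
As a generic illustration of the notion (not the construction presented in the paper): an LCFRS rule is lexical if it introduces exactly one terminal symbol. For example,

    S(x_1 x_2) \to NP(x_1)\ VP(x_2)

is not lexical, whereas

    S(x_1\ \text{hat}\ x_2) \to NP(x_1)\ VP(x_2)

introduces exactly the terminal "hat" and is. Lexicalization rewrites a grammar so that every rule has the latter form while the derived trees, and here also the probability assignment, are preserved.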

2019

Implementation of a Chomsky-Schützenberger n-best parser for weighted multiple context-free grammars
Thomas Ruprecht | Tobias Denkinger
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Constituent parsing has been studied extensively in the last decades. Chomsky-Schützenberger parsing as an approach to constituent parsing has so far only been investigated theoretically. It uses the decomposition of a language into a regular language, a homomorphism, and a bracket language to divide the parsing problem into simpler subproblems. We provide the first implementation of Chomsky-Schützenberger parsing. It employs multiple context-free grammars and incorporates many refinements to achieve feasibility. We compare its performance to state-of-the-art grammar-based parsers.
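
For context (a standard result, not specific to this paper): the classical Chomsky-Schützenberger theorem states that every context-free language L can be decomposed as

    L = h(R \cap D)

where D is a Dyck (bracket) language, R is a regular language, and h is a homomorphism; the parser described above builds on the analogous decomposition for (weighted) multiple context-free languages.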