pdf
bib
Proceedings of the Third International Workshop on Parsing Technologies (IWPT ’93)
Harry Bunt
pdf
bib
abs
Monte Carlo Parsing
Rens Bod
In stochastic language processing, we are often interested in the most probable parse of an input string. Since there can be exponentially many parses, comparing all of them is not efficient. The Viterbi algorithm (Viterbi, 1967; Fujisaki et al., 1989) provides a tool to calculate in cubic time the most probable derivation of a string generated by a stochastic context free grammar. However, in stochastic language models that allow a parse tree to be generated by different derivations – like Data Oriented Parsing (DOP) or Stochastic Lexicalized TreeAdjoining Grammar (SLTAG) – the most probable derivation does not necessarily produce the most probable parse. In such cases, a Viterbistyle optimisation does not seem feasible to calculate the most probable parse. In the present article we show that by incorporating Monte Carlo techniques into a polynomial time parsing algorithm, the maximum probability parse can be estimated as accurately as desired in polynomial time. Monte Carlo parsing is not only relevant to DOP or SLTAG, but also provides for stochastic CFGs an interesting alternative to Viterbi. Unlike the current versions of Viterbi style optimisation (Fujisaki et al., 1989; Jelinek et al., 1990; Wright et al., 1991), Monte Carlo parsing is not restricted to CFGs in Chomsky Normal Form. For stochastic grammars that are parsable in cubic time, the time complexity of estimating the most probable parse with Monte Carlo turns out to be O(n^{2}𝜀^{2}), where n is the length of the input string and 𝜀 the estimation error. In this paper we will treat Monte Carlo parsing first of all in the context of the DOP model, since it is especially here that the number of derivations generating a single tree becomes dramatically large. Finally, a Monte Carlo Chart parser is used to test the DOP model on a set of handparsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments indicate 96% test set parsing accuracy.
pdf
bib
abs
TransformationBased ErrorDriven Parsing
Eric Brill
In this paper we describe a new technique for parsing free text: a transformational grammar is automatically learned that is capable of accurately parsing text into binarybranching syntactic trees. The algorithm works by beginning in a very naive state of knowledge about phrase structure. By repeatedly comparing the results of bracketing in the current state to proper bracketing provided in the training corpus, the system learns a set of simple structural transformations that can be applied to reduce the number of errors. After describing the algorithm, we present results and compare these results to other recent results in automatic grammar induction.
pdf
bib
abs
Parsing as Dynamic Interpretation
Harry Bunt

Ko van der Sloat
In this paper we consider the merging of the language of feature structures with a formal logical language, and how the semantic definition of the resulting language can be used in parsing. For the logical language we use the language EL, defined and implemented earlier for computational semantic purposes. To this language we add the basic constructions and operations of feature structures. The extended language we refer to as ‘Generalized EL’, or ‘GEL’. The semantics of EL, and that of its extension GEL, is defined modeltheoretically: for each construction of the language, a recursive rule describes how its value can be computed from the values of its constituents. Since GEL talks not only about semantic objects and their relations but also about syntactic concepts, GEL models are nonstandard in containing both kinds of entities. Whereas phrasestructure rules are traditionally viewed procedurally, as recipes for building phrases, and a rule in the parsingasdeduction is viewed declaratively, as a proposition which is true when the conditions for building the phrase are satisfied, a rule in GEL is best viewed as a proposition in Dynamic Semantics: it can be evaluated recursively, and evaluates not to true or false, but to the minimal change in the model, needed to make the proposition true. The viability of this idea has been demonstrated by a proofofconcept implementation for DPSG chart parsing and an emulation of HPSG parsing in the STUF environment.
pdf
bib
abs
Compiling Typed AttributeValue Logic Grammars
Bob Carpenter
The unificationbased approach to processing attributevalue logic grammars, similar to Prolog interpretation, has become the standard. We propose an alternative, embodied in the AttributeLogic Engine (ALE) (Carpenter 1993) , based on the Warren Abstract Machine (WAM) approach to compiling Prolog (AïtKaci 1991). Phrase structure grammars with procedural attachments, similar to Definite Clause Grammars (DCG) (Pereira — Warren 1980), are specified using a typed version of RoundsKasper logic (Carpenter 1992). We argue for the benefits of a strong and total version of typing in terms of both clarity and efficiency. Finally, we discuss the compilation of grammars into a few efficient lowlevel instructions for the basic feature structure operations.
pdf
bib
abs
(Pictorial) LR Parsing from an Arbitrary Starting Point
Gennaro Costagliola
In pictorial LR parsing it is always difficult to establish from which point of a picture the parsing process has to start. This paper introduces an algorithm that allows any element of the input to be considered as the starting one and, at the same time, assures that the parsing process is not compromised. The algorithm is first described on string grammars seen as a subclass of pictorial grammars and then adapted to the twodimensional case. The extensions to generalized LR parsing and pictorial generalized LR parsing are immediate.
pdf
bib
abs
A New Transformation into Deterministically Parsable Form for Natural Language Grammars
Nigel R. Ellis

Roberto Garigliano

Richard G. Morgan
Marcus demonstrated that it was possible to construct a deterministic grammar/interpreter for a subset of natural language [Marcus, 1980]. Although his work with PARSIFAL pioneered the field of deterministic natural language parsing, his method has several drawbacks: • The rules and actions in the grammar / interpreter are so embedded that it is difficult to distinguish between them. • The grammar / interpreter is very difficult to construct (the small grammar shown in [Marcus, 1980] took about four months to construct). • The grammar is very difficult to maintain, as a small change may have several side effects. This paper outlines a set of structure transformations for converting a nondeterministic grammar into deterministic form. The original grammar is written in a context free form; this is then transformed to resolve ambiguities.
pdf
bib
abs
A Principlebased Parser for Foreign Language Training in German and Arabic
Joe Garman

Jeffery Martin

Paola Merlo

Amy Weinberg
In this paper we discuss the design and implementation of a parser for German and Arabic, which is currently being used in a tutoring system for foreign language training. Computeraided language tutoring is a good application for testing the robustness and flexibility of a parsing system, since the input is usually ungrammatical in some way. Efficiency is also a concern, as tutoring applications typically run on personal computers, with the parser sharing memory with other components of the system. Our system is principlebased, which ensures a compact representation, and improves portability, needed in order to extend the initial design from German to Arabic and (eventually) Spanish. Currently, the parser diagnoses agreement errors, case errors, selection errors, and some word order errors. The parser can handle simple and complex declaratives and questions, topicalisations, verb movement, relative clauses — broad enough coverage to be useful in the design of real exercises and dialogues.
pdf
bib
abs
An Algorithm for the Construction of Dependency Trees
Gerrit F. van der Hoeven
A casting system is a dictionary which contains information about words, and relations that can exist between words in sentences. A casting system allows the construction of dependency trees for sentences. They are trees which have words in roles at their nodes, and arcs which correspond to dependency relations. The trees are related to dependency trees in classical dependency syntax, but they are not the same. Formally, casting systems define a family of languages which is a proper subset of the contextfree languages. It is richer than the family of regular languages however. The interest in casting systems arose from an experiment in which it was investigated whether a dictionary of words and wordrelations created by a group of experts on the basis of the analysis of a corpus of titles of scientific publications, would suffice to automatically produce reasonable but maybe superficial syntactical analyses of such titles. The results of the experiment were encouraging, but not clear enough to draw firm conclusions. A technical question which arose during the experiment, concerns the choice of a proper algorithm to construct the forest of dependency trees for a given sentence. It turns out that Earley’s wellknown algorithm for the parsing of contextfree languages can be adapted to construct dependency trees on the basis of a casting system. The adaptation is of cubic complexity. In fact one can show that contextfree grammars and dictionaries of words and wordrelations like casting systems, both belong to a more general family of systems, which associate trees with sequences of tokens. Earley’s algorithm cannot just be adapted to work for casting systems, but it can be generalized to work for the entire large family.
pdf
bib
abs
Integration of Morphological and Syntactic Analysis Based on LR Parsing Algorithm
Tanaka Hozumi

Tokunaga Takenobu

Aizawa Michio
Morphological analysis of Japanese is very different from that of English, because no spaces are placed between words. The analysis includes segmentation of words. However, ambiguities in segmentation is not always resolved only with morphological information. This paper proposes a method to integrate the morphological and syntactic analysis based on LR parsing algorithm. An LR table derived from grammar rules is modified on the basis of connectabilities between two adjacent words. The modified LR table reflects both the morphological and syntactic constraints. Using the LR table, efficient morphological and syntactic analysis is available.
pdf
bib
abs
Structural Disambiguation in Japanese by Evaluating Case Structures based on Examples in a Case Frame Dictionary
Sadao Kurohashi

Makoto Nagao
A case structure expression is one of the most important forms to represent the meaning of a sentence. Case structure analysis is usually performed by consulting case frame information in verb dictionaries and by selecting a proper case frame for an input sentence. However, this analysis is very difficult because of word sense ambiguity and structural ambiguity. A conventional method for solving these problems is to use the method of selectional restriction, but this method has a drawback in the semantic marker (SM) system – the tradeoff between descriptive power and construction cost. This paper describes a method of case structure analysis of Japanese sentences which overcomes the drawback in the SM system, concentrating on the structural disambiguation. This method selects a proper case frame for an input by the similarity measure between the input and typical example sentences of each case frame. When there are two or more possible readings for an input because of structural ambiguity, the best reading will be selected by evaluating case structures in each possible reading by the similarity measure with typical example sentences of case frames.
pdf
bib
abs
GLR* – An Efficient Noiseskipping Parsing Algorithm For Context Free Grammars
Alon Lavie

Masaru Tomita
This paper describes GLR*, a parser that can parse any input sentence by ignoring unrecognizable parts of the sentence. In case the standard parsing procedure fails to parse an input sentence, the parser nondeterministically skips some word(s) in the sentence, and returns the parse with fewest skipped words. Therefore, the parser will return some parse(s) with any input sentence, unless no part of the sentence can be recognized at all. The problem can be defined in the following way: Given a contextfree grammar G and a sentence S, find and parse S' – the largest subset of words of S, such that S' ∈ L(G). The algorithm described in this paper is a modification of the Generalized LR (Tomita) parsing algorithm [Tomita, 1986] . The parser accommodates the skipping of words by allowing shift operations to be performed from inactive state nodes of the Graph Structured Stack. A heuristic similar to beam search makes the algorithm computationally tractable. There have been several other approaches to the problem of robust parsing, most of which are special purpose algorithms [Carbonell and Hayes, 1984] , [Ward, 1991] and others. Because our approach is a modification to a standard contextfree parsing algorithm, all the techniques and grammars developed for the standard parser can be applied as they are. Also, in case the input sentence is by itself grammatical, our parser behaves exactly as the standard GLR parser. The modified parser, GLR*, has been implemented and integrated with the latest version of the Generalized LR Parser/Compiler [Tomita et al , 1988], [Tomita, 1990]. We discuss an application of the GLR* parser to spontaneous speech understanding and present some preliminary tests on the utility of the GLR* parser in such settings.
pdf
bib
abs
The Use of Bunch Notation in Parsing Theory
René Leermakers
Much of mathematics, and therefore much of computer science, is built on the notion of sets. In this paper it is argued that in parsing theory it is sometimes convenient to replace sets by a related notion, bunches. The replacement is not so much a matter of principle, but helps to create a more concise theory. Advantages of the bunch concept are illustrated by using it in descriptions of a formal semantics for contextfree grammars and of functional parsing algorithms.
pdf
bib
abs
Chart Parsing of Attributed StructureSharing Flowgraphs with TiePoint Relationships
Rudi Lutz
Many applications make use of diagrams to represent complex objects. In such applications it is often necessary to recognise how some diagram has been pieced together from other diagrams. Examples are electrical circuit analysis, and program understanding in the plan calculus (Rich, 1981). In these applications the recognition process can be formalised as flowgraph parsing, where a flowgraph is a special case of a plex (Feder 1971) . Nodes in a flowgraph are connected to each other via intermediate points known as tiepoints. Lutz (1986, 1989) generalised chart parsing of contextfree string languages (Thompson – Ritchie, 1984) to contextfree flowgraph languages, enabling bottomup and topdown recognition of flowgraphs. However, there are various features of the plan calculus that complicate this  in particular attributes, structure sharing, and relationships between tiepoints. This paper will present a chart parsing algorithm for analysing graphs with all these features, suitable for both program understanding and digital circuit analysis. For a fixed grammar, this algorithm runs in time polynomial in the number of tiepoints in the input graph.
pdf
bib
abs
The Interplay of Syntactic and Semantic Node Labels in Partial Parsing
David D. McDonald
Our natural language comprehension system, “Sparser” , uses a semantic grammar in conjunction with a domain model that defines the categories and alreadyknown individuals that can be expected in the sublanguages we are studying, the most significant of which to date has been articles from the Wall Street Journal’s “Who’s News” column. In this paper we describe the systematic use of default syntactic rules in this grammar: an alternative set of labels on consitutents that are used to capture generalities in the semantic interpretation of constructions like the verbal auxiliaries or many adverbials. Syntactic rules form the basis of a set of schemas in a Tree Adjoining Grammar that are used as templates from which to create the primary, semantically labeled rules of the grammar as part of defining the categories in the domain models. This design permits the semantic grammar to be developed on a linguistically principled basis since all the rules must conform to syntactically sound patterns.
pdf
bib
abs
Increasing the Applicability of LR Parsing
MarkJan Nederhof

Janos J. Sarbo
In this paper we describe a phenomenon present in some contextfree grammars, called hidden left recursion. We show that ordinary LR parsing according to hidden leftrecursive grammars is not possible and we indicate a range of solutions to this problem. One of these solutions is a new parsing technique, which is a variant of traditional LR parsing. This new parsing technique can be used both with and without lookahead and the nondeterminism can be realized using backtracking or using a graphstructured stack.
pdf
bib
abs
Reducing Complexity in A Systemic Parser
Michael O’Donnell
Parsing with a large systemic grammar brings one facetoface with the problem of unification with disjunctive descriptions. This paper outlines some techniques which we employed in a systemic parser to reduce the averagecase complexity of such unification.
pdf
bib
abs
Generalized LR parsing and attribute evaluation
Paul Oude Luttighuis

Klaas Sikkel
This paper presents a thorough discussion of generalized LR parsing with simultaneous attribute evaluation. Nondeterministic parsers and combined parser/evaluators are presented for the LL(0) , LR(0) , and SKLR(0) strategies. SKLR(0) parsing occurs as an intermediate strategy between the first two. Particularly in the context of simultaneous attribute evaluation, generalized SKLR(0) parsing is a sensible alternative for generalized LR(0) parsing.
pdf
bib
abs
A ProofTheoretic Reconstruction of HPSG
Stephan Raaijmakers
A reinterpretation of HeadDriven Phrase Structure Grammar (HPSG) in a prooftheoretic context is presented. This approach yields a decision procedure which can be used to establish whether certain strings are generated by a given HPSG grammar. It is possible to view HPSG as a fragment of linear logic (Girard, 1987), subject to partiality and side conditions on inference rules. This relates HPSG to several categorial logics (Morrill, 1990) . Specifically, HPSG signs are mapped onto quantified formulae, which can be interpreted as secondorder types given the CurryHoward isomorphism. The logic behind type inference will, aside from the usual quantifier introduction and elimination rules, consist of a partial logic for the undirected implication connective. It will be shown how this logical perspective can be turned into a parsing perspective. The enterprise takes the standard HPSG of Pollard – Sag (1987) as a starting point, since this version of HPSG is welldocumented and has been around long enough to have displayed both merits and shortcomings; the approach is directly applicable to more recent versions of HPSG, however. In order to make the prooftheoretic recasting smooth, standard HPSG is reformulated in a binary format.
pdf
bib
abs
Stochastic Lexicalized ContextFree Grammar
Yves Schabes

Richard C. Waters
Stochastic lexicalized contextfree grammar (SLCFG) is an attractive compromise between the parsing efficiency of stochastic contextfree grammar (SCFG) and the lexical sensitivity of stochastic lexicalized treeadjoining grammar (SLTAG) . SLCFG is a restricted form of SLTAG that can only generate contextfree languages and can be parsed in cubic time. However, SLCFG retains the lexical sensitivity of SLTAG and is therefore a much better basis for capturing distributional information about words than SCFG.
pdf
bib
abs
Predictive HeadCorner Chart Parsing
Klaas Sikkel

Rieks op den Akker
HeadCorner (HC) parsing has come up in computational linguistics a few years ago, motivated by linguistic arguments. This idea is a heuristic, rather than a failsafe principle, hence it is relevant indeed to consider the worstcase behaviour of the HC parser. We define a novel predictive headcorner chart parser of cubic time complexity. We start with a leftcorner (LC) chart parser, which is easier to understand. Subsequently, the LC chart parser is generalized to an HC chart parser. It is briefly sketched how the parser can be enhanced with feature structures.
pdf
bib
abs
Parsing English with a Link Grammar
Daniel D. Sleator

Davy Temperley
We define a new formal grammatical system called a link grammar. A sequence of words is in the language of a link grammar if there is a way to draw links between words in such a way that (1) the local requirements of each word are satisfied, (2) the links do not cross, and (3) the words form a connected graph. We have encoded English grammar into such a system, and written a program (based on new algorithms) for efficiently parsing with a link grammar. The formalism is lexical and makes no explicit use of constituents and categories. The breadth of English phenomena that our system handles is quite large. A number of sophisticated and new techniques were used to allow efficient parsing of this very complex grammar. Our program is written in C, and the entire system may be obtained via anonymous ftp. Several other researchers have begun to use link grammars in their own research.
pdf
bib
abs
Evaluation of TTP Parser: A Preliminary Report
Tomek Strzalkowski

Peter G. N. Scheyen
TTP (Tagged Text Parser) is a fast and robust natural language parser specifically designed to process vast quantities of unrestricted text. TTP can analyze written text at the speed of approximately 0.3 sec/sentence, or 73 words per second. An important novel feature of TTP parser is that it is equipped with a skipandfit recovery mechanism that allows for fast closing of more difficult subconstituents after a preset amount of time has elapsed without producing a parse. Although a complete analysis is attempted for each sentence, the parser may occasionally ignore fragments of input to resume “normal” processing after skipping a few words. These fragments are later analyzed separately and attached as incomplete constituents to the main parse tree. TTP has recently been evaluated against several leading parsers. While no formal numbers were released (a formal evaluation is planned later this year), TTP has performed surprisingly well. The main argument of this paper is that TTP can provide a substantial gain in parsing speed giving up relatively little in terms of the quality of output it produces. This property allows TTP to be used effectively in parsing large volumes of text.
pdf
bib
abs
Frequency Estimation of Verb Subcategorization Frames Based on Syntactic and Multidimensional Statistical Analysis
Akira Ushioda

David A. Evans

Ted Gibson

Alex Waibel
We describe a mechanism for automatically estimating frequencies of verb subcategorization frames in a large corpus. A tagged corpus is first partially parsed to identify noun phrases and then a regular grammar is used to estimate the appropriate subcategorization frame for each verb token in the corpus. In an experiment involving the identification of six fixed subcategorization frames, our current system showed more than 80% accuracy. In addition, a new statistical method enables the system to learn patterns of errors based on a set of training samples and substantially improves the accuracy of the frequency estimation.
pdf
bib
abs
Handling Syntactic ExtraGrammaticality
Fuliang Weng
This paper reviews and summarizes six different types of extragrammatical phenomena and their corresponding recovery principles at the syntactic level, and describes some techniques used to deal with four of them completely within an Extended GLR parser (EGLR). Partial solutions to the remaining two by the EGLR parser are also discussed. The EGLR has been implemented.
pdf
bib
abs
Adventures in Multidimensional Parsing: Cycles and Disorders
Kent Wittenburg
Among the proposals for multidimensional grammars is a family of constraintbased grammatical frameworks, including Relational Grammars. In Relational languages, expressions are formally defined as a set of relations whose tuples are taken from an indexed set of symbols. Both bottomup parsing and Earleystyle parsing algorithms have previously been proposed for different classes of Relational languages. The Relational language class for Earley style parsing in Wittenburg (1992a) requires that each relation be a partial order. However, in some realworld domains, the relations do not naturally conform to these restrictions. In this paper I discuss motivations and methods for predictive, Earleystyle parsing of multidimensional languages when the relations involved do not necessarily yield an ordering, e.g., when the relations are symmetric and/or nontransitive. The solution involves guaranteeing that a single initial start position for parsing can be associated with any member of the input set. The domains in which these issues are discussed involve incremental parsing in interfaces and offline verification of multidimensional data.
pdf
bib
abs
Probabilistic Incremental Parsing in Systemic Functional Grammar
A. Ruvan Weerasinghe

Robin P. Fawcett
In this paper we suggest that a key feature to look for in a successful parser is its ability to lend itself naturally to semantic interpretation. We therefore argue in favour of a parser based on a semantically oriented model of grammar, demonstrating some of the benefits that such a model offers to the parsing process. In particular we adopt a systemic functional syntax as the basis for implementing a chart based probabilistic incremental parser for a nontrivial subset of English.