Masaru Tomita (Editor)
When dealing with a phenomenon as vast and com plex as natural language, an experimental approach is often the best way to discover new computational methods and determine their usefulness. The experimental process includes designing and selecting new experiments, carrying out the experiments, and evaluating the experiments. Most conference presentations are about finished experiments, completed theoretical results, or the evaluation of systems already in use. In this workshop setting, I would like to depart from this tendency to discuss some experiments that we are beginning to perform, and the reasons for investigating a particular approach to parsing. This approach builds on recent work in unification-based parsing and classification-based knowledge representation, developing an architecture that brings together the capabilities of these related frameworks.
This paper addresses the issue of how to organize linguistic principles for efficient processing. Based on the general characterization of principles in terms of purely computational properties, the effects of principle-ordering on parser performance are investigated. A novel parser that exploits the possible variation in principle-ordering to dynamically re-order principles is described. Heuristics for minimizing the amount of unnecessary work performed during the parsing process are also discussed.
In a natural language processing system, a large amount of ambiguity and a large branching factor are hindering factors in obtaining the desired analysis for a given sentence in a short time. In this paper, we are proposing a sequential truncation parsing algorithm to reduce the searching space and thus lowering the parsing time. The algorithm is based on a score function which takes the advantages of probabilistic characteristics of syntactic information in the sentences. A preliminary test on this algorithm was conducted with a special version of our machine translation system, the ARCHTRAN, and an encouraging result was observed.
An LR parser for probabilistic context-free grammars is described. Each of the standard versions of parser generator (SLR, canonical and LALR) may be applied. A graph-structured stack permits action conflicts and allows the parser to be used with uncertain input, typical of speech recognition applications. The sentence uncertainty is measured using entropy and is significantly lower for the grammar than for a first-order Markov model.
This paper describes a speech parsing method called HMM-LR. In HMM-LR, an LR parsing table is used to predict phones in speech input, and the system drives an HMM-based speech recognizer directly without any intervening structures such as a phone lattice. Very accurate, efficient speech parsing is achieved through the integrated processes of speech recognition and language analysis. The HMM-LR m ethod is applied to large-vocabulary speaker-dependent Japanese phrase recognition. The recognition rate is 87.1% for the top candidates and 97.7% for the five best candidates.
An analysis method for Japanese spoken sentences based on HPSG has been developed. Any analysis module for the interpreting telephony task requires the following capabilities: (i) the module must be able to treat spoken-style sentences; and, (ii) the module must be able to take, as its input, lattice-like structures which include both correct and incorrect constituent candidates of a speech recognition module. To satisfy these requirements, an analysis method has been developed, which consists of a grammar designed for treating spoken-style Japanese sentences and a parser designed for taking as its input speech recognition output lattices. The analysis module based on this method is used as part of the NADINE (Natural Dialogue Interpretation Expert) system and the SL-TRANS (Spoken Language Translation) system.
Authentic text as found in corpora cannot be described completely by a formal system, such as a set of grammar rules. As robust parsing is a prerequisite for any practical natural language processing system, there is certainly a need for techniques that go beyond merely formal approaches. Various possibilities, such as the use of simulated annealing, have been proposed recently and we have looked at their suitability for the parse process of the DLT machine translation system, which will use a large structured bilingual corpus as its main linguistic knowledge source. Our findings are that parsing is not the type of task that should be tackled solely through simulated annealing or similar stochastic optimization techniques but that a controlled application of probabilistic methods is essential for the performance of a corpus-based parser. On the basis of our explorative research we have planned a number of small-scale implementations in the near future.
Extensions to Categorial Grammars proposed to account for nonconstitutent conjunction and long-distance dependencies introduce the problem of equivalent derivations, an issue we have characterized as spurious ambiguity from the parsing perspective. In Wittenburg (1987) a proposal was made for compiling Categorial Grammars into predictive forms in order to solve the spurious ambiguity problem. This paper investigates formal properties o f grammars that use predictive versions of function composition. Among our results are (1) that grammars with predictive composition are in general equivalent to the originals if and only if a restriction on predictive rules is applied, (2) that modulo this restriction, the predictive grammars have indeed eliminated the problem of spurious ambiguity, and (3) that the issue o f equivalence is decidable, i.e., for any particular grammar, whether one needs to apply the restriction or not to ensure equivalence is a decidable question.
In this paper, we show that some non-cyclic context-free grammars with 𝜀-rules cannot be handled by Tomita’s algorithm properly. We describe a modified version of the algorithm which remedies the problem.
A variation on Tomita’s algorithm is analyzed in regards to its time and space complexity. It is shown to have a general time bound of 0(n ̃𝜌+1), where n is the length of the input string and 𝜌 is the length of the longest production. A modified algorithm is presented in which the time bound is reduced to 0(n3). The space complexity of Tomita’s algorithm is shown to be proportional to n2 in the worst case and is changed by at most a constant factor with the modification. Empirical results are used to illustrate the trade off between time and space on a simple example. A discussion of two subclasses of context-free grammars that can be recognized in 0(n2) and O(n) is also included.
A new natural language system, TINA, has been developed for applications involving spoken language tasks, which integrate key ideas from context free grammars, Augmented Transition Networks (ATN’s) , and Lexical Functional Grammars (LFG’s) . The parser uses a best-first strategy, with probability assignments on all arcs obtained automatically from a set of example sentences. An initial context-free grammar, derived from the example sentences, is first converted to a probabilistic network structure. Control includes both top-down and bottom-up cycles, and key parameters are passed among nodes to deal with long-distance movement, agreement, and semantic constraints. The probabilities provide a natural mechanism for exploring more common grammatical constructions first. One novel feature of TINA is that it provides an atuomatic sentence generation capability, which has been very effective for identifying overgeneration problems. A fully integrated spoken language system using this parser is under development.
We describe a connectionist model which learns to parse single sentences from sequential word input. A parse in the connectionist network contains information about role assignment, prepositional attachment, relative clause structure, and subordinate clause structure. The trained network displays several interesting types of behavior. These include predictive ability, tolerance to certain corruptions of input word sequences, and some generalization capability. We report on experiments in which a small number of sentence types have been successfully learned by a network. Work is in progress on a larger database. Application of this type of connectionist model to the area of spoken language processing is discussed.
This paper describes the parsing scheme in the 𝛷DmDialog speech-to-speech dialog translation system, with special emphasis on the integration of speech and natural language processing. We propose an integrated architecture for parsing speech inputs based on a parallel marker-passing scheme and attaining dynamic participation of knowledge from the phonological-level to the discourse-level. At the phonological level, we employ a stochastic model using a transition matrix and a confusion matrix and markers which carry a probability measure. At a higher level, syntactic/semantic and discourse processing, we integrate a case-based and constraint-based scheme in a consistent manner so that a priori probability and constraints, which reflect linguistic and discourse factors, are provided to the phonological level of processing. A probability/cost-based scheme in our model enables ambiguity resolution at various levels using one uniform principle.
We present a concise survey of approaches to the context-free parsing problem of natural languages in parallel environments. The discussion includes parsing schemes which use more than one traditional parser, schemes where separate processes are assigned to the ‘non-deterministic’ choices during parsing, schemes where the number of processes depends on the length of the sentence being parsed, and schemes where the number of processes depends on the grammar size rather than on the input length. In addition we discuss a connectionist approach to the parsing problem.
This paper describes the conversion of a set of feature grammar rules into a deterministic finite state machine that accepts the same language (or at least a well-defined related language). First the reasoning behind why this is an interesting thing to do within the Edinburgh speech recogniser project, is discussed. Then details about the compilation algorithm are given. Finally, there is some discussion of the advantages and disadvantages of this method of implementing feature based grammar formalisms.
An efficient algorithm that enumerates parses of ambiguous context-free languages is described, and its time and space complexities are discussed. When context-free parsers are used for natural language parsing, pattern recognition, and so forth, there may be a great number of parses for a sentence. One common strategy for efficient enumeration of parses is to assign an appropriate weight to each production, and to enumerate parses in the order of the total weight of all applied production. However, the existing algorithms taking this strategy can be applied only to the problems of limited areas such as regular languages; in the other areas only inefficient exhaustive searches are known. In this paper, we first introduce a hierarchical graph suitable for enumeration. Using this graph, enumeration of parses in the order of acceptablity is equivalent to finding paths of this graph in the order of length. Then, we present an efficient enumeration algorithm with this graph, which can be applied to arbitrary context-free grammars. For enumeration of k parses in the order of the total weight of all applied productions, the time and space complexities of our algorithm are 0(n3 + kn2) and 0(n3 + kn), respectively.
The Parallel Expert Parser (PEP) is a natural language analysis model belonging to the interactive model paradigm that stresses the parallel interaction of relatively small distributed knowledge components to arrive at the meaning of a fragment of text. It borrows the idea of words as basic dynamic entities triggering a set of interactive processes from the Word Expert Parser (Small 1980), but tries to improve on the clarity of interactive processes and on the organization of lexically-distributed knowledge. As of now, especially the procedural aspects have received attention: instead of having wild-running uncontrollable interactions, PEP restricts the interactions to explicit communications on a structured blackboard; the communication protocols are a compromise betwenn maximum parallelism and controllability. At the same time, it is no longer just words that trigger processes; words create larger units (constituents), that are in turn interacting entities on a higher level. Lexical experts contribute their associated knowledge, create higher-level experts, and die away. The linguists define the levels to be considered, and write expert processes in a language that tries to hide the procedural aspects of the parallel-interactive model from them. Problems include the possiblity of deadlock situations when processes wait infinitely for each other, the way to efficiently pursue different alternatives (as of now, the system just uses don’t-care determinism), and testing whether the protocols allow linguists to fully express their needs. PEP has been implemented in Flat Concurrent Prolog, using the Logix programming environment. Current research is oriented more towards the problem of distributed knowledge representation. Abstractions and generalizations across lexical experts could be made using principles from object-oriented programming (introducing generic, prototypical experts; cp. Hahn 1987). Thoughts also go in the direction of an integration of the coarse-grained parallelism with knowledge representation in a fine-grained parallel (connectionist) way.
A generalized LR parsing algorithm, which has been developed by Tomita [Tomita 86], can treat a context free grammar. His algorithm makes use of breadth first strategy when a conflict occcurs in a LR parsing table. It is well known that the breadth first strategy is suitable for parallel processing. This paper presents an algorithm of a parallel parsing system (PLR) based on a generalized LR parsing. PLR is implemented in GHC [Ueda 85] that is a concurrent logic programming language developed by Japanese 5th generation computer project. The feature of PLR is as follows: Each entry of a LR parsing table is regarded as a process which handles shift and reduce operations. If a process discovers a conflict in a LR parsing table, it activates subprocesses which conduct shift and reduce operations. These subprocesses run in parallel and simulate breadth first strategy. There is no need to make some subprocesses synchronize during parsing. Stack information is sent to each subprocesses from their parent process. A simple experiment for parsing a sentence revealed the fact that PLR runs faster than PAX [Matsumoto 87][Matsumoto 89] that has been known as the best parallel parser.
In this paper, we investigate the processing of the so-called ‘lexicalized’ grammar. In ‘lexicalized’ grammars (Schabes, Abeille and Joshi, 1988), each elementary structure is systema tically associated with a lexical ‘head’. These structures specify extended domains of locality (as compared to CFGs) over which constraints can be stated. The ‘grammar’ consists of a lexicon where each lexical item is associated with a finite number of structures for which that item is the ‘head’ . There are no separate grammar rules. There are, of course, ‘rules’ which tell us how these structures are combined. A general two-pass parsing strategy for ‘lexicalized’ grammars follows naturally. In the first stage, the parser selects a set of elementary structures associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. We evaluate this strategy with respect to two characteristics. First, the amount of filtering on the entire grammar is evaluated: once the first pass is performed, the parser uses only a subset of the grammar. Second, we evaluate the use of non-local information: the structures selected during the first pass encode the morphological value (and therefore the position in the string) of their ‘head’; this enables the parser to use non-local in form ation to guide its search. We take Lexicalized Tree Adjoining Grammars as an in stance of lexicalized grammar. We illustrate the organization of the grammar. Then we show how a general Earley-type TAG parser (Schabes and Joshi, 1988) can take advantage of lexicalization. Empirical data show that the filtering of the grammar and the non-local in formation provided by the two-pass strategy improve the performance of the parser. We explain how constraints over the elementary structures expressed by unification equations can be parsed by a simple extension of the Earley-type TAG parser. Lexicalization guarantees termination of the algorithm without special devices such as restrictors.
This paper describes a parsing system used in a framework for the development of Natural Language grammars. It is an interactive environment suitable for writing robust NL applications generally. Its heart is the SAIL parsing algorithm that uses a Phrase-Structure Grammar with extensive augmentations. Furthermore, some particular parsing tools are embedded in the system, and provide a powerful environment for developing grammars, even of large coverage.
In a natural language processing system designed for language learners, it is necessary to accept both well-formed and ill-formed input. This paper describes a method of maintaining parsing efficiency for well-formed sentences while still accepting a wide range of ill-formed input.
The Unification-based Grammars seem to be adequate for the analysis of agglutinative languages such as Korean, etc. In this paper, the merits of Lexical Functional Grammar is analyzed and the structure of Korean Syntactic Analyzer is described. Verbal complex category is used for the analysis of several linguistic phenomena and a new attribute of UNKNOWN is defined for the analysis of grammatical relations.
This paper describes two methods for the acquisition and utilization of lexical cooccurrence relationships. Under these method, cooccurrence relationships are obtained from two kinds of inputs: example sentences and the corresponding correct syntactic structure. The first of the two methods treats a set of governors each element of which is bound to a element of sister nodes set in a syntactic structure under consideration, as a cooccurrence relationship. In the second method, a cooccurrence relationship name and affiliated attribute names are manually given in the description of augmented rewriting rules. Both methods discriminate correctness of cooccurrence by the use of the correct syntactic structure mentioned above. Experiment is made for both methods to find if thus obtained cooccurrence relationship is useful for the correct analysis.
There are a number of collocational constraints in natural languages that ought to play a more important role in natural language parsers. Thus, for example, it is hard for most parsers to take advantage of the fact that wine is typically drunk, produced, and sold, but (probably) not pruned. So too, it is hard for a parser to know which verbs go with which prepositions (e.g., set up) and which nouns fit together to form compound noun phrases (e.g., computer programmer). This paper will attempt to show that many of these types of concerns can be addressed with syntactic methods (symbol pushing), and need not require explicit semantic interpretation. We have found that it is possible to identify many of these interesting co-occurrence relations by computing simple summary statistics over millions of words of text. This paper will summarize a number of experiments carried out by various subsets of the authors over the last few years. The term collocation will be used quite broadly to include constraints on SVO (subject verb object) triples, phrasal verbs, compound noun phrases, and psychoiinguistic notions of word association (e.g., doctor/nurse).
The search for efficient parsing strategies has a long history, dating back to at least the Cocke/Younger/Kusami parser of the early sixties. The publication of the Earley parser in 1970 has had a significant influence on context-free (CF) parsing for natural language processing, evidenced by the interest in the variety of chart parsers implemented since then. The development of unification grammars (with their complex feature structures) has put new life into the discussion of efficient parsing strategies, and there has been some debate on the use of essentially bottom-up or top-down strategies, the efficacy of top-down filtering and so on. The approacn to parsing described here is suitable for complex category, unification-based grammars. The concentration here is on a unification grammar which has a context-free backbone, Lexical-Functional Grammer (LFG). The parser is designed primarily for simplicity, efficiency and practical application. The parser outlined here results in a high-level, but still efficient, language system without making a requirement on the grammar/lexicon writer to understand its implementation details. The parsing algorithm operates in a systematic bottom-up (BU) fashion, thus taking earliest advantage of LFQ’s concentration of information in the lexicon and also making use of unrestricted feature structures to realize LFG’s Top-Down (TD) predictive potential. While LFG can make special use of its CF backbone, the algorithm employed is not restricted to grammars having a CF backbone and is equally suited to complex-feature-based formalisms. Additionally, the algorithm described (which is a systematic left-to-right (left comer) parsing algorithm) allows us to take full advantage of both BU and TD aspects of a unificatin-based grammar without incurring prohibitive overheads such as feature-structure comparison or subsumption checking. The use of TD prediction, which in the Earley algorithm is allowed to hypothesize new parse paths, is here restricted to confirming initial parses produced BU, and specializing these according to future (feature) expectations.
PREMO is a knowledge-based Preference Semantics parser with access to a large, lexical semantic knowledge base and organized along the lines of an operating system. The state of every partial parse is captured in a structure called a language object, and the control structure of the preference machine is a priority queue of these language objects. The language object at the front of the queue has the highest score as computed by a preference metric that weighs grammatical predictions, semantic type matching, and pragmatic coherence. The highest priority language object is the intermediate reading that is currently most preferred (the others are still “alive,” but not actively pursued); in this way the preference machine avoids combinatorial explosion by following a “best-first” strategy for parsing. The system has clear extensions into parallel processing.
2-Dimensional Context-Free Grammar (2D-CFG) for 2-dimensional input text is introduced and efficient parsing algorithms for 2D-CFG are presented. In 2D-CFG, a grammar rule’s right hand side symbols can be placed not only horizontally but also vertically. Terminal symbols in a 2-dimensional input text are combined to form a rectangular region, and regions are combined to form a larger region using a 2-dimensional phrase structure rule. The parsing algorithms presented in this paper are the 2D-Ear1ey algorithm and 2D-LR algorithm, which are 2-dimensionally extended versions of Earley’s algorithm and the LR(O) algorithm, respectively.
A parser is described here based on the Cocke-Young-Kassami algorithm which uses immediate dominance and linear precedence rules together with various feature inheritance conventions. The meta rules in the grammar are not applied beforehand but only when needed. This ensures that the rule set is kept to a minimum. At the same time, determining what rule to expand by applying which meta-rule is done in an efficient manner using the meta-rule reference table. Since this table is generated during “compilation” stage, its generation does not add to parsing time.