Yves Schabes


1996

1995

1994

1993

Stochastic lexicalized context-free grammar (SLCFG) is an attractive compromise between the parsing efficiency of stochastic context-free grammar (SCFG) and the lexical sensitivity of stochastic lexicalized tree-adjoining grammar (SLTAG). SLCFG is a restricted form of SLTAG that can only generate context-free languages and can be parsed in cubic time. However, SLCFG retains the lexical sensitivity of SLTAG and is therefore a much better basis than SCFG for capturing distributional information about words.
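To make the "cubic time" baseline concrete, here is a minimal sketch of inside-probability computation for an ordinary SCFG in Chomsky normal form (a CKY-style chart). It is not an SLCFG parser and not the authors' algorithm. The toy grammar, the `BINARY` and `LEXICAL` tables, and the `inside` function are hypothetical illustrations. The three nested loops over span length, start position, and split point are what give the O(n^3) cost in sentence length.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form (illustration only).
# Binary rules: (parent, left child, right child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (parent, word) -> probability
LEXICAL = {
    ("NP", "John"): 0.5,
    ("NP", "Mary"): 0.5,
    ("V", "saw"): 1.0,
}

def inside(words):
    """Return chart with chart[(i, j)][A] = P(A =>* words[i:j]).

    Loops over span length, start i, and split k are each bounded by n,
    so the work is O(n^3) times the number of binary rules."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                chart[(i, i + 1)][a] += p
    # Longer spans combine two adjacent sub-spans with a binary rule.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    pb = chart[(i, k)].get(b, 0.0)
                    pc = chart[(k, j)].get(c, 0.0)
                    if pb and pc:
                        chart[(i, j)][a] += p * pb * pc
    return chart

chart = inside(["John", "saw", "Mary"])
print(chart[(0, 3)]["S"])  # 0.25
```

A lexicalized variant would additionally index each chart entry by its head word, which is where the extra lexical sensitivity comes from; that refinement is not shown here.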

1992

1991

The valid prefix property (VPP), the capability of a left-to-right parser to detect errors as soon as possible, often goes unnoticed in parsing CFGs. Earley’s parser for CFGs (Earley, 1968; Earley, 1970) maintains the valid prefix property and obtains an O(n^3)-time worst-case complexity, as good as that of parsers that do not maintain it, such as the CKY parser (Younger, 1967; Kasami, 1965). Unlike for CFGs, maintaining the valid prefix property for TAGs is costly. In 1988, Schabes and Joshi proposed an Earley-type parser for TAGs. It maintains the valid prefix property at the expense of its worst-case complexity (O(n^9)-time). To our knowledge, it is the only known polynomial-time parser for TAGs that maintains the valid prefix property. In this paper, we explain why the valid prefix property is expensive to maintain for TAGs, and we introduce a predictive left-to-right parser for TAGs that does not maintain the valid prefix property but achieves O(n^6)-time worst-case behavior, O(n^4)-time for unambiguous grammars, and linear time for a large class of grammars.
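The valid prefix property for CFGs can be illustrated with a small Earley recognizer: after scanning token i, the chart column i+1 is empty exactly when the input up to and including token i is not a prefix of any sentence, so the error is reported at the earliest possible position. This is a minimal sketch for CFGs only, not the TAG parser discussed above. The toy grammar and the `earley_first_error` function are hypothetical, and the sketch assumes no empty (epsilon) rules and that every nonterminal derives some terminal string.

```python
# Hypothetical toy CFG: each nonterminal maps to its right-hand sides.
# Assumes no epsilon rules and no useless (non-productive) nonterminals.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["John"], ["Mary"], ["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"]],
    "V": [["saw"]],
}
START = "S"

def earley_first_error(words):
    """Return None if words is a sentence, len(words) if it is only a valid
    prefix, or the index of the first token that breaks the valid prefix."""
    # An item is (lhs, rhs, dot, origin) with rhs stored as a tuple.
    chart = [set() for _ in range(len(words) + 1)]
    chart[0] = {(START, tuple(rhs), 0, 0) for rhs in GRAMMAR[START]}
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                # Predictor: expand the nonterminal after the dot.
                for alt in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], tuple(alt), 0, i)
                    if item not in chart[i]:
                        chart[i].add(item)
                        agenda.append(item)
            elif dot == len(rhs):
                # Completer: advance items in column `origin` waiting for lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
        if i == len(words):
            break
        # Scanner: move the dot over the terminal words[i].
        for lhs, rhs, dot, origin in chart[i]:
            if dot < len(rhs) and rhs[dot] == words[i]:
                chart[i + 1].add((lhs, rhs, dot + 1, origin))
        if not chart[i + 1]:
            return i  # earliest point at which the input stops being a valid prefix
    done = any(l == START and d == len(r) and o == 0 for l, r, d, o in chart[-1])
    return None if done else len(words)

print(earley_first_error(["John", "saw", "the", "dog"]))  # None
print(earley_first_error(["John", "the", "dog"]))  # 1
```

For "John the dog" the error is reported at index 1, before the rest of the input is read; a CKY recognizer, which does not maintain the VPP, would only find out after filling the whole chart.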

1990

1989

In this paper, we investigate the processing of so-called ‘lexicalized’ grammars. In ‘lexicalized’ grammars (Schabes, Abeille and Joshi, 1988), each elementary structure is systematically associated with a lexical ‘head’. These structures specify extended domains of locality (as compared to CFGs) over which constraints can be stated. The ‘grammar’ consists of a lexicon where each lexical item is associated with a finite number of structures for which that item is the ‘head’. There are no separate grammar rules. There are, of course, ‘rules’ which tell us how these structures are combined. A general two-pass parsing strategy for ‘lexicalized’ grammars follows naturally. In the first stage, the parser selects a set of elementary structures associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. We evaluate this strategy with respect to two characteristics. First, the amount of filtering on the entire grammar is evaluated: once the first pass is performed, the parser uses only a subset of the grammar. Second, we evaluate the use of non-local information: the structures selected during the first pass encode the morphological value (and therefore the position in the string) of their ‘head’; this enables the parser to use non-local information to guide its search. We take Lexicalized Tree Adjoining Grammars as an instance of lexicalized grammar. We first illustrate the organization of the grammar and then show how a general Earley-type TAG parser (Schabes and Joshi, 1988) can take advantage of lexicalization. Empirical data show that the filtering of the grammar and the non-local information provided by the two-pass strategy improve the performance of the parser. We explain how constraints over the elementary structures expressed by unification equations can be parsed by a simple extension of the Earley-type TAG parser. Lexicalization guarantees termination of the algorithm without special devices such as restrictors.
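The first pass of the two-pass strategy can be sketched as a lexicon lookup that selects the elementary structures anchored by the input words and records each anchor's position. This is a minimal sketch under stated assumptions: the `LEXICON` contents, the tree names, and the `first_pass` and `filtering_ratio` functions are hypothetical, and the second pass (Earley-type TAG parsing over the selected subset) is not shown.

```python
# Hypothetical lexicon for illustration: each word maps to the names of the
# elementary trees it anchors (tree names are made up, not from a real grammar).
LEXICON = {
    "John": ["alpha_NP"],
    "Mary": ["alpha_NP"],
    "dog": ["alpha_NP"],
    "saw": ["alpha_transitive", "alpha_wh_object", "alpha_noun"],
    "the": ["beta_det"],
    "barks": ["alpha_intransitive"],
    "quickly": ["beta_adverb"],
}

def first_pass(words):
    """Select (tree, anchor_position) pairs for the input words.

    Keeping the anchor position is what gives the second pass non-local
    information about where each head occurs in the string. Words missing
    from this toy lexicon select nothing."""
    return [(tree, pos) for pos, w in enumerate(words) for tree in LEXICON.get(w, [])]

def filtering_ratio(words):
    """Fraction of the grammar's distinct trees that survive the first pass."""
    all_trees = {t for trees in LEXICON.values() for t in trees}
    kept = {t for t, _ in first_pass(words)}
    return len(kept) / len(all_trees)

print(first_pass(["John", "saw", "Mary"]))
print(filtering_ratio(["John", "saw", "Mary"]))  # 4 of the 7 distinct trees kept
```

Even in this toy lexicon, the second pass would only consider 4 of the 7 trees for "John saw Mary", and each selected tree is tied to a known string position, which is the filtering and non-local information the abstract evaluates.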

1988