Monte Carlo Parsing
Rens
Bod
author
1993-aug 10-13
text
Proceedings of the Third International Workshop on Parsing Technologies
Harry
Bunt
editor
Robert
Berwick
editor
Ken
Church
editor
Aravind
Joshi
editor
Ronald
Kaplan
editor
Martin
Kay
editor
Bernard
Lang
editor
Makoto
Nagao
editor
Anton
Nijholt
editor
Mark
Steedman
editor
Henry
Thompson
editor
Masaru
Tomita
editor
K
Vijay-Shanker
editor
Yorick
Wilks
editor
Kent
Wittenburg
editor
Association for Computational Linguistics
Tilburg, Netherlands and Durbuy, Belgium
conference publication
In stochastic language processing, we are often interested in the most probable parse of an input string. Since there can be exponentially many parses, comparing all of them is not efficient. The Viterbi algorithm (Viterbi, 1967; Fujisaki et al., 1989) provides a tool to calculate in cubic time the most probable derivation of a string generated by a stochastic context free grammar. However, in stochastic language models that allow a parse tree to be generated by different derivations – like Data Oriented Parsing (DOP) or Stochastic Lexicalized Tree-Adjoining Grammar (SLTAG) – the most probable derivation does not necessarily produce the most probable parse. In such cases, a Viterbi-style optimisation does not seem feasible to calculate the most probable parse. In the present article we show that by incorporating Monte Carlo techniques into a polynomial time parsing algorithm, the maximum probability parse can be estimated as accurately as desired in polynomial time. Monte Carlo parsing is not only relevant to DOP or SLTAG, but also provides for stochastic CFGs an interesting alternative to Viterbi. Unlike the current versions of Viterbi style optimisation (Fujisaki et al., 1989; Jelinek et al., 1990; Wright et al., 1991), Monte Carlo parsing is not restricted to CFGs in Chomsky Normal Form. For stochastic grammars that are parsable in cubic time, the time complexity of estimating the most probable parse with Monte Carlo turns out to be O(n²ǎrepsilon⁻2), where n is the length of the input string and ǎrepsilon the estimation error. In this paper we will treat Monte Carlo parsing first of all in the context of the DOP model, since it is especially here that the number of derivations generating a single tree becomes dramatically large. Finally, a Monte Carlo Chart parser is used to test the DOP model on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments indicate 96% test set parsing accuracy.
bod-1993-monte
https://aclanthology.org/1993.iwpt-1.2
1993-aug 10-13
1
12