%0 Conference Proceedings
%T Monte Carlo Parsing
%A Bod, Rens
%Y Bunt, Harry
%Y Berwick, Robert
%Y Church, Ken
%Y Joshi, Aravind
%Y Kaplan, Ronald
%Y Kay, Martin
%Y Lang, Bernard
%Y Nagao, Makoto
%Y Nijholt, Anton
%Y Steedman, Mark
%Y Thompson, Henry
%Y Tomita, Masaru
%Y Vijay-Shanker, K.
%Y Wilks, Yorick
%Y Wittenburg, Kent
%S Proceedings of the Third International Workshop on Parsing Technologies
%D 1993
%8 aug 10 13
%I Association for Computational Linguistics
%C Tilburg, Netherlands and Durbuy, Belgium
%F bod-1993-monte
%X In stochastic language processing, we are often interested in the most probable parse of an input string. Since there can be exponentially many parses, comparing all of them is not efficient. The Viterbi algorithm (Viterbi, 1967; Fujisaki et al., 1989) provides a tool to calculate in cubic time the most probable derivation of a string generated by a stochastic context free grammar. However, in stochastic language models that allow a parse tree to be generated by different derivations – like Data Oriented Parsing (DOP) or Stochastic Lexicalized Tree-Adjoining Grammar (SLTAG) – the most probable derivation does not necessarily produce the most probable parse. In such cases, a Viterbi-style optimisation does not seem feasible to calculate the most probable parse. In the present article we show that by incorporating Monte Carlo techniques into a polynomial time parsing algorithm, the maximum probability parse can be estimated as accurately as desired in polynomial time. Monte Carlo parsing is not only relevant to DOP or SLTAG, but also provides for stochastic CFGs an interesting alternative to Viterbi. Unlike the current versions of Viterbi style optimisation (Fujisaki et al., 1989; Jelinek et al., 1990; Wright et al., 1991), Monte Carlo parsing is not restricted to CFGs in Chomsky Normal Form. For stochastic grammars that are parsable in cubic time, the time complexity of estimating the most probable parse with Monte Carlo turns out to be O(n²ǎrepsilon⁻2), where n is the length of the input string and ǎrepsilon the estimation error. In this paper we will treat Monte Carlo parsing first of all in the context of the DOP model, since it is especially here that the number of derivations generating a single tree becomes dramatically large. Finally, a Monte Carlo Chart parser is used to test the DOP model on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments indicate 96% test set parsing accuracy.
%U https://aclanthology.org/1993.iwpt-1.2
%P 1-12