Monte Carlo Parsing

Rens Bod


Abstract
In stochastic language processing, we are often interested in the most probable parse of an input string. Since there can be exponentially many parses, comparing all of them is not efficient. The Viterbi algorithm (Viterbi, 1967; Fujisaki et al., 1989) provides a tool to calculate in cubic time the most probable derivation of a string generated by a stochastic context free grammar. However, in stochastic language models that allow a parse tree to be generated by different derivations – like Data Oriented Parsing (DOP) or Stochastic Lexicalized Tree-Adjoining Grammar (SLTAG) – the most probable derivation does not necessarily produce the most probable parse. In such cases, a Viterbi-style optimisation does not seem feasible to calculate the most probable parse. In the present article we show that by incorporating Monte Carlo techniques into a polynomial time parsing algorithm, the maximum probability parse can be estimated as accurately as desired in polynomial time. Monte Carlo parsing is not only relevant to DOP or SLTAG, but also provides for stochastic CFGs an interesting alternative to Viterbi. Unlike the current versions of Viterbi style optimisation (Fujisaki et al., 1989; Jelinek et al., 1990; Wright et al., 1991), Monte Carlo parsing is not restricted to CFGs in Chomsky Normal Form. For stochastic grammars that are parsable in cubic time, the time complexity of estimating the most probable parse with Monte Carlo turns out to be O(n2𝜀-2), where n is the length of the input string and 𝜀 the estimation error. In this paper we will treat Monte Carlo parsing first of all in the context of the DOP model, since it is especially here that the number of derivations generating a single tree becomes dramatically large. Finally, a Monte Carlo Chart parser is used to test the DOP model on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments indicate 96% test set parsing accuracy.
Anthology ID:
1993.iwpt-1.2
Volume:
Proceedings of the Third International Workshop on Parsing Technologies
Month:
August 10-13
Year:
1993
Address:
Tilburg, Netherlands and Durbuy, Belgium
Editors:
Harry Bunt, Robert Berwick, Ken Church, Aravind Joshi, Ronald Kaplan, Martin Kay, Bernard Lang, Makoto Nagao, Anton Nijholt, Mark Steedman, Henry Thompson, Masaru Tomita, K. Vijay-Shanker, Yorick Wilks, Kent Wittenburg
Venue:
IWPT
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Note:
Pages:
1–12
Language:
URL:
https://aclanthology.org/1993.iwpt-1.2
DOI:
Bibkey:
Cite (ACL):
Rens Bod. 1993. Monte Carlo Parsing. In Proceedings of the Third International Workshop on Parsing Technologies, pages 1–12, Tilburg, Netherlands and Durbuy, Belgium. Association for Computational Linguistics.
Cite (Informal):
Monte Carlo Parsing (Bod, IWPT 1993)
Copy Citation:
PDF:
https://aclanthology.org/1993.iwpt-1.2.pdf