Left language model state for syntactic machine translation

Kenneth Heafield, Hieu Hoang, Philipp Koehn, Tetsuo Kiso, Marcello Federico


Abstract
Many syntactic machine translation decoders, including Moses, cdec, and Joshua, implement bottom-up dynamic programming to integrate N-gram language model probabilities into hypothesis scoring. These decoders concatenate hypotheses according to grammar rules, yielding larger hypotheses and eventually complete translations. When hypotheses are concatenated, the language model score is adjusted to account for boundary-crossing n-grams. Words on the boundary of each hypothesis are encoded in state, consisting of left state (the first few words) and right state (the last few words). We speed concatenation by encoding left state using data structure pointers in lieu of vocabulary indices and by avoiding unnecessary queries. To increase the decoder’s opportunities to recombine hypotheses, we minimize the number of words encoded by left state. This has the effect of reducing search errors made by the decoder. The resulting gain in model score is smaller than for right state minimization, which we explain by observing a relationship between state minimization and language model probability. With a fixed cube pruning pop limit, we show a 3–6% reduction in CPU time and improved model scores. Reducing the pop limit to the point where model scores tie the baseline yields a net 11% reduction in CPU time.
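The role of left and right state during concatenation can be illustrated with a minimal sketch. It is not the paper's implementation: it stores plain words in state (the baseline the abstract improves on, with no pointer encoding and no state minimization), and the names `TOY_LM`, `UNKNOWN`, `logprob`, `Hypothesis`, `from_word`, `concatenate`, and `score_directly` are hypothetical. The toy trigram table uses made-up log probabilities and a simplified fallback to shorter contexts without backoff weights.

```python
import math
from dataclasses import dataclass

ORDER = 3  # trigram model, so each state holds at most ORDER - 1 = 2 words

# Hypothetical toy log10 probabilities keyed by (context words..., word).
TOY_LM = {
    ("the",): -1.0, ("cat",): -2.0, ("sat",): -2.5, ("down",): -2.2,
    ("the", "cat"): -0.5, ("cat", "sat"): -0.7, ("sat", "down"): -0.4,
    ("the", "cat", "sat"): -0.3, ("cat", "sat", "down"): -0.2,
}
UNKNOWN = -5.0  # assumed score for n-grams missing from the toy table


def logprob(context, word):
    """Score word given at most ORDER - 1 preceding words.

    Simplification: falls back to shorter contexts with no backoff weights.
    """
    context = tuple(context)[-(ORDER - 1):]
    for start in range(len(context) + 1):
        key = context[start:] + (word,)
        if key in TOY_LM:
            return TOY_LM[key]
    return UNKNOWN


@dataclass(frozen=True)
class Hypothesis:
    score: float   # sum of log probabilities using the context seen so far
    left: tuple    # first (up to) ORDER - 1 words, scored with short context
    right: tuple   # last (up to) ORDER - 1 words, context for what follows


def from_word(word):
    return Hypothesis(logprob((), word), (word,), (word,))


def concatenate(a, b):
    """Join a and b, rescoring b's left-state words with a's right state."""
    adjustment = 0.0
    for i, word in enumerate(b.left):
        old = logprob(b.left[:i], word)            # context inside b only
        new = logprob(a.right + b.left[:i], word)  # context crossing boundary
        adjustment += new - old
    return Hypothesis(
        score=a.score + b.score + adjustment,
        left=(a.left + b.left)[:ORDER - 1],
        right=(a.right + b.right)[-(ORDER - 1):],
    )


def score_directly(words):
    """Reference score: every word sees its full left context."""
    return sum(logprob(words[:i], w) for i, w in enumerate(words))


words = ["the", "cat", "sat", "down"]
h = [from_word(w) for w in words]
balanced = concatenate(concatenate(h[0], h[1]), concatenate(h[2], h[3]))
print(balanced.score, score_directly(words))
```

In this sketch only the words held in left state are ever rescored, so hypotheses that agree on left and right state behave identically in later concatenations. That is the property that lets a decoder recombine them, and it is why encoding fewer words in state creates more recombination opportunities.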
Anthology ID:
2011.iwslt-evaluation.24
Volume:
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
Month:
December 8-9
Year:
2011
Address:
San Francisco, California
Editors:
Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker
Venue:
IWSLT
SIG:
SIGSLT
Pages:
183–190
URL:
https://aclanthology.org/2011.iwslt-evaluation.24
Cite (ACL):
Kenneth Heafield, Hieu Hoang, Philipp Koehn, Tetsuo Kiso, and Marcello Federico. 2011. Left language model state for syntactic machine translation. In Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 183–190, San Francisco, California.
Cite (Informal):
Left language model state for syntactic machine translation (Heafield et al., IWSLT 2011)
PDF:
https://aclanthology.org/2011.iwslt-evaluation.24.pdf