Wider Pipelines: N-Best Alignments and Parses in MT Training

Ashish Venugopal, Andreas Zollmann, Noah A. Smith, Stephan Vogel


Abstract
State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these “single-best” hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
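The core idea the abstract describes can be sketched briefly: instead of taking a single-best alignment or parse, normalize the model scores of the N-best alternatives into a posterior and let each hypothesis contribute fractionally to downstream counts. This is a minimal illustrative sketch, not the paper's implementation; the function and variable names (`fractional_counts`, `events_of`) are hypothetical, and log-scored hypotheses are assumed.

```python
import math
from collections import defaultdict

def fractional_counts(nbest, events_of):
    """Turn an N-best list of (hypothesis, log score) pairs into
    posterior fractional counts over the events each hypothesis
    contains (e.g. alignment links or grammar rules).

    Hypothetical sketch: names and interfaces are illustrative only.
    """
    # Normalize log scores into a posterior over the N-best list,
    # using log-sum-exp for numerical stability.
    m = max(score for _, score in nbest)
    log_z = m + math.log(sum(math.exp(s - m) for _, s in nbest))
    counts = defaultdict(float)
    for hyp, score in nbest:
        posterior = math.exp(score - log_z)
        for event in events_of(hyp):
            counts[event] += posterior
    return dict(counts)

# Toy example: two alternative word alignments of one sentence pair,
# each a set of (source_index, target_index) links.
nbest = [
    ({(0, 0), (1, 1)}, -1.0),  # best-scoring alignment
    ({(0, 1), (1, 1)}, -2.0),  # runner-up
]
counts = fractional_counts(nbest, lambda hyp: hyp)
# The shared link (1, 1) receives a full count; the disputed links
# (0, 0) and (0, 1) split the probability mass between them.
```

Downstream estimation then treats these fractional counts exactly as it would integer counts from a single-best hypothesis, so uncertainty in the upstream step is carried forward rather than discarded.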
Anthology ID:
2008.amta-papers.18
Volume:
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 21-25
Year:
2008
Address:
Waikiki, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
192–201
URL:
https://aclanthology.org/2008.amta-papers.18
Cite (ACL):
Ashish Venugopal, Andreas Zollmann, Noah A. Smith, and Stephan Vogel. 2008. Wider Pipelines: N-Best Alignments and Parses in MT Training. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 192–201, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Wider Pipelines: N-Best Alignments and Parses in MT Training (Venugopal et al., AMTA 2008)
PDF:
https://aclanthology.org/2008.amta-papers.18.pdf