Wider Pipelines: N-Best Alignments and Parses in MT Training

Ashish Venugopal, Andreas Zollmann, Noah A. Smith, Stephan Vogel


Abstract
State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these “single-best” hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
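The core idea the abstract describes can be sketched briefly: instead of taking a single-best alignment or parse, normalize the model scores of the N-best alternatives into a posterior and let each hypothesis contribute fractionally to downstream counts. This is a minimal illustrative sketch, not the paper's implementation; the function and variable names (`fractional_counts`, `events_of`) are hypothetical, and log-scored hypotheses are assumed.

```python
import math
from collections import defaultdict

def fractional_counts(nbest, events_of):
    """Turn an N-best list of (hypothesis, log score) pairs into
    posterior fractional counts over the events each hypothesis
    contains (e.g. alignment links or grammar rules).

    Hypothetical sketch: names and interfaces are illustrative only.
    """
    # Normalize log scores into a posterior over the N-best list,
    # using log-sum-exp for numerical stability.
    m = max(score for _, score in nbest)
    log_z = m + math.log(sum(math.exp(s - m) for _, s in nbest))
    counts = defaultdict(float)
    for hyp, score in nbest:
        posterior = math.exp(score - log_z)
        for event in events_of(hyp):
            counts[event] += posterior
    return dict(counts)

# Toy example: two alternative word alignments of one sentence pair,
# each a set of (source_index, target_index) links.
nbest = [
    ({(0, 0), (1, 1)}, -1.0),  # best-scoring alignment
    ({(0, 1), (1, 1)}, -2.0),  # runner-up
]
counts = fractional_counts(nbest, lambda hyp: hyp)
# The shared link (1, 1) receives a full count; the disputed links
# (0, 0) and (0, 1) split the probability mass between them.
```

Downstream estimation then treats these fractional counts exactly as it would integer counts from a single-best hypothesis, so uncertainty in the upstream step is carried forward rather than discarded.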
Anthology ID:
2008.amta-papers.18
Volume:
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 21-25
Year:
2008
Address:
Waikiki, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
192–201
URL:
https://aclanthology.org/2008.amta-papers.18
Cite (ACL):
Ashish Venugopal, Andreas Zollmann, Noah A. Smith, and Stephan Vogel. 2008. Wider Pipelines: N-Best Alignments and Parses in MT Training. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 192–201, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Wider Pipelines: N-Best Alignments and Parses in MT Training (Venugopal et al., AMTA 2008)
PDF:
https://aclanthology.org/2008.amta-papers.18.pdf