Machine Translation Using Overlapping Alignments and SampleRank

Benjamin Roth, Andrew McCallum, Marc Dymetman, Nicola Cancedda


Abstract
We present a conditional-random-field approach to discriminatively-trained phrase-based machine translation in which training and decoding are both cast in a sampling framework and are implemented uniformly in a new probabilistic programming language for factor graphs. In traditional phrase-based translation, decoding infers both a "Viterbi" alignment and the target sentence. In contrast, in our approach, a rich overlapping-phrase alignment is produced by a fast deterministic method, while probabilistic decoding infers only the target sentence, which is then able to leverage arbitrary features of the entire source sentence, target sentence and alignment. By using SampleRank for learning we could in principle efficiently estimate hundreds of thousands of parameters. Test-time decoding is done by MCMC sampling with annealing. To demonstrate the potential of our approach we show preliminary experiments leveraging alignments that may contain overlapping bi-phrases.
Anthology ID:
2010.amta-papers.31
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
SIG:
Publisher:
Association for Machine Translation in the Americas
Note:
Pages:
Language:
URL:
https://aclanthology.org/2010.amta-papers.31
DOI:
Bibkey:
Copy Citation:
PDF:
https://aclanthology.org/2010.amta-papers.31.pdf