Boosting Statistical Word Alignment

Hua Wu, Haifeng Wang


Abstract
This paper proposes an approach to improving statistical word alignment with boosting. Applying boosting to word alignment requires solving two problems. The first is how to build the reference set for the training data. We propose a way to build a pseudo reference set automatically, which avoids manual annotation of the training set. The second is how to calculate the error rate of each individual word aligner. We solve this by calculating the error rate on a manually annotated held-out data set instead of on the entire training set. In addition, the final ensemble takes into account the weights of the alignment links produced by the individual word aligners. Experimental results indicate that the proposed boosting method performs much better than the original word aligner, achieving a large error rate reduction.
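To make the ensemble step concrete, the following is a minimal sketch of a weighted vote over alignment links, not the paper's actual procedure. It assumes the standard AdaBoost weight log((1 - e) / e) for an aligner with held-out error rate e, and it simplifies by giving every link the weight of the aligner that produced it. The function names, the link representation as (source index, target index) pairs, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def aligner_weight(error_rate):
    """AdaBoost-style weight from a held-out error rate (assumed formula)."""
    # Clamp so that an error rate of exactly 0 or 1 does not cause log(0).
    eps = min(max(error_rate, 1e-6), 1 - 1e-6)
    return math.log((1 - eps) / eps)


def combine_links(alignments, error_rates, threshold=0.5):
    """Weighted vote over alignment links (illustrative, hypothetical helper).

    alignments:  one set of (source_index, target_index) links per aligner.
    error_rates: held-out error rate of each aligner, in the same order.
    A link is kept if the weight voting for it exceeds `threshold`
    times the total weight of all aligners.
    """
    weights = [aligner_weight(e) for e in error_rates]
    total = sum(weights)
    votes = defaultdict(float)
    for links, weight in zip(alignments, weights):
        for link in links:
            votes[link] += weight
    return {link for link, vote in votes.items() if vote > threshold * total}


# Three hypothetical aligners for one sentence pair.
alignments = [
    {(0, 0), (1, 1)},  # error rate 0.2 on the held-out set
    {(0, 0), (1, 2)},  # error rate 0.3
    {(0, 0), (1, 2)},  # error rate 0.4
]
combined = combine_links(alignments, [0.2, 0.3, 0.4])
```

In this toy input, the link (1, 1) proposed only by the most accurate aligner outweighs the link (1, 2) proposed by the two weaker ones, which is the kind of behavior an error-weighted vote is meant to produce and a simple majority vote would not.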
Anthology ID:
2005.mtsummit-papers.41
Volume:
Proceedings of Machine Translation Summit X: Papers
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
313–320
URL:
https://aclanthology.org/2005.mtsummit-papers.41
PDF:
https://aclanthology.org/2005.mtsummit-papers.41.pdf