Improving statistical machine translation by paraphrasing the training data.

Francis Bond, Eric Nichols, Darren Scott Appling, Michael Paul


Abstract
Large amounts of training data are essential for training statistical machine translation systems. In this paper we show how training data can be expanded by paraphrasing one side. The new data is made by parsing then generating using a precise HPSG-based grammar, which gives sentences with the same meaning but minor variations in lexical choice and word order. In experiments with Japanese and English, we showed consistent gains on the Tanaka Corpus, with less consistent improvement on the IWSLT 2005 evaluation data.
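The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `paraphrase` callable is a hypothetical stand-in for the HPSG parse-then-generate step, the names `expand_corpus`, `toy_paraphraser`, and `max_paraphrases` are assumptions made for illustration, and the sentences are toy data. The sketch only shows the corpus-level idea: each paraphrase of one side is paired with the unchanged other side and added to the training data.

```python
from typing import Callable, Iterable, List, Tuple

# (kept_side, paraphrased_side): one side is left untouched, the other is varied.
SentencePair = Tuple[str, str]


def expand_corpus(
    pairs: Iterable[SentencePair],
    paraphrase: Callable[[str], List[str]],
    max_paraphrases: int = 2,
) -> List[SentencePair]:
    """Return the original pairs plus new pairs built from paraphrases.

    `paraphrase` is a hypothetical hook; in the paper this role is played
    by parsing and regenerating with a precise grammar.
    """
    expanded: List[SentencePair] = []
    for kept, varied in pairs:
        expanded.append((kept, varied))   # always keep the original pair
        seen = {varied}                   # skip paraphrases identical to the input
        added = 0
        for candidate in paraphrase(varied):
            if added >= max_paraphrases:
                break
            if candidate in seen:
                continue
            seen.add(candidate)
            expanded.append((kept, candidate))
            added += 1
    return expanded


def toy_paraphraser(sentence: str) -> List[str]:
    """Toy lookup table standing in for a real paraphrase generator."""
    table = {
        "I went to the station quickly.": [
            "I went to the station quickly.",   # identical output, filtered out
            "I quickly went to the station.",   # word-order variation
        ],
    }
    return table.get(sentence, [])


corpus = [
    ("<japanese sentence 1>", "I went to the station quickly."),
    ("<japanese sentence 2>", "Hello."),
]
expanded_corpus = expand_corpus(corpus, toy_paraphraser)
```

Sentences for which the paraphraser returns nothing are kept as-is, so the expanded corpus is always a superset of the original one.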
Anthology ID: 2008.iwslt-papers.2
Volume: Proceedings of the 5th International Workshop on Spoken Language Translation: Papers
Month: October 20-21
Year: 2008
Address: Waikiki, Hawaii
Venue: IWSLT
SIG: SIGSLT
Pages: 150–157
URL: https://aclanthology.org/2008.iwslt-papers.2
Cite (ACL): Francis Bond, Eric Nichols, Darren Scott Appling, and Michael Paul. 2008. Improving statistical machine translation by paraphrasing the training data. In Proceedings of the 5th International Workshop on Spoken Language Translation: Papers, pages 150–157, Waikiki, Hawaii.
Cite (Informal): Improving statistical machine translation by paraphrasing the training data. (Bond et al., IWSLT 2008)
PDF: https://aclanthology.org/2008.iwslt-papers.2.pdf