Exploiting Document-Level Context for Data-Driven Machine Translation

Ralf Brown


Abstract
This paper presents a method for exploiting document-level similarity between the documents in the training corpus of a corpus-driven (statistical or example-based) machine translation system and the input documents it must translate. The method is simple to implement, efficient (it increases the translation time of an example-based system by only a few percent), and robust (it still works when the actual document boundaries in the input text are not known). Experiments on French-English and Arabic-English translation showed relative BLEU gains of up to 7.4% and 5.4%, respectively, over the same system without document-level similarity.
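The abstract does not say how document-level similarity is measured, so the following is only a minimal illustrative sketch, not the paper's method. It assumes a common generic choice: bag-of-words cosine similarity between the input context and each training document, which is then used to rank the training documents. To mirror the case where true document boundaries are unknown, the input "document" is approximated by a sliding window of neighbouring sentences. All function names (term_vector, cosine, context_window, rank_training_docs) and the example documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_window(sentences, i, k=2):
    """Hypothetical stand-in for a document when boundaries are unknown:
    the sentences within k positions of sentence i (clipped at the ends)."""
    return sentences[max(0, i - k): i + k + 1]


def rank_training_docs(context_sentences, training_docs):
    """Rank training documents by cosine similarity to the input context.

    training_docs maps a (hypothetical) document id to its text.
    Returns (doc_id, score) pairs, most similar first.
    """
    context = term_vector(" ".join(context_sentences))
    scores = {doc_id: cosine(context, term_vector(text))
              for doc_id, text in training_docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with training_docs = {"finance": "the bank raised interest rates", "rivers": "the river bank flooded after rain"}, ranking the window ["interest rates rose", "the bank announced"] puts "finance" first (score 4/sqrt(30), about 0.73) ahead of "rivers" (about 0.33). A translation system could then favour matches drawn from the higher-ranked documents; how such scores are actually combined with translation scores is not specified in the abstract.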
Anthology ID:
2008.amta-papers.2
Volume:
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 21–25
Year:
2008
Address:
Waikiki, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
46–55
URL:
https://aclanthology.org/2008.amta-papers.2
Cite (ACL):
Ralf Brown. 2008. Exploiting Document-Level Context for Data-Driven Machine Translation. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 46–55, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Exploiting Document-Level Context for Data-Driven Machine Translation (Brown, AMTA 2008)
PDF:
https://aclanthology.org/2008.amta-papers.2.pdf