Combining Multi-Domain Statistical Machine Translation Models using Automatic Classifiers

Pratyush Banerjee; Jinhua Du; Baoli Li; Sudip Kumar Naskar; Andy Way; Josef van Genabith

Abstract

This paper presents a set of experiments on domain adaptation of Statistical Machine Translation (SMT) systems. The experiments focus on Chinese-English translation with two domain-specific corpora. The paper presents a novel approach for combining multiple domain-trained translation models to achieve improved translation quality for both domain-specific and combined sets of sentences. We train a statistical classifier to assign each sentence to the appropriate domain and use the corresponding domain-specific MT model to translate it. Experimental results on the combined-domain test set show that the method achieves a statistically significant absolute improvement of 1.58 BLEU points (2.86% relative) over a translation model trained on the combined data, and considerable improvements over a model using multiple decoding paths of the Moses decoder. Furthermore, even on domain-specific test sets, our approach performs almost as well as dedicated domain-specific models with perfect classification.

Anthology ID: 2010.amta-papers.16
Volume: Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
Month: October 31-November 4
Year: 2010
Address: Denver, Colorado, USA
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
URL: https://aclanthology.org/2010.amta-papers.16/
Cite (ACL): Pratyush Banerjee, Jinhua Du, Baoli Li, Sudip Naskar, Andy Way, and Josef van Genabith. 2010. Combining Multi-Domain Statistical Machine Translation Models using Automatic Classifiers. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal): Combining Multi-Domain Statistical Machine Translation Models using Automatic Classifiers (Banerjee et al., AMTA 2010)
PDF: https://aclanthology.org/2010.amta-papers.16.pdf
