Multi-domain Adaptation for Statistical Machine Translation Based on Feature Augmentation

Kenji Imamura; Eiichiro Sumita

Abstract

Domain adaptation is a major challenge when applying machine translation to practical tasks. In this paper, we present domain adaptation methods for machine translation that assume multiple domains. The proposed methods combine two model types: a corpus-concatenated model covering multiple domains and single-domain models that are accurate but sparse in specific domains. We combine the advantages of both models using feature augmentation for domain adaptation in machine learning. Our experimental results show that the BLEU scores of the proposed method clearly surpass those of single-domain models for low-resource domains. For high-resource domains, the scores of the proposed method were superior to those of both single-domain and corpus-concatenated models. Even in domains having a million bilingual sentences, the translation quality was at least preserved and even improved in some domains. These results demonstrate that state-of-the-art domain adaptation can be realized with appropriate settings, even when using standard log-linear models.
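
As a rough illustration of the feature-augmentation idea mentioned in the abstract, the sketch below copies each feature of a log-linear model into a shared "common" space and into the space of the sentence's own domain, leaving the other domain spaces at zero. It is a minimal sketch and not the authors' implementation: the function names, the feature names (`tm`, `lm`), the domain labels, and the weights are all hypothetical, and the abstract does not specify how the features are laid out.

```python
from typing import Dict, List


def augment_features(
    features: Dict[str, float], domain: str, domains: List[str]
) -> Dict[str, float]:
    """Copy every feature into a common space and into its own domain space.

    Features in the spaces of the other domains are set to 0.0, so only the
    common weights and the weights of `domain` affect the score.
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain}")
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"common:{name}"] = value
        for d in domains:
            augmented[f"{d}:{name}"] = value if d == domain else 0.0
    return augmented


def log_linear_score(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Weighted sum of feature values; missing weights count as 0.0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical example: two domains and two log-probability features.
domains = ["medical", "patent"]
features = {"tm": -2.0, "lm": -3.0}
weights = {"common:tm": 1.0, "common:lm": 0.5, "medical:tm": 0.5}

medical_score = log_linear_score(weights, augment_features(features, "medical", domains))
patent_score = log_linear_score(weights, augment_features(features, "patent", domains))
```

In this toy setup the same hypothesis receives different scores in the two domains, because the hypothetical `medical:tm` weight applies only when the sentence is labeled as medical, while the common-space weights are shared by every domain.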

Anthology ID: 2016.amta-researchers.7
Volume: Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track
Month: October 28 - November 1
Year: 2016
Address: Austin, TX, USA
Editors: Spence Green, Lane Schwartz
Venue: AMTA
Publisher: The Association for Machine Translation in the Americas
Pages: 79–92
URL: https://aclanthology.org/2016.amta-researchers.7/
Cite (ACL): Kenji Imamura and Eiichiro Sumita. 2016. Multi-domain Adaptation for Statistical Machine Translation Based on Feature Augmentation. In Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track, pages 79–92, Austin, TX, USA. The Association for Machine Translation in the Americas.
Cite (Informal): Multi-domain Adaptation for Statistical Machine Translation Based on Feature Augmentation (Imamura & Sumita, AMTA 2016)
PDF: https://aclanthology.org/2016.amta-researchers.7.pdf
