Strategies in Developing Engine-specific Chinese-English User Parallel Corpora
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Government MT User Program
This paper proposes strategies and techniques for creating phrase-level user parallel corpora for the Systran translation engine. Although not all of the strategies and techniques discussed here will apply to other translation engines, the underlying concept will.
It is common practice for linguists to post-edit MT output to improve translation accuracy and fluency. This presentation, however, examines the importance of pre-editing source material to improve MT. Even when the digital source file used for MT is literally correct, several factors still have a significant effect on translation accuracy and fluency. Drawing on 35 examples from more than 20 professional journals and websites, this article reports an experiment in pre-editing source material for Chinese-English MT in the science and technology (S&T) domain. Pertinent examples are selected to illustrate how MT accuracy and fluency can be enhanced by pre-editing in four areas: providing a straightforward sentence structure, improving punctuation, using straightforward wording, and eliminating redundancy and superfluous elements.