Input Normalization for an English-to-Chinese SMS Translation System
Aw AiTi | Zhang Min | Yeo PohKhim | Fan ZhenZhen | Su Jian
Proceedings of Machine Translation Summit X: Posters
This paper describes an approach to preprocessing SMS text for machine translation. SMS text behaves differently from normal written text, and customizing or adapting the language model of a traditional translation system to handle SMS text style requires tremendous effort. To reduce that effort, normalization is performed with a noisy channel model to moderate the irregularities in English SMS text before translation. A mapping model captures the three major problems in SMS text: (1) substitution of words with non-standard acronyms, (2) insertion of flavour words, and (3) omission of auxiliary verbs and subject pronouns. Experimental results show that, with normalization before translation, the rejection rate of our English-to-Chinese SMS translation for broadcasting purposes is reduced by 15.5%. We believe that the performance of normalization can be further improved with deeper linguistic processing.
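
The noisy channel idea can be illustrated with a minimal sketch: choose the standard English sentence e that maximizes P(s | e) · P(e) for an observed SMS message s. The abstract does not give the authors' model details, so everything below is an assumption made for illustration: the `CHANNEL` and `BIGRAM` tables, their probabilities, the `FLOOR` back-off value, and the function names are all hypothetical. The sketch covers only substitution and flavour-word insertion; recovering omitted auxiliary verbs and subject pronouns is not handled.

```python
import math
from itertools import product

# Hypothetical channel model P(sms_token | standard_word), toy values.
# An empty string means the SMS token is treated as a flavour word
# with no standard counterpart, so it is dropped from the output.
CHANNEL = {
    "u": {"you": 0.9, "u": 0.1},
    "r": {"are": 0.8, "r": 0.2},
    "2": {"to": 0.5, "two": 0.3, "2": 0.2},
    "lah": {"": 0.9, "lah": 0.1},
}

# Hypothetical bigram language model P(word | previous word), toy values.
BIGRAM = {
    ("<s>", "you"): 0.2,
    ("you", "are"): 0.3,
    ("are", "late"): 0.1,
    ("late", "</s>"): 0.2,
}
FLOOR = 1e-4  # assumed probability for any bigram not listed above


def lm_logprob(words):
    """Log-probability of a word sequence under the toy bigram model."""
    seq = ["<s>"] + words + ["</s>"]
    return sum(math.log(BIGRAM.get(pair, FLOOR)) for pair in zip(seq, seq[1:]))


def normalize(sms_tokens):
    """Return the candidate maximizing log P(s | e) + log P(e).

    Tokens missing from CHANNEL map to themselves with probability 1.
    The search is exhaustive, which is fine for a short message but
    exponential in general; a real system would use dynamic programming.
    """
    options = [list(CHANNEL.get(t, {t: 1.0}).items()) for t in sms_tokens]
    best, best_score = None, -math.inf
    for choice in product(*options):
        words = [w for w, _ in choice if w]  # drop deleted flavour words
        score = sum(math.log(p) for _, p in choice) + lm_logprob(words)
        if score > best_score:
            best, best_score = words, score
    return best


print(normalize(["u", "r", "late", "lah"]))  # ['you', 'are', 'late']
```

With these toy tables, the language model rewards the fluent sequence "you are late" strongly enough that both acronyms are expanded and the flavour word "lah" is removed.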