Input Normalization for an English-to-Chinese SMS Translation System

Aw AiTi, Zhang Min, Yeo PohKhim, Fan ZhenZhen, Su Jian


Abstract
This paper describes an approach to preprocessing SMS text for machine translation. SMS text differs markedly from standard written text, and customizing or adapting the language model of a conventional translation system to the SMS style would require tremendous effort. Instead, normalization is performed with a noisy channel model to moderate the irregularities in English SMS text. A mapping model captures the three major problems in SMS text: (1) substitution of words with non-standard acronyms, (2) insertion of flavour words, and (3) omission of auxiliary verbs and subject pronouns. Experimental results show that normalization before translation reduces the rejection rate of our English-to-Chinese SMS translation system for broadcasting purposes by 15.5%. We believe that normalization performance can be further improved with deeper linguistic processing.
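As an illustration of the noisy channel idea named in the abstract, the sketch below picks, for each observed SMS token, the intended token that maximizes P(observed | intended) x P(intended). It is a minimal toy under stated assumptions, not the authors' model: the tables `CHANNEL` and `UNIGRAM`, all probabilities, and the function `normalize` are hypothetical, and the abstract does not specify how the mapping model is actually parameterized. The sketch covers only substitution and flavour-word removal; recovering omitted auxiliary verbs and subject pronouns would need sentence-level context that a per-token model does not have.

```python
import math

# Hypothetical channel model P(observed | intended). All values are invented
# for illustration and are not taken from the paper.
CHANNEL = {
    "u": {"you": 0.9, "u": 0.1},              # non-standard acronym
    "2": {"to": 0.5, "two": 0.3, "2": 0.2},   # ambiguous substitution
    "lah": {"": 0.95, "lah": 0.05},           # flavour word; "" means "delete"
}

# Hypothetical unigram language model P(intended), also invented.
# The empty string has probability 1.0 so that deletion carries no
# language-model cost in this toy setting.
UNIGRAM = {"you": 0.05, "to": 0.06, "two": 0.01, "see": 0.02, "want": 0.02, "": 1.0}
FLOOR = 1e-6  # probability assigned to tokens missing from UNIGRAM


def normalize(tokens):
    """Return the most probable intended tokens under the toy noisy channel."""
    normalized = []
    for observed in tokens:
        # Tokens without a channel entry are assumed to be already standard.
        candidates = CHANNEL.get(observed, {observed: 1.0})
        best = max(
            candidates,
            key=lambda c: math.log(candidates[c]) + math.log(UNIGRAM.get(c, FLOOR)),
        )
        if best:  # an empty string means the token is dropped
            normalized.append(best)
    return normalized


if __name__ == "__main__":
    print(" ".join(normalize("u want 2 see lah".split())))
```

A practical system would likely replace the unigram model with a context-sensitive language model so that ambiguous tokens such as "2" are resolved from their neighbours rather than from global frequency.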
Anthology ID:
2005.mtsummit-posters.18
Volume:
Proceedings of Machine Translation Summit X: Posters
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
445–450
URL:
https://aclanthology.org/2005.mtsummit-posters.18
Cite (ACL):
Aw AiTi, Zhang Min, Yeo PohKhim, Fan ZhenZhen, and Su Jian. 2005. Input Normalization for an English-to-Chinese SMS Translation System. In Proceedings of Machine Translation Summit X: Posters, pages 445–450, Phuket, Thailand.
Cite (Informal):
Input Normalization for an English-to-Chinese SMS Translation System (AiTi et al., MTSummit 2005)
PDF:
https://aclanthology.org/2005.mtsummit-posters.18.pdf