The CASIA statistical machine translation system for IWSLT 2009

Maoxi Li, Jiajun Zhang, Yu Zhou, Chengqing Zong


Abstract
This paper reports on the participation of CASIA (Institute of Automation, Chinese Academy of Sciences) in the evaluation campaign of the International Workshop on Spoken Language Translation 2009. We participated in the challenge tasks for both Chinese-to-English and English-to-Chinese translation, and in the BTEC task for Chinese-to-English translation only. For all of the tasks, we improved system performance with the following methods: 1) combining different results of Chinese word segmentation, 2) combining different results of word alignment, 3) adding reliable bilingual words with high probabilities to the training data, 4) applying additional handling to named entities, including person names, location names, organization names, and temporal and numerical expressions, 5) combining and selecting translations from the outputs of multiple translation engines, and 6) replacing Chinese characters with Chinese Pinyin to train the translation model for the Chinese-to-English ASR challenge task. The last of these is a new approach that has not been introduced before.
Anthology ID:
2009.iwslt-evaluation.13
Volume:
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign
Month:
December 1-2
Year:
2009
Address:
Tokyo, Japan
Venue:
IWSLT
SIG:
SIGSLT
Pages:
83–90
URL:
https://aclanthology.org/2009.iwslt-evaluation.13
Cite (ACL):
Maoxi Li, Jiajun Zhang, Yu Zhou, and Chengqing Zong. 2009. The CASIA statistical machine translation system for IWSLT 2009. In Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 83–90, Tokyo, Japan.
Cite (Informal):
The CASIA statistical machine translation system for IWSLT 2009 (Li et al., IWSLT 2009)
PDF:
https://aclanthology.org/2009.iwslt-evaluation.13.pdf
Presentation:
2009.iwslt-evaluation.13.Presentation.pdf