The NICT translation system for IWSLT 2011

Andrew Finch, Chooi-Ling Goh, Graham Neubig, Eiichiro Sumita

This paper describes NICT’s participation in the IWSLT 2011 evaluation campaign for the TED speech translation Chinese-English shared task. Our approach was based on a phrase-based statistical machine translation system that was augmented in two ways. First, we introduced rule-based reordering constraints on the decoding. These consisted of a set of rules that split the input utterances into segments that could be decoded almost independently. The idea is that constraining the decoding process in this manner greatly reduces the decoder's search space and cuts out many possibilities for error, while still allowing a correct output to be generated. The rules exploit punctuation and spacing in the input utterances, and we use these positions to delimit our segments. Not all punctuation and spacing positions were used as segment boundaries; the set of positions used was determined by linguistically based heuristics. Second, we used two heterogeneous methods to build the translation model and the lexical reordering model for our systems. The first was the popular approach of using GIZA++ for alignment in combination with phrase-extraction heuristics. The second used a recently developed Bayesian alignment technique that performs both phrase-to-phrase alignment and phrase pair extraction within a single unsupervised process. The models produced by this type of alignment technique are typically very compact while maintaining a high level of translation quality. We evaluated both methods of translation model construction in isolation, and our results show that their performance is comparable. We also integrated both models by linear interpolation to obtain a model that outperforms either component. Finally, we added an indicator feature to the log-linear model to mark those phrases that were in the intersection of the two translation models. The addition of this feature also provided a small improvement in performance.
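The model combination described in the abstract can be illustrated with a minimal sketch. It assumes each phrase table is a mapping from a (source, target) phrase pair to a single probability, and that a pair missing from one table contributes zero from that table. The function name `interpolate`, the weight `weight_a`, and the toy tables are hypothetical; the abstract does not state which model scores were interpolated or what weights were used.

```python
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]           # (source phrase, target phrase)
PhraseTable = Dict[PhrasePair, float]  # phrase pair -> translation probability


def interpolate(
    table_a: PhraseTable,
    table_b: PhraseTable,
    weight_a: float = 0.5,  # assumed weight; not taken from the paper
) -> Dict[PhrasePair, Tuple[float, int]]:
    """Linearly interpolate two phrase tables and flag pairs found in both.

    Returns a mapping from phrase pair to (interpolated probability,
    indicator), where the indicator is 1 if the pair occurs in both
    tables and 0 otherwise.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")

    merged: Dict[PhrasePair, Tuple[float, int]] = {}
    for pair in table_a.keys() | table_b.keys():
        # A pair absent from one table is treated as having probability 0 there.
        p_a = table_a.get(pair, 0.0)
        p_b = table_b.get(pair, 0.0)
        prob = weight_a * p_a + (1.0 - weight_a) * p_b
        in_both = 1 if pair in table_a and pair in table_b else 0
        merged[pair] = (prob, in_both)
    return merged


# Toy tables standing in for the GIZA++-based and Bayesian models.
giza_table = {("你好", "hello"): 0.8, ("谢谢", "thanks"): 0.6}
bayes_table = {("你好", "hello"): 0.6, ("再见", "goodbye"): 0.9}

merged = interpolate(giza_table, bayes_table, weight_a=0.5)
for pair, (prob, in_both) in sorted(merged.items()):
    print(pair, round(prob, 3), in_both)
```

In a log-linear decoder the indicator would be an additional feature with its own tuned weight, so the system can learn how much to favor phrase pairs that both alignment methods agree on.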
Anthology ID:
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
December 8-9
San Francisco, California
Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker
Cite (ACL):
Andrew Finch, Chooi-Ling Goh, Graham Neubig, and Eiichiro Sumita. 2011. The NICT translation system for IWSLT 2011. In Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 49–56, San Francisco, California.
Cite (Informal):
The NICT translation system for IWSLT 2011 (Finch et al., IWSLT 2011)