The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2009

Coşkun Mermer, Hamza Kaya, Mehmet Uğur Doğan


Abstract
We describe our Arabic-to-English and Turkish-to-English machine translation systems that participated in the IWSLT 2009 evaluation campaign. Both systems are based on the Moses statistical machine translation toolkit, with added components to address the rich morphology of the source languages. Three different morphological approaches are investigated for Turkish. Our primary submission uses linguistic morphological analysis and statistical disambiguation to generate morpheme-based translation models, which is the approach with the best translation performance of the three. One contrastive submission uses unsupervised subword segmentation to generate non-linguistic subword-based translation models, while another contrastive system uses word-based models but relies on lexical approximation to cope with out-of-vocabulary words, similar to the approach in our Arabic-to-English submission.
Anthology ID:
2009.iwslt-evaluation.17
Volume:
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign
Month:
December 1-2
Year:
2009
Address:
Tokyo, Japan
Venue:
IWSLT
SIG:
SIGSLT
Pages:
113–117
URL:
https://aclanthology.org/2009.iwslt-evaluation.17
Cite (ACL):
Coşkun Mermer, Hamza Kaya, and Mehmet Uğur Doğan. 2009. The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2009. In Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 113–117, Tokyo, Japan.
Cite (Informal):
The TÜBİTAK-UEKAE statistical machine translation system for IWSLT 2009 (Mermer et al., IWSLT 2009)
PDF:
https://aclanthology.org/2009.iwslt-evaluation.17.pdf
Presentation:
2009.iwslt-evaluation.17.Presentation.pdf