Takanori Kusumoto
2012
Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension
Takanori Kusumoto, Tomoyosi Akiba
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Statistical machine translation (SMT) requires a parallel corpus between the source and target languages. Although a pivot-translation approach can be applied to a language pair that does not have a parallel corpus directly between them, it requires both source-pivot and pivot-target parallel corpora. We propose a novel approach to apply SMT to a resource-limited source language that has no parallel corpus but has only a word dictionary for the pivot language. The problems with dictionary-based translations lie in their ambiguity and incompleteness. The proposed method uses a word lattice representation of the pivot-language candidates and word lattice decoding to deal with the ambiguity; the lattice expansion is accomplished by using a pivot-target phrase translation table to compensate for the incompleteness. Our experimental evaluation showed that this approach is promising for applying SMT, even when a source-side parallel corpus is lacking.
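To make the word lattice idea concrete, the following is a minimal sketch of how ambiguous dictionary lookups could be packed into a lattice of pivot-language candidates. It is an illustration only and not the authors' implementation: the function names `build_lattice` and `enumerate_paths`, the toy dictionary, the placeholder source tokens, and the choice to pass unknown words through unchanged are all assumptions. The phrase-table-based lattice expansion is not shown, since the abstract does not specify how it works.

```python
from itertools import product


def build_lattice(source_tokens, dictionary):
    """Return lattice edges (start_node, end_node, pivot_word).

    Node i sits just before source token i, so every dictionary
    candidate for token i becomes a parallel edge from i to i + 1.
    """
    edges = []
    for i, token in enumerate(source_tokens):
        # Assumption of this sketch: words missing from the
        # dictionary are passed through unchanged.
        for candidate in dictionary.get(token, [token]):
            edges.append((i, i + 1, candidate))
    return edges


def enumerate_paths(edges, n_tokens):
    """List every pivot-language word sequence the lattice encodes."""
    by_start = [[w for s, _, w in edges if s == i] for i in range(n_tokens)]
    return [list(path) for path in product(*by_start)]


# Hypothetical two-word source sentence with one ambiguous word.
toy_dictionary = {"src_a": ["bank", "shore"], "src_b": ["open"]}
lattice = build_lattice(["src_a", "src_b"], toy_dictionary)
paths = enumerate_paths(lattice, 2)
```

Here `paths` holds the two candidate pivot sentences, one per reading of the ambiguous word. A lattice decoder would score such alternatives jointly rather than enumerate them, which is what keeps the approach tractable for longer sentences.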