Previous English-Korean MT systems developed in Korea have dealt only with written text as the object of translation. Most of them listed the following problems, which did not seem easy to solve in the near future: 1) processing of non-continuous idiomatic expressions; 2) reduction of excessive POS or structural ambiguities; 3) robust processing of long sentences and parsing failures; 4) selection of the correct word correspondence among several alternatives. These problems are important factors affecting the translation quality of a machine translation system. This paper describes not only solutions to these problems of previous English-to-Korean machine translation systems but also the management of HTML tags between two structurally different languages, English and Korean. With these solutions, we successfully translate English web documents into Korean ones in the English-to-Korean web translator "FromTo/Web-EK", which has been under development since 1997.
From To K/E: a Korean-English machine translation system based on idiom recognition and fail softening
Byong-Rae Ryu | Youngkil Kim | Sanghwa Yuh | Sangkyu Park
Proceedings of Machine Translation Summit VII
In this paper we describe and experimentally evaluate FromTo K/E, a rule-based Korean-English machine translation system adopting the transfer methodology. In accordance with the view that a successful Korean-English machine translation system presupposes a highly efficient and robust Korean parser, we develop a parser reinforced with "Fail Softening", i.e. the segmentation of long sentences and the recovery of failed parse trees. To overcome the language-typological differences between Korean and English, we adopt a powerful module for processing Korean multi-word lexemes and Korean idiomatic expressions. Furthermore, prior to parsing Korean sentences, we try to resolve the ambiguity of words with unknown grammatical functions on the basis of collocation and subcategorization information. The results of the experimental evaluation show that the degree of understandability for a sample of 2,000 sentences amounts to 2.67, indicating that the meaning of the translated English sentences is almost clear to users, but the sentences still include minor grammatical or stylistic errors in up to 30% of all words.
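The "Fail Softening" idea named in the abstract (long-sentence segmentation plus recovery of failed parses) can be illustrated with a minimal sketch. This is not the FromTo K/E implementation: the token threshold, the delimiter set, the `parse` callback, and the flat `FRAG` fallback tree are all assumptions made for illustration only.

```python
from typing import Callable, List, Optional, Tuple

MAX_TOKENS = 12  # hypothetical length threshold, not taken from the paper

Tree = Tuple[str, list]


def segment(tokens: List[str],
            delimiters=frozenset({",", ";", ":"}),
            max_len: int = MAX_TOKENS) -> List[List[str]]:
    """Split a long token list after each delimiter token.

    Sentences at or below max_len are returned as a single segment.
    """
    if len(tokens) <= max_len:
        return [tokens]
    segments, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in delimiters:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def parse_with_fail_softening(
        tokens: List[str],
        parse: Callable[[List[str]], Optional[Tree]]) -> Tree:
    """Parse each segment; on failure, fall back to a flat fragment tree.

    `parse` is a hypothetical parser callback that returns None on failure.
    """
    subtrees = []
    for seg in segment(tokens):
        tree = parse(seg)
        if tree is None:
            # Assumed recovery strategy: keep the tokens as a flat fragment
            # so that later stages still receive some output.
            tree = ("FRAG", list(seg))
        subtrees.append(tree)
    return ("S", subtrees)


def toy_parse(seg: List[str]) -> Optional[Tree]:
    """Toy stand-in for a real parser: fails on segments over 5 tokens."""
    return ("CLAUSE", list(seg)) if len(seg) <= 5 else None
```

With `toy_parse`, a 14-token input containing one comma is split into two segments; the short one parses and the long one is recovered as a fragment, so the translation pipeline always receives some tree instead of failing outright.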