ArchTran: A Corpus-based Statistics-oriented English-Chinese Machine Translation System
Shu-Chuan Chen | Jing-Shin Chang | Jong-Nae Wang | Keh-Yih Su
Proceedings of Machine Translation Summit III: Papers
The ArchTran English-Chinese Machine Translation System is among the first commercialized English-Chinese machine translation systems in the world. A prototype system was released in 1989 and currently serves as the kernel of a value-added network-based translation service. The main design features of the ArchTran system are a mixed parsing strategy (bottom-up parsing with top-down filtering), a scored parsing mechanism, and a corpus-based, statistics-oriented paradigm for linguistic knowledge acquisition. Under this framework, research is directed toward designing systematic and automatic methods for acquiring language model parameters, and toward using a preference measure with a uniform probabilistic score function for ambiguity resolution. This paper presents the probabilistic models underlying the ArchTran design philosophy.
In a natural language processing system, a large amount of ambiguity and a large branching factor make it hard to obtain the desired analysis of a given sentence in a short time. In this paper, we propose a sequential truncation parsing algorithm that reduces the search space and thus lowers the parsing time. The algorithm is based on a score function that takes advantage of the probabilistic characteristics of the syntactic information in sentences. A preliminary test of this algorithm was conducted with a special version of our machine translation system, ARCHTRAN, and an encouraging result was observed.
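The abstract does not specify the score function or the truncation rule, so the following is only a generic sketch of score-based truncation during left-to-right analysis, not the authors' algorithm. It assumes that each partial analysis carries an accumulated log-probability and that any analysis falling more than a fixed margin below the current best is dropped after each word. The names `truncated_search`, `expand`, `margin`, and the toy probability table are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

# A partial analysis: (labels assigned so far, accumulated log-probability).
Hypothesis = Tuple[Tuple[str, ...], float]


def truncated_search(
    words: List[str],
    expand: Callable[[Tuple[str, ...], str], List[Tuple[str, float]]],
    margin: float,
) -> List[Hypothesis]:
    """Extend hypotheses one word at a time and truncate low-scoring ones.

    `expand(labels, word)` is a hypothetical callback that returns
    (label, probability) pairs for the next word.
    `margin` is an assumed cutoff: the largest allowed log-score gap
    between a surviving hypothesis and the best one at the same step.
    """
    beam: List[Hypothesis] = [((), 0.0)]
    for word in words:
        candidates: List[Hypothesis] = []
        for labels, score in beam:
            for label, prob in expand(labels, word):
                if prob > 0.0:  # skip impossible extensions
                    candidates.append((labels + (label,), score + math.log(prob)))
        if not candidates:
            return []  # no analysis survives
        best = max(score for _, score in candidates)
        # Truncation step: keep only hypotheses close enough to the best one.
        beam = [h for h in candidates if h[1] >= best - margin]
    return sorted(beam, key=lambda h: h[1], reverse=True)


# Toy, made-up probabilities used purely for illustration.
TOY = {
    "time": [("N", 0.8), ("V", 0.2)],
    "flies": [("V", 0.6), ("N", 0.4)],
}


def toy_expand(labels: Tuple[str, ...], word: str) -> List[Tuple[str, float]]:
    return TOY[word]


result = truncated_search(["time", "flies"], toy_expand, margin=math.log(3))
for labels, score in result:
    print(labels, round(math.exp(score), 2))
```

In this toy run the low-scoring reading of the first word is discarded before the second word is processed, so only two of the four possible label sequences are ever completed. A smaller margin prunes more aggressively, at the risk of discarding the analysis that would have scored best overall.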