Daniel J. Walker
2005
The Open A.I. Kit: General Machine Learning Modules from Statistical Machine Translation
Daniel J. Walker
Workshop on open-source machine translation
The Open A.I. Kit implements the major components of Statistical Machine Translation as an accessible, extensible Software Development Kit with broad applicability beyond the field of Machine Translation. The high-level system design policies of the kit embrace the Open Source development model to provide a modular architecture and interface, which may serve as a basis for collaborative research and development in Artificial Intelligence.
2001
Sentence boundary detection: a comparison of paradigms for improving MT quality
Daniel J. Walker | David E. Clements | Maki Darwin | Jan W. Amtrup
Proceedings of Machine Translation Summit VIII
The reliable detection of sentence boundaries in running text is one of the first important steps in preparing an input document for translation. Although this step is often neglected, it is necessary for obtaining a high-quality translation. In this paper, we present a comparison of different paradigms for the detection of sentence boundaries in written text. We compare three approaches: encoding the knowledge directly in a program, a rule-based system that relies on regular expressions to describe boundaries, and a statistical maximum-entropy learning algorithm to obtain knowledge about boundaries. Using the statistical system, we obtain a recall of 98.14%, classifying boundaries of six types and using a training corpus of under 10,000 sentences.