Ron Zacharski


2009

2006

In this paper we describe a set of processes for the acquisition of resources for quick ramp-up machine translation (MT) from any language lacking significant machine-tractable resources into English, using the Paraguayan indigenous language Guarani, as well as Amharic and Chechen, as examples. Our task is to develop a 250,000 monolingual corpus, a 250,000 bilingual parallel corpus, and smaller corpora tagged with part of speech, named entity, and morphological annotations.

2000

1999

In this paper we describe a language recognition algorithm for multilingual documents that is based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We present the results of an experimental study that showed that the performance of this algorithm has practical value.
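The abstract names the ingredients but not the details, so the following is only a minimal sketch of how such pieces can fit together, not the paper's algorithm. It assumes character bigram Markov models per language (a simplification of mixed-order n-grams), add-one smoothing, and a Viterbi-style dynamic program that labels each word while charging a fixed penalty for switching languages. The function names, the `switch_penalty` value, and the `alphabet_size` constant are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train_bigram_model(text):
    """Count character bigrams and their left contexts for one language."""
    padded = " " + text.lower() + " "
    bigrams = Counter(zip(padded, padded[1:]))
    contexts = Counter(padded[:-1])
    return bigrams, contexts


def word_log_likelihood(word, model, alphabet_size=256):
    """Log-probability of a word under a bigram Markov chain.

    Uses add-one smoothing; alphabet_size is an assumed constant.
    """
    bigrams, contexts = model
    padded = " " + word.lower() + " "
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigrams[(prev, cur)] + 1
        denominator = contexts[prev] + alphabet_size
        total += math.log(numerator / denominator)
    return total


def label_words(words, models, switch_penalty=2.0):
    """Assign one language label per word by dynamic programming.

    Maximizes the summed word log-likelihoods minus switch_penalty
    for every change of language between adjacent words.
    """
    if not words:
        return []
    langs = list(models)
    best = {l: word_log_likelihood(words[0], models[l]) for l in langs}
    backpointers = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for l in langs:
            # Score of arriving in language l from each previous language.
            arrive = {p: best[p] - (0.0 if p == l else switch_penalty)
                      for p in langs}
            prev = max(arrive, key=arrive.get)
            new_best[l] = arrive[prev] + word_log_likelihood(word, models[l])
            pointers[l] = prev
        backpointers.append(pointers)
        best = new_best
    # Trace the best path back from the final word.
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    return path[::-1]
```

To use the sketch, train one model per language on sample text, collect them in a dictionary keyed by language name, and pass a tokenized document to `label_words`. A larger `switch_penalty` favors longer single-language runs; a value of zero reduces the procedure to independent maximum-likelihood classification of each word.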

1992

1988