David Miller


2006

Context-Based Machine Translation
Jaime Carbonell | Steve Klein | David Miller | Mike Steinbaum | Tomer Grassiany | Jochen Frei
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Context-Based Machine Translation™ (CBMT) is a new paradigm for corpus-based translation that requires no parallel text. Instead, CBMT relies on a lightweight translation model utilizing a full-form bilingual dictionary and a sophisticated decoder using long-range context via long n-grams and cascaded overlapping. The translation process is enhanced via in-language substitution of tokens and phrases, both for source and target, when top candidates cannot be confirmed or resolved in decoding. Substitution utilizes a synonym and near-synonym generator implemented as a corpus-based unsupervised learning process. Decoding requires a very large target-language-only corpus, and while substitution in target can be performed using that same corpus, substitution in source requires a separate (and smaller) source monolingual corpus. Spanish-to-English CBMT was tested on Spanish newswire text, achieving a BLEU score of 0.6462 in June 2006, the highest BLEU reported for any language pair. Further testing also shows that quality increases above the reported score as the target corpus size increases and as dictionary coverage of source words and phrases becomes more complete.

2004

Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004
Alvin Martin | David Miller | Mark Przybocki | Joseph Campbell | Hirotaka Nakasone
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
Christopher Cieri | David Miller | Kevin Walker
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
Christopher Cieri | Joseph P. Campbell | Hirotaka Nakasone | David Miller | Kevin Walker
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Fluent Machines’ EliMT system
Eli Abir | Steve Klein | David Miller | Michael Steinbaum
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: System Descriptions

This paper presents a generalized description of the characteristics and implications of two processes that enable Fluent Machines’ machine translation system, called EliMT (a term coined by Dr. Jaime Carbonell after the system’s inventor, Eli Abir). These two processes are (1) an automated cross-language database builder and (2) an n-gram connector.

2000

Named Entity Extraction from Noisy Input: Speech and OCR
David Miller | Sean Boisen | Richard Schwartz | Rebecca Stone | Ralph Weischedel
Sixth Applied Natural Language Processing Conference