David M. Rojas

Also published as: David Rojas


Predicting MT Quality as a Function of the Source Language
David M. Rojas | Takako Aikawa
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes one phase of a large-scale machine translation (MT) quality assurance project. We explore a novel approach to discriminating MT-unsuitable source sentences by predicting the expected quality of the output. The resources required include a set of source/MT sentence pairs, human judgments on the output, a source parser, and an MT system. We extract a number of syntactic, semantic, and lexical features from the source sentences only and train a classifier that we call the “Syntactic, Semantic, and Lexical Model” (SSLM) (cf. Gamon et al., 2005; Liu & Gildea, 2005; Rajman & Hartley, 2001). Despite the simplicity of the approach, SSLM scores correlate with human judgments and can help determine whether sentences are suitable or unsuitable for translation by our MT system. SSLM also provides information about which source features impact MT quality, connecting this work with the field of controlled language (CL) (cf. Reuther, 2003; Nyberg & Mitamura, 1996). With a focus on the input side of MT, SSLM differs greatly from evaluation approaches such as BLEU (Papineni et al., 2002), NIST (Doddington, 2002) and METEOR (Banerjee & Lavie, 2005) in that these other systems compare MT output with reference sentences for evaluation and do not provide feedback regarding potentially problematic source material. Our method bridges the research areas of CL and MT evaluation by addressing the importance of providing “MT-suitable” English input to enhance output quality.


Linguistically Informed Statistical Models of Constituent Structure for Ordering in Sentence Realization
Eric Ringger | Michael Gamon | Robert C. Moore | David Rojas | Martine Smets | Simon Corston-Oliver
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics