Nick Ruiz
2012
MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation
Nick Ruiz, Marcello Federico
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
This paper provides a fast alternative to Minimum Discrimination Information (MDI)-based language model adaptation for statistical machine translation. It avoids computing a normalization term, which requires full model probabilities (including back-off probabilities) for all n-grams. Rather than re-estimating an entire language model, our Lazy MDI approach uses a smoothed unigram ratio between an adaptation text and the background language model to scale only the n-gram probabilities corresponding to translation options gathered by the SMT decoder. The effect of the unigram ratio is controlled by an additional feature weight in the log-linear discriminative model. We present results on the IWSLT 2012 TED talk translation task and show that Lazy MDI achieves language model adaptation performance comparable to that of classic MDI.
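A minimal Python sketch of the lazy scaling idea described in the abstract follows. It is an illustration under stated assumptions, not the authors' implementation. The add-alpha smoothing of the adaptation unigrams, the dictionary representation of background unigram probabilities, and all function names are hypothetical. The sketch shows that the adapted score for a word is just its background log-probability plus a weighted log unigram ratio, with no normalization term computed.

```python
import math
from collections import Counter


def smoothed_unigrams(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary.
    (Assumption: the smoothing scheme is illustrative only.)"""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def unigram_log_ratios(adapt_tokens, background_unigrams, alpha=1.0):
    """log(P_A(w) / P_B(w)) for every word in the background vocabulary.
    background_unigrams is a hypothetical dict word -> P_B(w)."""
    vocab = set(background_unigrams)
    in_vocab = [w for w in adapt_tokens if w in vocab]
    p_adapt = smoothed_unigrams(in_vocab, vocab, alpha)
    return {w: math.log(p_adapt[w] / background_unigrams[w]) for w in vocab}


def lazy_mdi_feature(option_words, log_ratios):
    """Feature value for a translation option: summed unigram log-ratios.
    Words outside the vocabulary contribute 0 (an assumption)."""
    return sum(log_ratios.get(w, 0.0) for w in option_words)


def lazy_mdi_log_score(background_logprob, word, log_ratios, weight):
    """Unnormalized adapted score: log P_B(w|h) + weight * log ratio.
    No normalization term over the history h is computed."""
    return background_logprob + weight * log_ratios.get(word, 0.0)


background = {"the": 0.5, "talk": 0.1, "stock": 0.4}
adaptation_text = ["the", "talk", "talk", "the"]
ratios = unigram_log_ratios(adaptation_text, background)
```

In this toy setup, "talk" is more frequent in the adaptation text than in the background model, so its feature value is positive, while "stock" never occurs in the adaptation text and is penalized. The weight plays the role of the extra log-linear feature weight, which would be tuned alongside the other decoder weights.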
2011
Fill-up versus interpolation methods for phrase-based SMT adaptation
Arianna Bisazza, Nick Ruiz, Marcello Federico
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper compares techniques for combining diverse parallel corpora to train domain-specific phrase-based SMT systems. We address a common scenario in which little in-domain data is available for the task but large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task: the TED talks. While performing similarly in terms of BLEU and NIST scores to the popular log-linear and linear interpolation techniques, filled-up translation models are more compact and easier to tune with minimum error rate training.
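A minimal Python sketch of the phrase table fill-up idea follows. It is an illustration under stated assumptions, not the authors' code. It assumes that fill-up works at the phrase-pair level: every in-domain entry is kept unchanged, and background entries are added only for pairs missing from the in-domain table. It also assumes that a provenance feature (here 1.0 for in-domain and e for background) is appended so the tuner can weight background entries. The table layout and function name are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical phrase table layout: (source phrase, target phrase) -> feature scores.
PhraseTable = Dict[Tuple[str, str], List[float]]


def fill_up(in_domain: PhraseTable, background: PhraseTable) -> PhraseTable:
    """Merge two phrase tables, preferring in-domain entries."""
    merged: PhraseTable = {}
    for pair, scores in in_domain.items():
        # In-domain scores are kept as-is; provenance value exp(0) = 1.0.
        merged[pair] = list(scores) + [1.0]
    for pair, scores in background.items():
        if pair not in merged:
            # Background-only pairs fill coverage gaps; provenance value exp(1).
            merged[pair] = list(scores) + [math.e]
    return merged


in_dom = {("casa", "house"): [0.6, 0.5]}
bg = {("casa", "house"): [0.3, 0.2], ("casa", "home"): [0.2, 0.1]}
merged = fill_up(in_dom, bg)
```

Because a shared pair keeps only its in-domain scores, the merged table has one entry per phrase pair. This is consistent with the abstract's point that filled-up models stay compact, and it adds only one weight (for the provenance feature) to tune.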
Topic Adaptation for Lecture Translation through Bilingual Latent Semantic Models
Nick Ruiz, Marcello Federico
Proceedings of the Sixth Workshop on Statistical Machine Translation