International Conference on Spoken Language Translation (2011)


bib (full) Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Overview of the IWSLT 2011 evaluation campaign
Marcello Federico | Luisa Bentivogli | Michael Paul | Sebastian Stüker

We report here on the eighth Evaluation Campaign organized by the IWSLT workshop. This year, the IWSLT evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 Evaluation Campaign, which includes: descriptions of the supplied data and evaluation specifications of each track, the list of participants specifying their submitted runs, a detailed description of the subjective evaluation carried out, the main findings of each exercise drawn from the results and the system descriptions prepared by the participants, and, finally, several detailed tables reporting all the evaluation results.

pdf bib
The NICT ASR system for IWSLT2011
Kazuhiko Abe | Youzheng Wu | Chien-lin Huang | Paul R. Dixon | Shigeki Matsuda | Chiori Hori | Hideki Kashioka

In this paper, we describe NICT’s participation in the IWSLT 2011 evaluation campaign for the ASR Track. To recognize spontaneous speech, we prepared an acoustic model trained by more spontaneous speech corpora and a language model constructed with text corpora distributed by the organizer. We built the multi-pass ASR system by adapting the acoustic and language models with previous ASR results. The target speech was selected from talks on the TED (Technology, Entertainment, Design) program. Here, a large reduction in word error rate was obtained by the speaker adaptation of the acoustic model with MLLR. Additional improvement was achieved not only by adaptation of the language model but also by parallel usage of the baseline and speaker-dependent acoustic models. Accordingly, the final WER was reduced by 30% from the baseline ASR for the distributed test set.

pdf bib
The MIT-LL/AFRL IWSLT-2011 MT system
A. Ryan Aminzadeh | Tim Anderson | Ray Slyh | Brian Ore | Eric Hansen | Wade Shen | Jennifer Drexler | Terry Gleason

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2011 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic to English and English to French TED-talk translation tasks. We also applied our existing ASR system to the TED-talk lecture ASR task. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2010 system, and experiments we ran during the IWSLT-2011 evaluation. Specifically, we focus on 1) speech recognition for lecture-like data, 2) cross-domain translation using MAP adaptation, and 3) improved Arabic morphology for MT preprocessing.

pdf bib
The DCU machine translation systems for IWSLT 2011
Pratyush Banerjee | Hala Almaghout | Sudip Naskar | Johann Roturier | Jie Jiang | Andy Way | Josef van Genabith

In this paper, we provide a description of the Dublin City University’s (DCU) submissions in the IWSLT 2011 evaluationcampaign.1 WeparticipatedintheArabic-Englishand Chinese-English Machine Translation(MT) track translation tasks. We use phrase-based statistical machine translation (PBSMT) models to create the baseline system. Due to the open-domain nature of the data to be translated, we use domain adaptation techniques to improve the quality of translation. Furthermore, we explore target-side syntactic augmentation for an Hierarchical Phrase-Based (HPB) SMT model. Combinatory Categorial Grammar (CCG) is used to extract labels for target-side phrases and non-terminals in the HPB system. Combining the domain adapted language models with the CCG-augmented HPB system gave us the best translations for both language pairs providing statistically significant improvements of 6.09 absolute BLEU points (25.94% relative) and 1.69 absolute BLEU points (15.89% relative) over the unadapted PBSMT baselines for the Arabic-English and Chinese-English language pairs, respectively.

pdf bib
The NICT translation system for IWSLT 2011
Andrew Finch | Chooi-Ling Goh | Graham Neubig | Eiichiro Sumita

This paper describes NICT’s participation in the IWSLT 2011 evaluation campaign for the TED speech translation ChineseEnglish shared-task. Our approach was based on a phrasebased statistical machine translation system that was augmented in two ways. Firstly we introduced rule-based re-ordering constraints on the decoding. This consisted of a set of rules that were used to segment the input utterances into segments that could be decoded almost independently. This idea here being that constraining the decoding process in this manner would greatly reduce the search space of the decoder, and cut out many possibilities for error while at the same time allowing for a correct output to be generated. The rules we used exploit punctuation and spacing in the input utterances, and we use these positions to delimit our segments. Not all punctuation/spacing positions were used as segment boundaries, and the set of used positions were determined by a set of linguistically-based heuristics. Secondly we used two heterogeneous methods to build the translation model, and lexical reordering model for our systems. The first method employed the popular method of using GIZA++ for alignment in combination with phraseextraction heuristics. The second method used a recentlydeveloped Bayesian alignment technique that is able to perform both phrase-to-phrase alignment and phrase pair extraction within a single unsupervised process. The models produced by this type of alignment technique are typically very compact whilst at the same time maintaining a high level of translation quality. We evaluated both of these methods of translation model construction in isolation, and our results show their performance is comparable. We also integrated both models by linear interpolation to obtain a model that outperforms either component. Finally, we added an indicator feature into the log-linear model to indicate those phrases that were in the intersection of the two translation models. The addition of this feature was also able to provide a small improvement in performance.

pdf bib
The MSR system for IWSLT 2011 evaluation
Xiaodong He | Amittai Axelrod | Li Deng | Alex Acero | Mei-Yuh Hwang | Alisa Nguyen | Andrew Wang | Xiahui Huang

This paper describes the Microsoft Research (MSR) system for the evaluation campaign of the 2011 international workshop on spoken language translation. The evaluation task is to translate TED talks ( This task presents two unique challenges: First, the underlying topic switches sharply from talk to talk. Therefore, the translation system needs to adapt to the current topic quickly and dynamically. Second, only a very small amount of relevant parallel data (transcripts of TED talks) is available. Therefore, it is necessary to perform accurate translation model estimation with limited data. In the preparation for the evaluation, we developed two new methods to attack these problems. Specifically, we developed an unsupervised topic modeling based adaption method for machine translation models. We also developed a discriminative training method to estimate parameters in the generative components of the translation models with limited data. Experimental results show that both methods improve the translation quality. Among all the submissions, ours achieves the best BLEU score in the machine translation Chinese-to-English track (MT_CE) of the IWSLT 2011 evaluation that we participated.

pdf bib
LIMSI’s experiments in domain adaptation for IWSLT11
Thomas Lavergne | Alexandre Allauzen | Hai-Son Le | François Yvon

LIMSI took part in the IWSLT 2011 TED task in the MT track for English to French using the in-house n-code system, which implements the n-gram based approach to Machine Translation. This framework not only allows to achieve state-of-the-art results for this language pair, but is also appealing due to its conceptual simplicity and its use of well understood statistical language models. Using this approach, we compare several ways to adapt our existing systems and resources to the TED task with mixture of language models and try to provide an analysis of the modest gains obtained by training a log linear combination of inand out-of-domain models.

pdf bib
LIG English-French spoken language translation system for IWSLT 2011
Benjamin Lecouteux | Laurent Besacier | Hervé Blanchon

This paper describes the system developed by the LIG laboratory for the 2011 IWSLT evaluation. We participated to the English-French MT and SLT tasks. The development of a reference translation system (MT task), as well as an ASR output translation system (SLT task) are presented. We focus this year on the SLT task and on the use of multiple 1-best ASR outputs to improve overall translation quality. The main experiment presented here compares the performance of a SLT system where multiple ASR 1-best are combined before translation (source combination), with a SLT system where multiple ASR 1-best are translated, the system combination being conducted afterwards on the target side (target combination). The experimental results show that the second approach (target combination) overpasses the first one, when the performance is measured with BLEU.

pdf bib
The KIT English-French translation systems for IWSLT 2011
Mohammed Mediani | Eunach Cho | Jan Niehues | Teresa Herrmann | Alex Waibel

This paper presents the KIT system participating in the English→French TALK Translation tasks in the framework of the IWSLT 2011 machine translation evaluation. Our system is a phrase-based translation system using POS-based reordering extended with many additional features. First of all, a special preprocessing is devoted to the Giga corpus in order to minimize the effect of the great amount of noise it contains. In addition, the system gives more importance to the in-domain data by adapting the translation and the language models as well as by using a wordcluster language model. Furthermore, the system is extended by a bilingual language model and a discriminative word lexicon. The automatic speech transcription input usually has no or wrong punctuation marks, therefore these marks were especially removed from the source training data for the SLT system training.

pdf bib
LIUM’s systems for the IWSLT 2011 speech translation tasks
Anthony Rousseau | Fethi Bougares | Paul Deléglise | Holger Schwenk | Yannick Estève

This paper describes the three systems developed by the LIUM for the IWSLT 2011 evaluation campaign. We participated in three of the proposed tasks, namely the Automatic Speech Recognition task (ASR), the ASR system combination task (ASR_SC) and the Spoken Language Translation task (SLT), since these tasks are all related to speech translation. We present the approaches and specificities we developed on each task.

pdf bib
N. Ruiz | A. Bisazza | F. Brugnara | D. Falavigna | D. Giuliani | S. Jaber | R. Gretter | M. Federico

This paper reports on the participation of FBK at the IWSLT 2011 Evaluation: namely in the English ASR track, the Arabic-English MT track and the English-French MT and SLT tracks. Our ASR system features acoustic models trained on a portion of the TED talk recordings that was automatically selected according to the fidelity of the provided transcriptions. Three decoding steps are performed interleaved by acoustic feature normalization and acoustic model adaptation. Concerning the MT and SLT systems, besides language specific pre-processing and the automatic introduction of punctuation in the ASR output, two major improvements are reported over our last year baselines. First, we applied a fill-up method for phrase-table adaptation; second, we explored the use of hybrid class-based language models to better capture the language style of public speeches.

pdf bib
The 2011 KIT English ASR system for the IWSLT evaluation
Sebastian Stüker | Kevin Kilgour | Christian Saam | Alex Waibel

This paper describes our English Speech-to-Text (STT) system for the 2011 IWSLT ASR track. The system consists of 2 subsystems with different front-ends—one MVDR based, one MFCC based—which are combined using confusion network combination to provide a base for a second pass speaker adapted MVDR system. We demonstrate that this set-up produces competitive results on the IWSLT 2010 dev and test sets.

pdf bib
DFKI’s SC and MT submissions to IWSLT 2011
David Vilar | Eleftherios Avramidis | Maja Popović | Sabine Hunsicker

We describe DFKI’s submission to the System Combination and Machine Translation tracks of the 2011 IWSLT Evaluation Campaign. We focus on a sentence selection mechanism which chooses the (hopefully) best sentence among a set of candidates. The rationale behind it is to take advantage of the strengths of each system, especially given an heterogeneous dataset like the one in this evaluation campaign, composed of TED Talks of very different topics. We focus on using features that correlate well with human judgement and, while our primary system still focus on optimizing the BLEU score on the development set, our goal is to move towards optimizing directly the correlation with human judgement. This kind of system is still under development and was used as a secondary submission.

pdf bib
The RWTH Aachen machine translation system for IWSLT 2011
Joern Wuebker | Matthias Huck | Saab Mansour | Markus Freitag | Minwei Feng | Stephan Peitz | Christoph Schmidt | Hermann Ney

In this paper the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2011 is presented. We participated in the MT (English-French, Arabic-English, ChineseEnglish) and SLT (English-French) tracks. Both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated, including domain adaptation via monolingual and bilingual data selection, phrase training, different lexical smoothing methods, additional reordering models for the hierarchical system, various Arabic and Chinese segmentation methods, punctuation prediction for speech recognition output, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

pdf bib
Advances on spoken language translation in the Quaero program
Karim Boudahmane | Bianka Buschbeck | Eunah Cho | Josep Maria Crego | Markus Freitag | Thomas Lavergne | Hermann Ney | Jan Niehues | Stephan Peitz | Jean Senellart | Artem Sokolov | Alex Waibel | Tonio Wandmacher | Joern Wuebker | François Yvon

The Quaero program is an international project promoting research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Within the program framework, research organizations and industrial partners collaborate to develop prototypes of innovating applications and services for access and usage of multimedia data. One of the topics addressed is the translation of spoken language. Each year, a project-internal evaluation is conducted by DGA to monitor the technological advances. This work describes the design and results of the 2011 evaluation campaign. The participating partners were RWTH, KIT, LIMSI and SYSTRAN. Their approaches are compared on both ASR output and reference transcripts of speech data for the translation between French and German. The results show that the developed techniques further the state of the art and improve translation quality.

pdf bib
Speech recognition for machine translation in Quaero
Lori Lamel | Sandrine Courcinous | Julien Despres | Jean-Luc Gauvain | Yvan Josse | Kevin Kilgour | Florian Kraft | Viet-Bac Le | Hermann Ney | Markus Nußbaum-Thom | Ilya Oparin | Tim Schlippe | Ralf Schlüter | Tanja Schultz | Thiago Fraga da Silva | Sebastian Stüker | Martin Sundermeyer | Bianca Vieru | Ngoc Thang Vu | Alexander Waibel | Cécile Woehrling

This paper describes the speech-to-text systems used to provide automatic transcriptions used in the Quaero 2010 evaluation of Machine Translation from speech. Quaero ( is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR transcript is the result of a Rover combination of systems from three teams ( KIT, RWTH, LIMSI+VR) for the French and German languages. The casesensitive word error rates (WER) of the combined systems were respectively 20.8% and 18.1% on the 2010 evaluation data, relative WER reductions of 14.6% and 17.4% respectively over the best component system.

pdf bib
Protocol and lessons learnt from the production of parallel corpora for the evaluation of speech translation systems
Victoria Arranz | Olivier Hamon | Karim Boudahmane | Martine Garnier-Rizet

Machine translation evaluation campaigns require the production of reference corpora to automatically measure system output. This paper describes recent efforts to create such data with the objective of measuring the quality of the systems participating in the Quaero evaluations. In particular, we focus on the protocols behind such production as well as all the issues raised by the complexity of the transcription data handled.

pdf bib
Fill-up versus interpolation methods for phrase-based SMT adaptation
Arianna Bisazza | Nick Ruiz | Marcello Federico

This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training. We address a common scenario where little in-domain data is available for the task, but where large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task – the TED talks. While performing similarly in terms of BLEU and NIST scores to the popular log-linear and linear interpolation techniques, filled-up translation models are more compact and easy to tune by minimum error training.

pdf bib
Semantic smoothing and fabrication of phrase pairs for SMT
Boxing Chen | Roland Kuhn | George Foster

In statistical machine translation systems, phrases with similar meanings often have similar but not identical distributions of translations. This paper proposes a new soft clustering method to smooth the conditional translation probabilities for a given phrase with those of semantically similar phrases. We call this semantic smoothing (SS). Moreover, we fabricate new phrase pairs that were not observed in training data, but which may be used for decoding. In learning curve experiments against a strong baseline, we obtain a consistent pattern of modest improvement from semantic smoothing, and further modest improvement from phrase pair fabrication.

pdf bib
SCFG latent annotation for machine translation
Tagyoung Chung | Licheng Fang | Daniel Gildea

We discuss learning latent annotations for synchronous context-free grammars (SCFG) for the purpose of improving machine translation. We show that learning annotations for nonterminals results in not only more accurate translation, but also faster SCFG decoding.

pdf bib
Long-distance hierarchical structure transformation rules utilizing function words
Chenchen Ding | Takashi Inui | Mikio Yamamoto

In this paper, we propose structure transformation rules for statistical machine translation which are lexicalized by only function words. Although such rules can be extracted from an aligned parallel corpus simply as original phrase pairs, their structure is hierarchical and thus can be used in a hierarchical translation system. In addition, structure transformation rules can take into account long-distance reordering, allowing for more than two phrases to be moved simultaneously. The rule set is used as a core module in our hierarchical model together with two other modules, namely, a basic reordering module and an optional gap phrase module. Our model is considerably more compact and produces slightly higher BLEU scores than the original hierarchical phrase-based model in Japanese-English translation on the parallel corpus of the NTCIR-7 patent translation task.

pdf bib
Investigation of the effects of ASR tuning on speech translation performance
Paul R. Dixon | Andrew Finch | Chiori Hori | Hideki Kashioka

In this paper we describe some of our recent investigations into ASR and SMT coupling issues from an ASR perspective. Our study was motivated by several areas: Firstly, to understand how standard ASR tuning procedures effect the SMT performance and whether it is safe to perform this tuning in isolation. Secondly, to investigate how vocabulary and segmentation mismatches between the ASR and SMT system effect the performance. Thirdly, to uncover any practical issues that arise when using a WFST based speech decoder for tight coupling as opposed to a more traditional tree-search decoding architecture. On the IWSLT07 Japanese-English task we found that larger language model weights only helped the SMT performance when the ASR decoder was tuned in a sub-optimal manner. When we considered the performance with suitable wide beams that ensured the ASR accuracy had converged we observed the language model weight had little influence on the SMT BLEU scores. After the construction of the phrase table the actual SMT vocabulary can be less than the training data vocabulary. By reducing the ASR lexicon to only cover the words the SMT system could accept, we found this lead to an increase in the ASR error rates, however the SMT BLEU scores were nearly unchanged. From a practical point of view this is a useful result as it means we can significantly reduce the memory footprint of the ASR system. We also investigated coupling WFST based ASR to a simple WFST based translation decoder and found it was crucial to perform phrase table expansion to avoid OOV problems. For the WFST translation decoder we describe a semiring based approach for optimizing the log-linear weights.

pdf bib
Extending a probabilistic phrase alignment approach for SMT
Mridul Gupta | Sanjika Hewavitharana | Stephan Vogel

Phrase alignment is a crucial step in phrase-based statistical machine translation. We explore a way of improving phrase alignment by adding syntactic information in the form of chunks as soft constraints guided by an in-depth and detailed analysis on a hand-aligned data set. We extend a probabilistic phrase alignment model that extracts phrase pairs by optimizing phrase pair boundaries over the sentence pair [1]. The boundaries of the target phrase are chosen such that the overall sentence alignment probability is optimal. Viterbi alignment information is also added in the extended model with a view of improving phrase alignment. We extract phrase pairs using a relatively larger number of features which are discriminatively trained using a large-margin online learning algorithm, i.e., Margin Infused Relaxed Algorithm (MIRA) and integrate it in our approach. Initial experiments show improvements in both phrase alignment and translation quality for Arabic-English on a moderate-size translation task.

pdf bib
Left language model state for syntactic machine translation
Kenneth Heafield | Hieu Hoang | Philipp Koehn | Tetsuo Kiso | Marcello Federico

Many syntactic machine translation decoders, including Moses, cdec, and Joshua, implement bottom-up dynamic programming to integrate N-gram language model probabilities into hypothesis scoring. These decoders concatenate hypotheses according to grammar rules, yielding larger hypotheses and eventually complete translations. When hypotheses are concatenated, the language model score is adjusted to account for boundary-crossing n-grams. Words on the boundary of each hypothesis are encoded in state, consisting of left state (the first few words) and right state (the last few words). We speed concatenation by encoding left state using data structure pointers in lieu of vocabulary indices and by avoiding unnecessary queries. To increase the decoder’s opportunities to recombine hypothesis, we minimize the number of words encoded by left state. This has the effect of reducing search errors made by the decoder. The resulting gain in model score is smaller than for right state minimization, which we explain by observing a relationship between state minimization and language model probability. With a fixed cube pruning pop limit, we show a 3-6% reduction in CPU time and improved model scores. Reducing the pop limit to the point where model scores tie the baseline yields a net 11% reduction in CPU time.


bib (full) Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

pdf bib
Lexicon models for hierarchical phrase-based machine translation
Matthias Huck | Saab Mansour | Simon Wiesler | Hermann Ney

In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and—given the word alignment matrix—relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a discriminatively trained word lexicon model [4]. We explore sourceto-target models with phrase-level as well as sentence-level scoring and target-to-source models with scoring on phrase level only. For the first two types of lexicon models, we compare several scoring variants. All models are used during search, i.e. they are incorporated directly into the log-linear model combination of the decoder. Phrase table smoothing with triplet lexicon models and with discriminative word lexicons are novel contributions. We also propose a new regularization technique for IBM model 1 by means of the Kullback-Leibler divergence with the empirical unigram distribution as regularization term. Experiments are carried out on the large-scale NIST Chinese→English translation task and on the English→French and Arabic→English IWSLT TED tasks. For Chinese→English and English→French, we obtain the best results by using the discriminative word lexicon to smooth our phrase tables.

pdf bib
The 2011 KIT QUAERO speech-to-text system for Spanish
Kevin Kilgour | Christian Saam | Christian Mohr | Sebastian Stüker | Alex Waibel

This paper describes our current Spanish speech-to-text (STT) system with which we participated in the 2011 Quaero STT evaluation that is being developed within the Quaero program. The system consists of 4 separate subsystems, as well as the standard MFCC and MVDR phoneme based subsystems we included a both a phoneme and grapheme based bottleneck subsystem. We carefully evaluate the performance of each subsystem. After including several new techniques we were able to reduce the WER by over 30% from 20.79% to 14.53%.

pdf bib
Named entity translation using anchor texts
Wang Ling | Pável Calado | Bruno Martins | Isabel Trancoso | Alan Black | Luísa Coheur

This work describes a process to extract Named Entity (NE) translations from the text available in web links (anchor texts). It translates a NE by retrieving a list of web documents in the target language, extracting the anchor texts from the links to those documents and finding the best translation from the anchor texts, using a combination of features, some of which, are specific to anchor texts. Experiments performed on a manually built corpora, suggest that over 70% of the NEs, ranging from unpopular to popular entities, can be translated correctly using sorely anchor texts. Tests on a Machine Translation task indicate that the system can be used to improve the quality of the translations of state-of-the-art statistical machine translation systems.

pdf bib
Unsupervised vocabulary selection for simultaneous lecture translation
Paul Maergner | Kevin Kilgour | Ian Lane | Alex Waibel

In this work, we propose a novel method for vocabulary selection which enables simultaneous speech recognition systems for lectures to automatically adapt to the diverse topics that occur in educational and scientific lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the World Wide Web and generates a lecture-specific vocabulary and language model based on the resulting documents. In this paper, we introduce a novel method for vocabulary selection where we rank vocabulary that occurs in the collected documents based on a relevance score which is calculated using a combination of word features. Vocabulary selection is a critical component for topic adaptation that has typically been overlooked in prior works. On the interACT German-English simultaneous lecture translation system our proposed approach significantly improved vocabulary coverage, reducing the out-of-vocabulary rate on average by 57.0% and up to 84.9%, compared to a lecture-independent baseline. Furthermore, our approach reduced the word error rate by up to 25.3% (on average 13.2% across all lectures), compared to a lectureindependent baseline.

pdf bib
Combining translation and language model scoring for domain-specific data filtering
Saab Mansour | Joern Wuebker | Hermann Ney

The increasing popularity of statistical machine translation (SMT) systems is introducing new domains of translation that need to be tackled. As many resources are already available, domain adaptation methods can be applied to utilize these recourses in the most beneficial way for the new domain. We explore adaptation via filtering, using the crossentropy scores to discard irrelevant sentences. We focus on filtering for two important components of an SMT system, namely the language model (LM) and the translation model (TM). Previous work has already applied LM cross-entropy based scoring for filtering. We argue that LM cross-entropy might be appropriate for LM filtering, but not as much for TM filtering. We develop a novel filtering approach based on a combined TM and LM cross-entropy scores. We experiment with two large-scale translation tasks, the Arabic-to-English and English-to-French IWSLT 2011 TED Talks MT tasks. For LM filtering, we achieve strong perplexity improvements which carry over to the translation quality with improvements up to +0.4% BLEU. For TM filtering, the combined method achieves small but consistent improvements over the standalone methods. As a side effect of adaptation via filtering, the fully fledged SMT system vocabulary size and phrase table size are reduced by a factor of at least 2 while up to +0.6% BLEU improvement is observed.

pdf bib
Using Wikipedia to translate domain-specific terms in SMT
Jan Niehues | Alex Waibel

When building a university lecture translation system, one important step is to adapt it to the target domain. One problem in this adaptation task is to acquire translations for domain specific terms. In this approach we tried to get these translations from Wikipedia, which provides articles on very specific topics in many different languages. To extract translations for the domain specific terms, we used the interlanguage links of Wikipedia . We analyzed different methods to integrate this corpus into our system and explored methods to disambiguate between different translations by using the text of the articles. In addition, we developed methods to handle different morphological forms of the specific terms in morphologically rich input languages like German. The results show that the number of out-of-vocabulary (OOV) words could be reduced by 50% on computer science lectures and the translation quality could be improved by more than 1 BLEU point.

pdf bib
Modeling punctuation prediction as machine translation
Stephan Peitz | Markus Freitag | Arne Mauser | Hermann Ney

Punctuation prediction is an important task in Spoken Language Translation. The output of speech recognition systems does not typically contain punctuation marks. In this paper we analyze different methods for punctuation prediction and show improvements in the quality of the final translation output. In our experiments we compare the different approaches and show improvements of up to 0.8 BLEU points on the IWSLT 2011 English French Speech Translation of Talks task using a translation system to translate from unpunctuated to punctuated text instead of a language model based punctuation prediction method. Furthermore, we do a system combination of the hypotheses of all our different approaches and get an additional improvement of 0.4 points in BLEU.

pdf bib
Soft string-to-dependency hierarchical machine translation
Jan-Thorsten Peter | Matthias Huck | Hermann Ney | Daniel Stein

In this paper, we dissect the influence of several target-side dependency-based extensions to hierarchical machine translation, including a dependency language model (LM). We pursue a non-restrictive approach that does not prohibit the production of hypotheses with malformed dependency structures. Since many questions remained open from previous and related work, we offer in-depth analysis of the influence of the language model order, the impact of dependency-based restrictions on the search space, and the information to be gained from dependency tree building during decoding. The application of a non-restrictive approach together with an integrated dependency LM scoring is a novel contribution which yields significant improvements for two large-scale translation tasks for the language pairs Chinese–English and German–French.

pdf bib
Speaker alignment in synthesised, machine translation communication
Anne H. Schneider | Saturnino Luz

The effect of mistranslations on the verbal behaviour of users of speech-to-speech translation is investigated through a question answering experiment in which users were presented with machine translated questions through synthesized speech. Results show that people are likely to align their verbal behaviour to the output of a system that combines machine translation, speech recognition and speech synthesis in an interactive dialogue context, even when the system produces erroneous output. The alignment phenomenon has been previously considered by dialogue system designers from the perspective of the benefits it might bring to the interaction (e.g. by making the user more likely to employ terms contained in the system’s vocabulary). In contrast, our results reveal that in speech-to-speech translation systems alignment can in fact be detrimental to the interaction (e.g. by priming the user to align with non-existing lexical items produced by mistranslation). The implications of these findings are discussed with respect to the design of such systems.

pdf bib
How good are your phrases? Assessing phrase quality with single class classification
Nadi Tomeh | Marco Turchi | Guillaume Wisinewski | Alexandre Allauzen | François Yvon

We present a novel translation quality informed procedure for both extraction and scoring of phrase pairs in PBSMT systems. We reformulate the extraction problem in the supervised learning framework. Our goal is twofold. First, We attempt to take the translation quality into account; and second we incorporating arbitrary features in order to circumvent alignment errors. One-Class SVMs and the Mapping Convergence algorithm permit training a single-class classifier to discriminate between useful and useless phrase pairs. Such classifier can be learned from a training corpus that comprises only useful instances. The confidence score, produced by the classifier for each phrase pairs, is employed as a selection criteria. The smoothness of these scores allow a fine control over the size of the resulting translation model. Finally, confidence scores provide a new accuracy-based feature to score phrase pairs. Experimental evaluation of the method shows accurate assessments of phrase pairs quality even for regions in the space of possible phrase pairs that are ignored by other approaches. This enhanced evaluation of phrase pairs leads to improvements in the translation performance as measured by BLEU.

pdf bib
Annotating data selection for improving machine translation
Keiji Yasuda | Hideo Okuma | Masao Utiyama | Eiichiro Sumita

In order to efficiently improve machine translation systems, we propose a method which selects data to be annotated (manually translated) from speech-to-speech translation field data. For the selection experiments, we used data from field experiments conducted during the 2009 fiscal year in five areas of Japan. For the selection experiments, we used data sets from two areas: one data set giving the lowest baseline speech translation performance for its test set, and another data set giving the highest. In the experiments, we compare two methods for selecting data to be manually translated from the field data. Both of them use source side language models for data selection, but in different manners. According to the experimental results, either or both of the methods show larger improvements compared to a random data selection.