Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Anthology ID: 2010.iwslt-papers
Month: December 2-3
Year: 2010
Address: Paris, France
Venue: IWSLT
Event: International Conference on Spoken Language Translation (2010)
SIG: SIGSLT
URL: https://aclanthology.org/2010.iwslt-papers/

CCG augmented hierarchical phrase-based machine translation
Hala Almaghout | Jie Jiang | Andy Way

An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting
Marianna Apidianaki | Yifan He

Mining parallel fragments from comparable texts
Mauro Cettolo | Marcello Federico | Nicola Bertoldi

This paper proposes a novel method for exploiting comparable documents to generate parallel data for machine translation. First, each source document is paired to each sentence of the corresponding target document; second, partial phrase alignments are computed within the paired texts; finally, fragment pairs across linked phrase-pairs are extracted. The algorithm has been tested on two recent challenging news translation tasks. Results show that mining for parallel fragments is more effective than mining for parallel sentences, and that comparable in-domain texts can be more valuable than parallel out-of-domain texts.

Improved Vietnamese-French parallel corpus mining using English language
Thi Ngoc Diep Do | Laurent Besacier | Eric Castelli

Analysis of translation model adaptation in statistical machine translation
Kevin Duh | Katsuhito Sudoh | Hajime Tsukada

The pay-offs of preprocessing for German-English statistical machine translation
Ilknur Durgar El-Kahlout | Francois Yvon

In this paper, we present the result of our work on improving the preprocessing for German-English statistical machine translation. We implemented and tested various improvements aimed at i) converting German texts to the new orthographic conventions; ii) performing a new tokenization for German; iii) normalizing lexical redundancy with the help of POS tagging and morphological analysis; iv) splitting German compound words with a frequency-based algorithm; and v) reducing singletons and out-of-vocabulary words. All these steps are performed during preprocessing on the German side. Combining all these processes, we reduced 10% of the singletons, 2% OOV words, and obtained 1.5 absolute (7% relative) BLEU improvement on the WMT 2010 German to English News translation task.

A Bayesian model of bilingual segmentation for transliteration
Andrew Finch | Eiichiro Sumita

Faster cube pruning
Andrea Gesmundo | James Henderson

Factor templates for factored machine translation models
Yvette Graham | Josef van Genabith

Modelling pronominal anaphora in statistical machine translation
Christian Hardmeier | Marcello Federico

Current Statistical Machine Translation (SMT) systems translate texts sentence by sentence without considering any cross-sentential context. Assuming independence between sentences makes it difficult to take certain translation decisions when the necessary information cannot be determined locally. We argue for the necessity to include cross-sentence dependencies in SMT. As a case in point, we study the problem of pronominal anaphora translation by manually evaluating German-English SMT output. We then present a word dependency model for SMT, which can represent links between word pairs in the same or in different sentences. We use this model to integrate the output of a coreference resolution system into English-German SMT with a view to improving the translation of anaphoric pronouns.

A combination of hierarchical systems with forced alignments from phrase-based systems
Carmen Heger | Joern Wuebker | David Vilar | Hermann Ney

Currently most state-of-the-art statistical machine translation systems present a mismatch between training and generation conditions. Word alignments are computed using the well-known IBM models for single-word based translation. Afterwards phrases are extracted using extraction heuristics, unrelated to the stochastic models applied for finding the word alignment. In the last years, several research groups have tried to overcome this mismatch, but only with limited success. Recently, the technique of forced alignments has been shown to improve translation quality for a phrase-based system, applying a more statistically sound approach to phrase extraction. In this work we investigate the first steps to combine forced alignment with a hierarchical model. Experimental results on IWSLT and WMT data show improvements in translation quality of up to 0.7% BLEU and 1.0% TER.

Multi-pivot translation by system combination
Gregor Leusch | Aurélien Max | Josep Maria Crego | Hermann Ney

This paper describes a technique to exploit multiple pivot languages when using machine translation (MT) on language pairs with scarce bilingual resources, or where no translation system for a language pair is available. The principal idea is to generate intermediate translations in several pivot languages, translate them separately into the target language, and generate a consensus translation out of these using MT system combination techniques. Our technique can also be applied when a translation system for a language pair is available, but is limited in its translation accuracy because of scarce resources. Using statistical MT systems for the 11 different languages of Europarl, we show experimentally that a direct translation system can be replaced by this pivot approach without a loss in translation quality if about six pivot languages are available. Furthermore, we can already improve an existing MT system by adding two pivot systems to it. The maximum improvement was found to be 1.4% abs. in BLEU in our experiments for 8 or more pivot languages.

Real-time spoken language identification and recognition for speech-to-speech translation
Daniel Chung Yong Lim | Ian Lane | Alex Waibel

Towards a general and extensible phrase-extraction algorithm
Wang Ling | Tiago Luís | João Graça | Luísa Coheur | Isabel Trancoso

Phrase-based systems deeply depend on the quality of their phrase tables and therefore, the process of phrase extraction is always a fundamental step. In this paper we present a general and extensible phrase extraction algorithm, where we have highlighted several control points. The instantiation of these control points allows the simulation of previous approaches, as in each one of these points different strategies/heuristics can be tested. We show how previous approaches fit in this algorithm, compare several of them and, in addition, we propose alternative heuristics, showing their impact on the final translation results. Considering two different test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we have obtained an improvement in the results of 2.4 and 2.8 BLEU points, respectively.

MorphTagger: HMM-based Arabic segmentation for statistical machine translation
Saab Mansour

In this paper, we investigate different methodologies of Arabic segmentation for statistical machine translation by comparing a rule-based segmenter to different statistically-based segmenters. We also present a new method for segmentation that serves the need for a real-time translation system without impairing the translation accuracy.

Comparing intrinsic and extrinsic evaluation of MT output in a dialogue system
Anne H. Schneider | Ielka van der Sluis | Saturnino Luz

Sign language machine translation overkill
Daniel Stein | Christoph Schmidt | Hermann Ney

Sign languages represent an interesting niche for statistical machine translation that is typically hampered by the scarceness of suitable data, and most papers in this area apply only a few, well-known techniques and do not adapt them to small-sized corpora. In this paper, we will propose new methods for common approaches like scaling factor optimization and alignment merging strategies which helped improve our baseline. We also conduct experiments with different decoders and employ state-of-the-art techniques like soft syntactic labels as well as trigger-based and discriminative word lexica and system combination. All methods are evaluated on one of the largest sign language corpora available.

If I only had a parser: poor man’s syntax for hierarchical machine translation
David Vilar | Daniel Stein | Stephan Peitz | Hermann Ney

Dynamic distortion in a discriminative reordering model for statistical machine translation
Sirvan Yahyaei | Christoph Monz