MT Summit XI, September 2007: table of contents

Machine Translation Summit XI

10-14 September 2007, Copenhagen, Denmark

Proceedings. Editor: Bente Maegaard

[ISBN: 978-87-90708-16-0]

Cover

Bente Maegaard: Preface

Invited papers

Doris Marty-Albisser: Client centric multilingual information leveraging – scale the skills

Stephen Richardson: Microsoft machine translation: from research to real user

Philipp Koehn: EuroMatrix – machine translation for all European languages

Akitoshi Okumura: Human communication technology – development of speech translation for hand-held devices

Main conference papers

Takako Aikawa, Lee Schwartz, Ronit King, Mo Corston-Oliver, and Carmen Lozano: Impact of controlled language on translation quality and post-editing in a statistical machine translation environment. pp.1-7 [PDF, 314KB]

Vicente Alabau, Alberto Sanchis, and Francisco Casacuberta: Improving speech-to-speech translation using word posterior probabilities. pp.9-14 [PDF, 87KB]

Abhishek Arun and Philipp Koehn: Online learning methods for discriminative training of phrase based statistical machine translation. pp.15-20 [PDF, 133KB]

Julia Aymerich and Hermes Camelo: Automatic extraction of entries for a machine translation dictionary using bitexts. pp.21-27 [PDF, 88KB]

Bogdan Babych, Anthony Hartley, and Serge Sharoff: Translating from under-resourced languages: comparing direct transfer against pivot translation. pp.29-35 [PDF, 197KB]

Eckard Bick: Dan2eng: wide-coverage Danish-English machine translation. pp.37-43 [PDF, 621KB]

Michael Blench: Global Public Health Intelligence Network (GPHIN). pp.45-49 [PDF, 331KB]

Rens Bod: Unsupervised syntax-based machine translation: the contribution of discontiguous phrases. pp.51-56 [PDF, 87KB]

Guihong Cao, Jianfeng Gao, and Jian-Yun Nie: A system to mine large-scale bilingual dictionaries from monolingual web pages. pp.57-64 [PDF, 539KB]

Michael Carl: METIS-II: the German to English MT system. pp.65-72 [PDF, 166KB]

Marine Carpuat and Dekai Wu: Context-dependent phrasal translation lexicons for statistical machine translation. pp.73-80 [PDF, 425KB]

Jing-Shin Chang and Chun-Kai Kung: A Chinese-to-Chinese statistical machine translation model for mining synonymous simplified-traditional Chinese terms. pp.81-87 [PDF, 314KB]

Yi Chang, Ying Zhang, Stephan Vogel, and Jie Yang: Enhancing image-based Arabic document translation using noisy channel correction model. pp.89-95 [PDF, 157KB]

Wen-Han Chao and Zhou-Jun Li: Incorporating constituent structure constraint into discriminative word alignment. pp.97-103 [PDF, 112KB]

Boxing Chen, Marcello Federico, and Mauro Cettolo: Better n-best translations through generative n-gram language models. pp.105-110 [PDF, 77KB]

Josep M. Crego and José B. Mariño: Syntax-enhanced n-gram-based SMT. pp.111-118 [PDF, 188KB]

Paul C. Davis, Zhuli Xie, and Kevin Small: All links are not the same: evaluating word alignments for statistical machine translation. pp.119-126 [PDF, 122KB]

Daniel Déchelotte, Holger Schwenk, Hélène Bonneau-Maynard, Alexandre Allauzen, and Gilles Adda: A state-of-the-art statistical machine translation system based on Moses. pp.127-133 [PDF, 142KB]

Etienne Denoual: Analogical translation of unknown words in a statistical machine translation framework. pp.135-141 [PDF, 94KB]

Mona Diab, Mahmoud Ghoneim, and Nizar Habash: Arabic diacritization in the context of statistical machine translation. pp.143-149 [PDF, 105KB]

Hiroshi Echizen-ya and Kenji Araki: Automatic evaluation of machine translation based on recursive acquisition of an intuitive common parts continuum. pp.151-158 [PDF, 589KB]

Matthias Eck, Stephan Vogel, and Alex Waibel: Estimating phrase pair relevance for translation model pruning. pp.159-165 [PDF, 157KB]

Paula Estrella, Olivier Hamon, and Andrei Popescu-Belis: How much data is needed for reliable MT evaluation? Using bootstrapping to study human and automatic metrics. pp.167-174 [PDF, 143KB]

Oren Etzioni, Kobi Reiter, Stephen Soderland, and Marcus Sammer: Lexical translation with application to image searching on the web. pp.175-182 [PDF, 603KB]

Ariadna Font Llitjós, Jaime Carbonell, and Alon Lavie: Improving transfer-based MT systems with automatic refinements. pp.183-190 [PDF, 339KB]

Pablo Gamallo Otero: Learning bilingual lexicons from comparable English and Spanish corpora. pp.191-197 [PDF, 509KB]

Federico Gaspari and John Hutchins: Online and free! Ten years of online machine translation: origins, developments, current use and future prospects. pp.199-206 [PDF, 128KB]

Deepa Gupta, Mauro Cettolo, and Marcello Federico: POS-based reordering models for statistical machine translation. pp.207-213 [PDF, 152KB]

Nizar Habash: Syntactic preprocessing for statistical machine translation. pp.215-222 [PDF, 118KB]

Olivier Hamon, Djamel Mostefa, and Khalid Choukri: End-to-end evaluation of a speech-to-speech translation system in TC-STAR. pp.223-230 [PDF, 116KB]

Olivier Hamon, Anthony Hartley, Andrei Popescu-Belis, and Khalid Choukri: Assessing human and automated quality judgments in the French MT evaluation campaign CESTA. pp.231-238 [PDF, 93KB]

Mary Harper, Alex Acero, Srinivas Bangalore, Jaime Carbonell, Jordan Cohen, Barbara Cuthill, Carol Espy-Wilson, Christiane Fellbaum, John Garofolo, Chin-Hui Lee, Jim Lester, Andrew McCallum, Nelson Morgan, Michael Picheny, Joe Picone, Lance Ramshaw, Jeff Reynar, Hadar Shemtov, and Clare Voss: Report on the NSF-sponsored Human Language Technology Workshop on Industrial Centers. pp.239-246 [PDF, 94KB]

Sanjika Hewavitharana, Alon Lavie, and Stephan Vogel: Experiments with a noun-phrase driven statistical machine translation system. pp.247-253 [PDF, 241KB]

Pierre Isabelle, Cyril Goutte, and Michel Simard: Domain adaptation of MT systems through automatic post-editing. pp.255-261 [PDF, 151KB]

Hitoshi Isahara, Sadao Kurohashi, Jun’ichi Tsujii, Kiyotaka Uchimoto, Hiroshi Nakagawa, Hiroyuki Kaji, and Shun’ichi Kikuchi: Development of a Japanese-Chinese machine translation system. pp.263-267 [PDF, 161KB]

Masaki Itagaki, Takako Aikawa, and Xiaodong He: Automatic validation of terminology translation consistency with statistical method. pp.269-274 [PDF, 416KB]

Heiki-Jaan Kaalep and Kaarel Veskis: Comparing parallel corpora and evaluating their quality. pp.275-279 [PDF, 164KB]

Jae Dong Kim and Stephan Vogel: Iterative refinement of lexicon and phrasal alignment. pp.281-288 [PDF, 121KB]

Katrin Kirchhoff, Owen Rambow, Nizar Habash, and Mona Diab: Semi-automatic error analysis for large-scale statistical machine translation. pp.289-296 [PDF, 192KB]

Gorka Labaka, Nicolas Stroppa, Andy Way, and Kepa Sarasola: Comparing rule-based and data-driven approaches to Spanish-to-Basque machine translation. pp.297-304 [PDF, 558KB]

Zhanyi Liu, Haifeng Wang, and Hua Wu: Log-linear generation models for example-based machine translation. pp.305-312 [PDF, 178KB]

Lieve Macken, Julia Trushkina, and Lidia Rura: Dutch parallel corpus: MT corpus and translator’s aid. pp.313-320 [PDF, 132KB]

Robert C. Moore and Chris Quirk: Faster beam-search decoding for phrasal statistical machine translation. pp.321-327 [PDF, 241KB]

Sara Morrissey, Andy Way, Daniel Stein, Jan Bungeroth, and Hermann Ney: Combining data-driven MT systems for improved sign language translation. pp.329-336 [PDF, 189KB]

Toshiaki Nakazawa, Yu Kun, and Sadao Kurohashi: Structural phrase alignment based on consistency criteria. pp.337-344 [PDF, 202KB]

Sharon O’Brien and Johann Roturier: How portable are controlled language rules? A comparison of two empirical MT studies. pp.345-352 [PDF, 77KB]

Jong-Hoon Oh and Hitoshi Isahara: Machine transliteration using multiple transliteration engines and hypothesis re-ranking. pp.353-360 [PDF, 253KB]

Hideo Okuma, Hirofumi Yamamoto, and Eiichiro Sumita: Introducing translation dictionary into phrase-based SMT. pp.361-368 [PDF, 446KB]

Aaron B. Phillips, Violetta Cavalli-Sforza, and Ralf D. Brown: Improving example-based machine translation through morphological generalization and adaptation. pp.369-375 [PDF, 115KB]

Chris Quirk, Raghavendra Udupa U., and Arul Menezes: Generative models of noisy translations with applications to parallel fragment extraction. pp.377-384 [PDF, 249KB]

Sharath Rao, Ian Lane, and Tanja Schultz: Improving spoken language translation by automatic disfluency removal: evidence from conversational speech transcripts. pp.385-389 [PDF, 140KB]

Dengjun Ren, Hua Wu, and Haifeng Wang: Improving statistical word alignment with various clues. pp.391-397 [PDF, 167KB]

Marcus Sammer and Stephen Soderland: Building a sense-distinguished multilingual lexicon from monolingual corpora and bilingual lexicons. pp.399-406 [PDF, 263KB]

Alberto Sanchis, Alfons Juan, and Enrique Vidal: Estimation of confidence measures for machine translation. pp.407-412 [PDF, 129KB]

Young-Ae Seo, Chang-Hyun Kim, Seong-Il Yang, and Young-gil Kim: Getting professional translation through user interaction. pp.413-419 [PDF, 479KB]

Smriti Singh, Mrugank Dalal, Vishal Vachhani, Pushpak Bhattacharyya, and Om P. Damani: Hindi generation from interlingua. pp.421-428 [PDF, 206KB]

R. Mahesh K. Sinha: Using rich morphology in resolving certain Hindi-English machine translation divergence. pp.429-433 [PDF, 48KB]

Hans-Udo Stadler and Ursula Peter-Spörndli: The quest for machine translation quality at CLS Communication. pp.435-442 [PDF, 115KB]

Sylvain Surcin, Elke Lange, and Jean Senellart: Rapid development of new language pairs at SYSTRAN. pp.443-449 [PDF, 114KB]

Koichi Takeuchi, Takashi Kanehila, Kazuki Hilao, Takeshi Abekawa, and Kyo Kageura: Flexible automatic look-up of English idiom entries in dictionaries. pp.451-458 [PDF, 489KB]

Ahmet Cüneyd Tantuğ, Eşref Adali, and Kemal Oflazer: A MT system from Turkmen to Turkish employing finite state and statistical methods. pp.459-465 [PDF, 399KB]

John Tinsley, Ventsislav Zhechev, Mary Hearne, and Andy Way: Robust language pair-independent sub-tree alignment. pp.467-474 [PDF, 1310KB]

Masao Utiyama and Hitoshi Isahara: A Japanese-English patent parallel corpus. pp.475-482 [PDF, 113KB]

István Varga and Shoichi Yokoyama: Japanese-Hungarian dictionary generation using ontology resources. pp.483-490 [PDF, 167KB]

Sami Virpioja, Jaakko J. Väyrynen, Mathias Creutz, and Markus Sadeniemi: Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner. pp.491-498 [PDF, 105KB]

Martin Volk and Søren Harder: Evaluating MT with translations or translators: what is the difference? pp.499-506 [PDF, 61KB]

Hua Wu and Haifeng Wang: Comparative study of word alignment heuristics and phrase-based SMT. pp.507-514 [PDF, 168KB]

Jia Xu, Yonggang Deng, Yuqing Gao, and Hermann Ney: Domain dependent statistical machine translation. pp.515-520 [PDF, 289KB]

Yang Ye, Karl-Michael Schneider, and Steven Abney: Aspect marker generation in English-to-Chinese machine translation. pp.521-527 [PDF, 129KB]

Jerneja Žganec Gros and Stanislav Gruden: English-Slovenian statistical machine translation: from a lower- to a highly-inflected language. pp.529-533 [PDF, 123KB]

Min Zhang, Hongfei Jiang, Ai Ti Aw, Jun Sun, Sheng Li, and Chew Lim Tan: A tree-to-tree alignment-based model for statistical machine translation. pp.535-542 [PDF, 243KB]

Ying Zhang and Stephan Vogel: PanDoRA: a large-scale two-way statistical machine translation system for hand-held devices. pp.543-550 [PDF, 472KB]

Yujie Zhang, Qing Ma, and Hitoshi Isahara: Building Japanese-Chinese translation dictionary based on EDR Japanese-English bilingual dictionary. pp.551-557 [PDF, 323KB]

Simon Zwarts and Mark Dras: Syntax-based word reordering in phrase-based statistical machine translation: why does it work? pp.559-566 [PDF, 98KB]

Panels

13 September: Machine translation in use (moderator: Ed Hovy)

14 September: The prospects of MT today (moderator: Hans Uszkoreit)

Tutorials, 10 September

Kevin Knight and Philipp Koehn: Statistical machine translation; abstract, 1p. [PDF, 106KB]; part 1 by Philipp Koehn, 128pp. [PDF of PPT presentation, 601KB]; part 2 by Kevin Knight, 180pp. [PDF of PPT presentation, 2292KB]

Cristina Vertan: Example based machine translation for cross-lingual information retrieval; abstract, 1p. [PDF, 83KB]; presentation, 112pp. [PDF of PPT slides, 6215KB]

Maghi King, Andrei Popescu-Belis, and Paula Estrella: Context-based evaluation of MT systems: principles and tools; abstract, 1p. [PDF, 78KB]; outline and materials, 7pp. [PDF, 80KB]; presentation, 59pp. [PDF of PPT slides, 6897KB]

Federico Gaspari and Harold Somers: Using free online MT in multilingual websites; abstract, 1p. [PDF, 75KB]

Adriane Rinsche: LTC tutorial on workflow and business information management in the language industry; abstract, 2pp. [PDF, 30KB]; materials, 10pp. [PDF, 83KB]

Jen Doyon: How to successfully integrate MT; abstract, 1p. [PDF, 88KB]

Workshops, 11 September

Patent translation

Organized by Jun’ichi Tsujii and Shoichi Yokoyama (eds.): complete proceedings, 38pp. [PDF, 1048KB]

Rachel Chrem: WIPO’s activities in patent translation and terminology. Invited talk. 42pp. [PDF of PPT presentation, 2099KB]

Oh-Woog Kwon, Sung-Kwon Choi, Ki-Young Lee, Yoon-Hyung Roh, Young-Gil Kim, and Munpyo Hong: English-Korean patent system: fromTo-EK/PAT; pp.1-7. [PDF, 103KB]

Akira Ushioda: Phrase alignment for integration of SMT and RBMT resources; pp.8-12. [PDF, 66KB]

Ehara Terumasa: Rule based machine translation combined with statistical post editor for Japanese to English patent translation; pp.13-18. [PDF, 355KB]

Lene Offersgaard and Claus Povlsen: Patent documentation – comparison of two MT strategies; pp.19-23. [PDF, 60KB]

Yokoyama Shoichi and Kennendai Shigehiro: Error correcting system for analysis of Japanese patent sentences; pp.24-27. [PDF, 73KB]

Svetlana Sheremetyeva: On portability of resources for a quick ramp up of multilingual MT of patent claims; pp.28-33. [PDF, 66KB]

The Chinese room experiment

Organized by John S. White and Florence Reeder: abstract, 1p. [PDF, 13KB]

Using corpora for natural language generation (UCNLG+MT)

Organized by Anja Belz and Sebastian Varges (eds.): preliminary matter and contents [PDF, 655KB]

Kevin Knight: Automatic language translation generation help needs badly; pp.1-4 [PDF, 1093KB]

Yvette Graham, Deirdre Hogan, and Josef van Genabith: Automatic evaluation of generation and parsing for machine translation with automatically acquired transfer rules; pp.5-12 [PDF, 1796KB]

David Hardcastle: Generalizing syntactic collocates for creative language generation; pp.13-21 [PDF, 2114KB]

Michael White, Rajakrishnan Rajkumar, and Scott Martin: Towards broad coverage surface realization with CCG; pp.22-30 [PDF, 2020KB]

Keiji Yasuda, Hirofumi Yamamoto, and Eiichiro Sumita: Method of selecting training sets to build compact and efficient language model; pp.31-37 [PDF, 1632KB]

Bernd Bohnet: The induction and evaluation of word order rules using corpora based on the two concepts of topological models; pp.38-45 [PDF, 1835KB]

Olivier Gouirand: A probabilistic approach to linguistic analysis in machine translation output evaluation; pp.46-54 [PDF, 2182KB]

Irene Langkilde-Geary: Declarative syntactic processing of natural language using concurrent constraint programming and probabilistic dependency modeling; pp.55-63 [PDF, 1969KB]

Nizar Habash: NLG is still relevant to MT; pp.64-65 [PDF, 478KB]

Andrei Popescu-Belis: Evaluation of NLG: some analogies and differences with machine translation and reference resolution; pp.66-68 [PDF, 683KB]

Gregor Thurmair: Generation issues in machine translation; pp.69-70 [PDF, 466KB]

Sebastian Varges: One-way translation: an opportunity for NLG and MT research to interact; pp.71-72 [PDF, 497KB]

Anja Belz, Albert Gatt, Ehud Reiter, and Jette Viethen: The attribute selection for generation of referring expressions challenge. [Introduction to Shared Task Evaluation Challenge.] pp.73-74 [PDF, 438KB]

Anja Belz and Albert Gatt: The attribute selection for GRE challenge: overview and evaluation results; pp.75-83 [PDF, 438KB]

Bernd Bohnet: IS-FBN, IS-FBS, IS-IAC: the adaptation of two classic algorithms for the generation of referring expressions in order to produce expressions like humans do; pp.84-86 [PDF, 676KB]

Raquel Hervás and Pablo Gervás: NIL: attribute selection for matching the task corpus using relative attribute groupings obtained from the test data; pp.87-89 [PDF, 694KB]

J.D. Kelleher: DIT: frequency based incremental attribute selection for GRE; pp.90-91 [PDF, 502KB]

Advaith Siddharthan and Ann Copestake: Evaluating an open-domain GRE algorithm on closed domains system IDs: CAM-B, CAM-T, CAM-BU and CAM-TU; pp.92-94 [PDF, 727KB]

Mariët Theune, Pascal Touset, Jette Viethen, and Emiel Krahmer: Cost-based attribute selection for GRE (GRAPH-SC/GRAPH-FP); pp.95-97 [PDF, 721KB]

Philipp Spanger, Kurosawa Takahiro, and Tokunaga Takenobu: TITCH: attribute selection based on discrimination power and frequency; pp.98-100 [PDF, 688KB]

Kees van Deemter and Albert Gatt: Content determination in GRE: evaluating the evaluator; pp.101-103 [PDF, 754KB]

Automatic procedures in MT evaluation

Organized by Gregor Thurmair, Khalid Choukri, and Bente Maegaard: programme, 2pp. [PDF, 112KB]

Andrei Popescu-Belis: The place of automatic evaluation metrics in external quality models for machine translation. 19pp. [PDF of PPT presentation, 109KB]

Philipp Koehn and Chris Callison-Burch: Evaluating evaluation – lessons from the WMT 2007 shared task. 38pp. [PDF of PPT presentation, 425KB]

Eduard Hovy: Investigating why BLEU penalizes non-statistical systems. 10pp. [PDF of PPT presentation, 265KB]

Christopher Cieri, Stephanie Strassel, Meghan Lammie Glenn, and Lauren Friedman: Linguistic resources in support of various evaluation metrics. 34pp. [PDF of PPT presentation, 1007KB]

Olivier Hamon: Experiences and conclusions from the CESTA evaluation project. 22pp. [PDF of PPT presentation, 108KB]

Gregor Thurmair: Automatic evaluation in MT system production. 28pp. [PDF of PPT presentation, 153KB]

Bogdan Babych and Anthony Hartley: Sensitivity of automated models for MT evaluation: proximity-based vs. performance-based methods. 22pp. [PDF of PPT presentation, 150KB]

Khalid Choukri, Olivier Hamon, and Djamel Mostefa: MT evaluation & TC-STAR. 33pp. [PDF of PPT presentation, 184KB]