Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Tutorials
This tutorial is for people who are beginning to evaluate how well machine translation will fit their needs or who are curious to know more about how it is used. We assume no previous knowledge of machine translation. We focus on background knowledge that will help you both get more out of the rest of AMTA 2010 and make better decisions about how to invest in machine translation. Past participants have ranged from tech writers and freelance translators who want to keep up to date to VPs and CEOs who are evaluating technology strategies for their organizations. The main topics for discussion are common FAQs about MT (Can machines really translate? Can we fire our translators now?), limitations (Why is the output so bad? What is MT good for?), workflow (Why buy MT if it’s free on the internet? What other kinds of translation automation are there? How do we use it?), return on investment (How much does MT cost? How can we convince our bosses to buy MT?), and steps to deployment (Which MT system should we buy? What do we do next?).
This session will cover how to increase localization efficiency with a SYSTRAN desktop product and a server solution. First, using SYSTRAN Premium Translator, a desktop product, we will demonstrate how to integrate MT into a localization workflow, interaction with TM matching tools, hands-on MT customization with various tools and dictionaries, and final post-editing. We will also walk through the complete cycle of automatic quality improvement using SYSTRAN Training Server, part of the Enterprise Server 7 suite: managing bilingual and monolingual data with Corpus Manager, training hybrid or statistical translation models with Training Manager, and evaluating quality with automatic scoring and side-by-side translation comparison. The suite also includes other useful tools that automatically extract and validate dictionary entries and create TMs from unaligned bilingual sentences. Finally, we will compare localization efficiency with and without MT integration/customization, along with the actual cost benefits.
Arabic poses many interesting challenges to machine translation: ambiguous orthography, rich morphology, complex morpho-syntactic behavior, and numerous dialects. In this tutorial, we introduce the most important challenges, and solutions to them, for people working on translation from or to Arabic or any of its dialects. The tutorial is intended for researchers and developers working on MT. The discussion of linguistic issues and how they are addressed in MT will help linguists and professional translators understand the problems machine translation faces when dealing with Arabic and other morphologically rich languages. Attendees are not expected to be able to speak, read, or write Arabic.
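As a toy illustration of one of these challenges, the sketch below (not part of the tutorial) shows why Arabic clitics make tokenization ambiguous: proclitics such as w+ (“and”) and enclitics such as +hm (“their/them”) attach directly to the word, so a single surface string can license several analyses. Words are given in Buckwalter transliteration, and the miniature affix and stem lists are invented for the example.

```python
# Invented miniature lexicon for illustration only.
PROCLITICS = ["w", "f", "b", "l", ""]   # "" = no proclitic
ENCLITICS = ["hm", "hA", "h", ""]       # "" = no enclitic
STEMS = {"ktAb", "whm", "hm", "ktb"}    # tiny stem lexicon

def analyses(word):
    """Enumerate all proclitic + stem + enclitic splits licensed by the lexicon."""
    out = []
    for pro in PROCLITICS:
        for enc in ENCLITICS:
            if word.startswith(pro) and word.endswith(enc):
                stem = word[len(pro):len(word) - len(enc)]
                if stem in STEMS:
                    out.append((pro, stem, enc))
    return out
```

With this lexicon, `analyses("wktAbhm")` (“and their book”) yields the single split w + ktAb + hm, while `analyses("whm")` is genuinely ambiguous between wa+hum (“and they”) and the standalone noun wahm (“illusion”) — the kind of ambiguity a real morphological analyzer must resolve in context.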
This tutorial will present a survey of how machine translation is integrated into current CAT tools and illustrate how the technology can be used appropriately and profitably by the professional translator.
If you are interested in open-source machine translation but lack hands-on experience, this is the tutorial for you! We will start with background knowledge of statistical machine translation and then walk you through the process of installing and running an SMT system. We will show you how to prepare input data and the most efficient way to train and use your translation systems. We will also discuss solutions to some of the most common issues that LSPs face when using SMT, including tailoring systems to specific clients, preserving document layout and formatting, and efficiently incorporating new translation memories. Previous years’ participants have included software engineers and managers who need a detailed understanding of the SMT process. This is a fast-paced, hands-on tutorial that will cover the skills you need to get up and running with open-source SMT. The teaching will be based on the Moses toolkit, the most popular open-source machine translation software currently available. No prior knowledge of MT is necessary, only an interest in it. A laptop is required for this tutorial, and you should have rudimentary knowledge of using the command line on Windows or Linux.
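As a taste of what happens inside SMT training, here is a self-contained toy sketch (not part of the tutorial materials) of expectation-maximization for IBM Model 1 word-translation probabilities — the classic starting point for the word alignment that toolkits such as Moses (via GIZA++) perform at scale. The corpus is invented for illustration.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """EM training of IBM Model 1 word-translation probabilities.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns t[(f, e)], an estimate of P(f | e): the probability that
    source word e translates to target word f.
    """
    # Uniform (unnormalized) initialization over all word pairs.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # normalizers per source word
        for src, tgt in corpus:
            for f in tgt:
                # E-step: distribute each target word's count over the
                # source words, proportionally to current probabilities.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate translation probabilities from the counts.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy parallel corpus (German -> English), invented for illustration.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus)
```

Even on three sentence pairs, EM converges toward the intuitive lexicon (das→the, buch→book) purely from co-occurrence statistics — the same principle, with far richer models, that drives the training runs covered in the tutorial.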
Several studies have recently reported significant productivity gains by human translators when, in addition to translation memory (TM) matches, they also receive suggestions from a statistical machine translation (SMT) engine. In fact, an increasing number of language service providers and in-house translation services of large companies are now integrating SMT into their workflows. The technology transfer of state-of-the-art SMT from research to industry has been relatively fast and simple, thanks in part to the development of open-source software such as Moses, GIZA++, and IRSTLM. While working on a specific translation project, a translator weighs translating a segment from scratch against post-editing the SMT output, a choice driven by the adequacy and fluency the SMT engine provides, which in turn depend on the language pair, the linguistic domain of the task, and the amount of available training data. Statistical models, like those employed in SMT, rely on a simple assumption: the data used to train and tune the models represent the target translation task. Unfortunately, this assumption cannot be satisfied in most real applications, simply because for most language pairs and domains there is not enough data to adequately train an SMT system. Hence, common practice is to train SMT systems by merging parallel and monolingual data from the target domain with as much data as possible from any other available source. This workaround is simple and gives practical benefits, but it is often not the best way to exploit the available data. This tutorial addresses the optimal use of in-domain and out-of-domain data to achieve better SMT performance on a given application domain. Domain adaptation, in general, refers to statistical modeling and machine learning techniques that cope with the unavoidable mismatch between training and task data that typically occurs in real-life applications.
Our tutorial will survey several application cases in which domain adaptation can be applied, and present the adaptation techniques that best fit each case. In particular, we will cover adaptation methods for n-gram language models and translation models in phrase-based SMT. The tutorial will provide some high-level theoretical background in domain adaptation, discuss practical application cases, and finally show how the presented methods can be applied with two widely used software tools: Moses and IRSTLM. The tutorial is suited for any practitioner of statistical machine translation. No particular programming or mathematical background is required.
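To make the idea concrete, here is a minimal sketch (not from the tutorial) of the most common language-model adaptation recipe: linearly interpolating an in-domain and an out-of-domain model, with the mixture weight chosen to minimize perplexity on held-out in-domain text. The toy corpora, add-alpha smoothing, and use of unigram models are invented simplifications; production toolkits such as IRSTLM apply the same recipe at the n-gram level.

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=0.1):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    denom = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}

def perplexity(lm, tokens):
    """Per-token perplexity of `tokens` under unigram model `lm`."""
    logp = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-logp / len(tokens))

# Invented toy corpora for illustration.
in_domain = "the patient received the prescribed dose".split()
out_domain = "the market rallied as the index rose sharply again".split()
heldout = "the patient dose index".split()   # held-out in-domain sample

vocab = set(in_domain) | set(out_domain) | set(heldout)
p_in = unigram_lm(in_domain, vocab)
p_out = unigram_lm(out_domain, vocab)

# Grid-search the interpolation weight on held-out text: the mixture
# P(w) = l * P_in(w) + (1 - l) * P_out(w) with the lowest perplexity wins.
best = min(
    (l / 10 for l in range(11)),
    key=lambda l: perplexity(
        {w: l * p_in[w] + (1 - l) * p_out[w] for w in vocab}, heldout
    ),
)
```

On this toy data the optimum lies strictly between 0 and 1: the in-domain model alone assigns almost no mass to words it never saw, and the out-of-domain model alone fits the domain poorly, so blending both beats either extreme — the core intuition behind mixture-model adaptation.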