Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Commercial MT User Program
This paper describes the role of machine translation (MT) for multilingual information access, a service that is desired by digital libraries that wish to provide cross-cultural access to their collections. To understand the performance of MT, we have developed HeMT: an integrated multilingual evaluation platform (http://txcdk-v10.unt.edu/HeMT/) to facilitate human evaluation of machine translation. The results of human evaluation using HeMT on three online MT services are reported. Challenges and benefits of crowdsourcing and collaboration based on our experience are discussed. Additionally, we present the analysis of the translation errors and propose Multi-engine MT strategies to improve translation performance.
How to achieve the optimal balance of quality and cost when the need for translation is sky-rocketing? Can machine translation be the solution? What system to choose? Finding the right MT solution for your organization is not easy. In this paper, we would like to share our experience at Nikon Precision Inc. in quest of the right tool, focusing on rule-based Japanese MT software and the results of a small pilot project, together with our plans for the future and the challenges we are facing.
Despite the growth of statistical machine translation (SMT) research and development in recent years, it remains somewhat out of reach for the translation community where programming expertise and knowledge of statistics tend not to be commonplace. While the concept of SMT is relatively straightforward, its implementation in functioning systems remains difficult for most, regardless of expertise. More recently, however, developments such as SmartMATE have emerged which aim to assist users in creating their own customized SMT systems and thus reduce the learning curve associated with SMT. In addition to commercial uses, translator training stands to benefit from such increased levels of inclusion and access to state-of-the-art approaches to MT. In this paper we draw on experience in developing and evaluating a new syllabus in SMT for a cohort of post-graduate student translators: we identify several issues encountered in the introduction of student translators to SMT, and report on data derived from repeated measures questionnaires that aim to capture data on students’ self-efficacy in the use of SMT. Overall, results show that participants report significant increases in their levels of confidence and knowledge of MT in general, and of SMT in particular. Additional benefits – such as increased technical competence and confidence – and future refinements are also discussed.
This paper reports on a project whose aims are to investigate the usability of raw machine translated technical support documentation for a commercial online file storage service. Following the ISO/TR 16982 definition of usability - goal completion, satisfaction, effectiveness, and efficiency - comparisons are drawn for all measures between the original user documentation written in English for a well-known online file storage service and raw machine translated output in four target languages: Spanish, French, German and Japanese. Using native speakers for each language, we found significant differences between the source and MT output for three out of the four measures: goal completion, efficiency and user satisfaction. This leads to a tentative conclusion that there is a difference in usability between well-formed content and raw machine translated content, and we suggest avenues for further work.
All types of Machine Translation technologies have pros and cons. At PayPal, we have been working with MT for 3 years (2 of them in a production environment). The aim of this paper is to share our experience and discuss strengths and weaknesses for Rule-based Machine Translation, Statistical Machine Translation and Hybrid Machine Translation. We will also share pointers for successful implementation of any of these technologies.
Statistical Machine Translation (SMT) systems specialized for one domain often perform poorly when applied to other domains. Domain adaptation techniques allow SMT models trained from a source domain with abundant data to accommodate different target domains with limited data. This paper evaluates the performance of two adaptive techniques based on log-linear and mixture models on data from the legal domain in real-world settings. Performance evaluation includes post-editing time and effort required by a professional post-editor to improve the quality of machine-generated translations to meet industry standards, as well as traditional automated scoring techniques (BLEU scores). Results indicates that the domain adaptation techniques can yield a significant increase in BLEU score (up to three points) and a significant reduction in post-editing time of about one second per word in an operational environment.
Machine translation resurfaced as a viable business solution about 5 years ago, with much hype. With the amount of content requiring translation, and a mellowing of user expectations about translation quality, it seemed there was real business value in developing machine translation solutions. Since then, however, the discounts offered to enterprise customers have remained stubbornly meager in the 10-20% range, with high, up-front costs—far from the anticipated savings. This paper provides an overview of the challenges encountered in the value chain between customer and Language Service Provider (LSP) which keep translation costs high and limit machine translation adoption, discusses existing and potential solutions to these challenges, and offers suggestions on how to enlist the support of the LSP and freelance translator community to address these challenges.
This paper presents a case-study of work done by Applied Language Solutions (ALS) for a large social networking provider who claim to have built the world’s first multi-language social network, where Internet users from all over the world can communicate in languages that are available in the system. In an initial phase, the social networking provider contracted ALS to build Machine Translation (MT) engines for twelve language-pairs: Russian⇔English, Russian⇔Turkish, Russian⇔Arabic, Turkish⇔English, Turkish⇔Arabic and Arabic⇔English. All of the input data is user-generated content, so we faced a number of problems in building large-scale, robust, high-quality engines. Primarily, much of the source-language data is of ‘poor’ or at least ‘non-standard’ quality. This comes in many forms: (i) content produced by non-native speakers, (ii) content produced by native speakers containing non-deliberate typos, or (iii) content produced by native speakers which deliberately departs from spelling norms to bring about some linguistic effect. Accordingly, in addition to the ‘regular’ pre-processing techniques used in the building of our statistical MT systems, we needed to develop routines to deal with all these scenarios. In this paper, we describe how we handle shortforms, acronyms, typos, punctuation errors, non-dictionary slang, wordplay, censor avoidance and emoticons. We demonstrate automatic evaluation scores on the social network data, together with insights from the the social networking provider regarding some of the typical errors made by the MT engines, and how we managed to correct these in the engines.
Managing large scale MT post-editing projects is a challenging endeavor. From securing linguists buy-in to ensuring consistency of the output, it is important to develop a set of specific processes and tools that facilitate this task. Drawing from years of experience in such projects, we will attempt here to describe the challenges associated to the management of such projects and to define best practices.
This document introduces the strategy implemented at CA Technologies to exploit Machine Translation (MT) at the corporate-wide level. We will introduce the different approaches followed to further improve the quality of the output of the machine translation engine once the engines have reached a maximum level of customization. Senior team support, clear communication between the parties involved and improvement measurement are the key components for the success of the initiative.
In this paper, we present SAIC’s hybrid machine translation (MT) system and show how it was adapted to the needs of our customer – a major global fashion company. The adaptation was performed in two ways: off-line selection of domain-relevant parallel and monolingual data from a background database, as well as on-line incremental adaptation with customer parallel and translation memory data. The translation memory was integrated into the statistical search using two novel features. We show that these features can be used to produce nearly perfect translations of data that fully or to a large extent partially matches the TM entries, without sacrificing on the translation quality of the data without TM matches. We also describe how the human post-editing effort was reduced due to significantly better MT quality after adaptation, but also due to improved formatting and readability of the MT output.
Machine Translation (MT) is said to be the next lingua franca. With the evolution of new technologies and the capacity to produce a humungous number of written digital documents, human translators will not be able to translate documentation fast enough. However, some applications require a level of quality that is still beyond that provided by MT. Thanks to the increased capacity of communication provided by new technologies, people can also interact and collaborate to work remotely. With this, crowd computing is becoming more common and it has been proposed as a feasible solution for translation. In this paper, we discuss about the relationship between crowdsourcing and MT, and the main challenges for the MT community to multiply the potential of the crowd.
Ford Motor Company is at the forefront of the global economy and with this comes the need for communicating with regional manufacturing staff and plant employees in their own languages. Asian employees, in particular, do not necessarily learn English as a second language as is often the case in European countries, so manufacturing systems are now mandated to support local languages. This support is required for plant floor system applications where static data (labels, menus, and messages) as well as dynamic data (user entered controlled and free text) is required to be translated from/to English and the local languages. This facilitates commonization of business methods where best practices can be shared globally between plant and staff members. In this paper and presentation, we will describe our experiences in bringing Machine Translation technology to a large multinational corporation such as Ford and discuss the lessons we learned as well as both the successes and failures we have experienced.
The Church of Jesus Christ of Latter-day Saints undertook an extensive effort at the beginning of this year to deploy machine translation (MT) in the translation workflow for the content on its principal website, www.lds.org. The objective of this effort is to reduce by at least 50% the time required by human translators to translate English content into nine other languages and publish it on this site. This paper documents the experience to date, including selection of the MT system, preparation and use of data to customize the system, initial deployment of the system in the Church’s translation workflow, post-editing training for translators, the resulting productivity improvements, and plans for future deployments.
We describe a system to rapidly generate high-quality closed captions and subtitles for live broadcasted TV shows, using automated components, namely Automatic Speech Recognition and Machine Translation. The human stays in the loop for quality assurance and optional post-editing. We also describe how the system feeds the human edits and corrections back into the different components for improvement of these components and with that of the overall system. We finally describe the operation of this system in a real life environment within a broadcast network, where we implemented the system to transcribe, process broadcast transmissions and generate high-quality closed captions in Arabic and translate these into English subtitles in short time.
This paper reports on three business opportunities encountered by Spoken Translation, Inc., a developer of software systems for automatic spoken translation: (1) a healthcare organization needing improved communications between limited-English patients and their caregivers; (2) a networking and communications firm aiming to add UN-style simultaneous interpreting to their telepresence facilities; and (3) the retail arm of a device manufacturer hoping to enable more effective in-store consulting for customers with imperfect command of an outlet's native language. None of these openings has yet led to substantial business, but one remains in negotiation. We describe how the business introductions came to us; the proposed use cases; demonstrations, presentations, tests, etc.; and issues/challenges. We also comment on early consumer-oriented products for spoken language translation. The aim is to provide a snapshot of one company's business possibilities and challenges at the dawn of the era of automatic interpreting.
Intellectual Property professionals frequently need to carry out patent searches for a variety of reasons. During a typical search, they will retrieve approximately 30% of their results in a foreign language. The machine translation (MT) options currently available to patent searchers for these foreign-language patents vary in their quality, consistency, and general level of service. In this article, we introduce IPTranslator; an MT web service designed to cater for the needs of patent searchers. At the core of IPTranslator is a set of MT systems developed specifically for translating patent text. We describe the challenges faced in adapting MT technology to such a complex domain, and how the systems were evaluated to ensure that the quality was fit for purpose. Finally, we present the framework through which the IPTranslator service is delivered to users, and the value-adding features which address many of the issues with existing solutions.