Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: System Descriptions
John S. White (Editor)
Most of the textual content on the Internet is in English, which is an obstacle for non-English-speaking users who want to access it. This MT-based application is designed to let Arabic-speaking users search and navigate the Internet in Arabic, with no prior knowledge of English. The infrastructure of TARJIM.COM relies on three core components: (1) the bidirectional English-Arabic machine translation engine, (2) the intelligent Web page layout-preserving component, and (3) the search engine query interceptor.
In this paper we describe the KANTOO machine translation environment, a set of software services and tools for multilingual document production. KANTOO includes modules for source language analysis, target language generation, source terminology management, target terminology management, and knowledge source development. The KANTOO system is a complete redesign and reimplementation of the KANT machine translation system.
ARL’s FALCon system, which integrates OCR and MT technology, has proven a valuable asset to soldiers in the field in both Bosnia and Haiti. It is now being extended with six more SYSTRAN language pairs, in response to the military’s need for automatic translation as it pursues US national objectives in East Asia. The Pacific Rim Portable Translator will provide robust bidirectional automatic translation between English and Chinese, Japanese, and Korean. This will support both rapid assimilation of foreign-language information and two-way communication, for the public and private sectors alike.
The LabelTool/TrTool system is designed to manage text strings that are displayed on devices with a very limited display area and translated into a very large number of languages. Its main features are automated handling of character sets, file naming, and storage, together with real-time simulation of text string input.
The LogoVista ES translation system translates English text into Spanish. It is a member of LEC’s family of translation tools and uses the same engine as LogoVista EJ. This engine, which has been under development for ten years, is heavily linguistic and rule-based. It includes a very large, richly annotated English dictionary containing detailed syntactic, semantic, and domain information; a binary parser that produces multiple parses for each sentence; a context-free English grammar of more than 12,000 rules; and a synthesis file of rules that convert each parsed English structure into a Spanish structure. Developing a new language pair mainly involves adding target-language translations to the dictionary and adding rules to the synthesis file. Because the system is modular, linguists can carry out this work independently of engineers.
One of the most important components of any machine translation system is the translation lexicon. The size and quality of the lexicon, and its coverage for a particular use, strongly influence how applicable machine translation is for a given user. The high cost of lexicon development limits how far even mature machine translation vendors can expand and specialize their lexicons, and often prevents users from building extensive lexicons at all. To address this cost, L&H is building a Lexicography Toolkit with tools that can significantly improve the process of creating custom lexicons. The toolkit is based on acquiring data automatically from text corpora to generate lexicon entries. Because lexicon entries must be accurate, the toolkit’s output must be checked by human experts at several stages. However, this checking consists mostly of removing erroneous results rather than adding data or entire entries. This article explores how the Lexicography Toolkit can be used to create a lexicon specific to the user’s domain.
This paper describes some of the features of the new 32-bit Windows version of PAHO’s English-Spanish (ENGSPAN®) and Spanish-English (SPANAM®) machine translation software. The new dictionary update interface is designed to help users add their own terminology to the lexicon and to encourage them to write context-sensitive rules that improve the quality of the output. Expanded search capabilities provide instant access to related source and target entries, expressions, and rules. A live system demonstration will accompany this presentation.