Proceedings of the International Conference on Methodology and Techniques of Machine Translation: Processing from words to language
Today's Machine Translation (MT) systems are at best unique combinations of mathematical, linguistic and algorithmic theories, and of the absence of any theory of translation. In most instances, whether or not because of the complexity of the theories and models involved, managers and translators have been kept out, or kept themselves out, of MT system design and development. Yet they are the ones who have to use and manage such systems (if they ever become operational), cope with their development and operational costs and, with the help of such strange tools, achieve objectives of better communication. Clearly, since designers and users of operational MT systems are quite separate groups, nothing less than a transfer of technology must occur for managers and translators, the MT-wise "developing" professionals, to inherit the achievements of the "developed" computational linguistics theories. Most of these technology transfer problems resemble those managers face when any new computerized information system is implemented in its operational and user environment: system and acceptance testing, possible implementation strategies, conversion from the old (manual) to the new system, training, resistance to change, operation per se (including file and data-base maintenance), and on-going evaluation and improvement of the system. The paper briefly surveys these problems as they arise in an MT environment. Problems are interesting, but solutions are even more so.
With examples drawn mainly from the North American experience, the paper discusses original strategies that ease users' access to MT technology: early involvement of users in the development process, incorporation into the existing operational environment, incorporation into a total document design and production system, total service by a translation firm that makes the system fully transparent to end-users, layered software structure, micro-computer implementation, direct connection and use through existing computer networks, and further ideas that will have emerged or been implemented by the time of the Conference.
In studying machine translation software design, computer experts and linguists have traditionally concentrated on a number of phenomena deemed to present special problems and thus to require particular attention. Among the favourites in this connection are morphological analysis, prepositional dependencies and the establishment of antecedents. These and similar subjects have been dealt with at great length in the numerous papers written over the years to demonstrate the necessity of adding one or more specific processing features to the software under design or pilot development. Experience in the practical upgrading of operational systems has, however, tended to reveal a surprising variety of quite different problems and has shown that the fears of designers and theorists are frequently unfounded. Indeed, in tailoring a system for use by translators, many quite unexpected types of error emerge which, in the absence of sufficiently comprehensive studies, have to be eliminated largely on the basis of trial and error. The paper presents several examples of translation problems of this type and explains how difficult it can be to formalize their resolution in computer programs. Special reference is made to the English-French version of Systran, under development at the European Commission in Luxembourg. Explanations are given of the identification of error types, the human effort involved in their study, and the testing procedures used to check the validity of the action taken to reduce their occurrence in routine translation work. Finally, a number of suggestions are made for those working on design aspects of new systems, in the hope that by paying less attention to problems which have already been solved, efforts can be concentrated on the specific areas which continue to cause frustration for those required to correct or use machine translations in practice.
The context of this paper is that of a translator wishing to develop dictionaries for the purposes of machine-aided translation (MAT). A description is given of the ways in which lexical items in running text are statistically "patterned", depending on whether these so-called "types" are left unaltered as they are extracted from the text or whether they are immediately mapped onto the corresponding dictionary look-up form ("lemma") for the purpose of statistical analysis. It is obvious, of course, that for translation purposes it is necessary to establish appropriate entry-points into the MAT dictionary, but this is a secondary problem. There are two dimensions which can assist the machine-assisted translator to a considerable extent. One such factor is any degree of homogeneity (the greater, the better) in the texts he wishes to process. Translators specialising in certain subject areas and types of discourse are at an advantage if they wish to use an MAT system. The second factor is that of the so-called "multi-word unit". Although all languages have multi-word units, which are semantically atomic, they are particularly important in English, and even more so in English technical terminology. Frequency studies of multi-word units, although they generate large listings of types, can be very useful for MAT. The machine-assisted translator is faced with the need to view his work as consisting of two distinct modes: dictionary elaboration and text translation. The second mode, of course, provides important feed-back to guide the first. One thing is clear: the translator must be his own lexicographer to a great extent, at least until the time when software houses realise the commercial value of such "static" data as general bi-lingual high-frequency dictionaries and the potential "constellation" of carefully designed and delineated bi-lingual glossaries of technical terminology!
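The type/lemma distinction above can be illustrated in a few lines. This is a minimal sketch, not from the paper: the lemma table is an invented toy standing in for a real morphological look-up component.

```python
from collections import Counter

# Toy lemma map standing in for a dictionary look-up component;
# a real MAT system would use a full morphological analyser.
LEMMAS = {"runs": "run", "running": "run", "ran": "run",
          "pumps": "pump", "pumped": "pump"}

def frequencies(tokens, lemmatize=False):
    """Count surface types as extracted, or lemmas if lemmatize=True."""
    if lemmatize:
        tokens = [LEMMAS.get(t, t) for t in tokens]
    return Counter(tokens)

text = "the pump runs and the pumps ran while running".split()
print(frequencies(text))                  # surface types: runs, ran, running all separate
print(frequencies(text, lemmatize=True))  # collapsed onto lemmas: run counted 3 times
```

Mapping onto lemmas concentrates the frequency mass onto fewer, more frequent entries, which is exactly what makes frequency listings useful when deciding which items deserve a dictionary entry.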
The importance and structure of MT dictionaries have been discussed extensively by many machine translation researchers in the past. These discussions mainly concerned MT dictionaries for one-way translation systems. In the present paper, a new dictionary structure for bi-directional machine translation is introduced. The new structure is being tested for Chinese-English as well as English-Chinese machine translation.
A new software system for describing a grammar of a machine translation system has been developed. This software system is called GRADE (GRAmmar DEscriber). GRADE has the following features: 1. GRADE allows a grammar writer to divide a whole grammar into several parts. Each part of the grammar is called a subgrammar. A subgrammar describes a step of the translation process. A whole grammar is then described by a network of subgrammars. This network is called a subgrammar network. A subgrammar network allows a grammar writer to control the process of the translation precisely. When a subgrammar network in the analysis phase consists of a subgrammar for a noun-phrase (SG1) and a subgrammar for a verb-phrase (SG2) in this sequence, the subgrammar network first applies SG1 to an input sentence, then applies SG2 to the result of the application of SG1, thus obtaining a syntactic structure for the input sentence. 2. A subgrammar consists of a set of rewriting rules. Rewriting rules in a subgrammar are applied to an input sentence in an appropriate order, which is specified in the description of the subgrammar. A rewriting rule transforms a tree structure into another tree structure. Rewriting rules use a powerful pattern matching algorithm to test their applicability to a tree structure. For example, a grammar writer can write a pattern that recognizes and parses an arbitrary number of sub-trees. Each node of a tree structure has a list of pairs of a property name and a property value. A node can express a category name, a semantic marker, flags to control the translation process, and various other kinds of information. This tree-to-tree transformation operation allows a grammar writer to describe all the processes of analysis, transfer and generation of a machine translation system with this uniform description capability of GRADE. 3. A subgrammar network or a subgrammar can be written in an entry of the dictionaries for a machine translation system.
A subgrammar network or a subgrammar written in a dictionary entry is called a dictionary rule, which is specific to a word. When an input sentence contains a word which has a dictionary rule, the rule is applied to the sentence at an appropriate point in the translation process. It can express more precise processing appropriate for that specific word than a general subgrammar network or subgrammar. It also allows grammar writers to adjust a machine translation system to a specific domain easily. 4. GRADE is written in LISP. GRADE is implemented on the FACOM M-382 and the Symbolics 3600. GRADE is used in the machine translation system between Japanese and English. The project was started by the Japanese government in 1982. The effectiveness of GRADE has been demonstrated in the project.
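The control scheme described above can be caricatured in a few lines. This is an illustrative sketch of the structure only, under invented assumptions: GRADE itself is a LISP system with a far richer pattern-matching language, and the NP rule here is a toy example.

```python
# Sketch of GRADE-style control: a node carries properties, a subgrammar
# is an ordered list of tree-to-tree rewriting rules, and a subgrammar
# network applies its subgrammars to the tree in sequence.

def apply_subgrammar(tree, rules):
    """Apply each rewriting rule in the specified order; each maps tree -> tree."""
    for rule in rules:
        tree = rule(tree)
    return tree

def apply_network(tree, network):
    """A network is a sequence of subgrammars, e.g. [SG1 (NPs), SG2 (VPs)]."""
    for subgrammar in network:
        tree = apply_subgrammar(tree, subgrammar)
    return tree

def np_rule(tree):
    """Toy rewriting rule: group adjacent DET + N daughters into an NP node."""
    kids, out, i = tree["children"], [], 0
    while i < len(kids):
        if (i + 1 < len(kids) and kids[i]["cat"] == "DET"
                and kids[i + 1]["cat"] == "N"):
            out.append({"cat": "NP", "children": [kids[i], kids[i + 1]]})
            i += 2
        else:
            out.append(kids[i])
            i += 1
    return {**tree, "children": out}

sent = {"cat": "S", "children": [
    {"cat": "DET", "word": "the"}, {"cat": "N", "word": "system"},
    {"cat": "V", "word": "runs"}]}
result = apply_network(sent, [[np_rule]])
print([c["cat"] for c in result["children"]])  # ['NP', 'V']
```

A dictionary rule in this picture is simply an extra rule (or sub-network) attached to one lexical entry and spliced into the sequence when that word occurs, which is what makes word-specific and domain-specific adjustment cheap.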
This paper describes a procedure for producing written sentences in a particular language from formal representations of their meaning. After a brief description of the internal representation used, the algorithm is presented, and some results and future trends are discussed.
The paper describes the CASSEX package, a parser which takes as input English sentences and produces semantic representations of them, and gives an account of the generation procedure which translates these semantic representations into Chinese sentences.
The standard design for a computer-assisted translation system consists of data entry of source text, machine translation, and post editing (i.e. revision) of raw machine translation. This paper discusses this standard design and presents an alternative three-level design consisting of word processing integrated with terminology aids, simple source text processing, and a link to an off-line machine translation system. Advantages of the new design are discussed.
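The first level of the alternative design, word processing integrated with terminology aids, can be sketched minimally as a glossary look-up run against each source sentence. This is a toy illustration under invented assumptions (the glossary entries and function are not from the paper), not the paper's actual design.

```python
# Toy terminology aid of the kind integrated at the word-processing
# level of the three-level design; glossary entries are invented examples.
GLOSSARY = {"pump": "pompe", "valve": "vanne", "reactor": "réacteur"}

def term_hits(sentence, glossary):
    """Return glossary matches for a source sentence, for display
    alongside the translator's editing window."""
    return {w: glossary[w] for w in sentence.lower().split() if w in glossary}

print(term_hits("Close the valve before the pump starts", GLOSSary := GLOSSARY))
# {'valve': 'vanne', 'pump': 'pompe'}
```

The point of the layering is that this level is useful on its own; the source-text processing level and the off-line MT link can be added without changing the translator's basic working environment.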
This paper outlines the mutually beneficial analogies between the structural dynamics of memory and machine translation, both of which depend extensively on fundamental pattern recognition problems. Basically, both processes face a similarly structured problem, namely that of condensing large quantities of data into intelligently interpretable smaller volumes (comprised of basic "information clusters"). For machine translation, the alphabets and words of a language (which make up an essay) define these data, while the multiplicities of physico-chemical objects of sensory perception constitute, amongst others, the data compression problem facing the memory functions of the brain. For the neural systems underlying the memory functions of the brain, recent advances in generalized quantum theoretical methods provide some bases. While these foundations will not be discussed here in any detail, they are used to define the components of a language compatible with memory dynamics. Essentially, these culminate in associative (quantum) logical problems with analogical counterparts in linguistics and in the use of compartmentalization cum associative logic in essay interpretation. For purposes of computational linguistics, this paper makes these analogies precise (on a quantitative analytical basis), with emphasis on discrete recursive generation of larger structures, and on equivalents of coding and decoding for the machine translation process.
Four years ago the Nuclear Center of Karlsruhe began applying the Systran MT program to the translation of nuclear technology texts from French into English. During this period the Systran program has been updated several times and about 8000 entries have been made in the stem dictionary to adapt the program to this special field. This has resulted in a substantial improvement in the quality of the translated texts. A quantitative judgement of this quality could be achieved by repeated statistical analysis of some representative sample texts. The results of these analyses are presented and commented upon.
We attempt to develop a general theory of robust processing for natural language, and especially for Machine Translation purposes: that is, a general characterization of methods by which processes can be made resistant to malfunctioning of various kinds. We distinguish three sources of malfunction: (a) deviant inputs, (b) deviant outputs, and (c) deviant pairings of input and output, and describe the assumptions that guide our discussion (sections 1 and 2). We classify existing approaches to (a)- and (b)-robustness, noting not only that such approaches fail to provide a solution to (c)-type problems, but that the natural consequence of these solutions is to make (c)-type malfunctions harder to detect (section 3). In the final section (4) we outline possible solutions to (c)-type malfunctions.
The MT system SUSY-E, which has been under development since 1972 in the Sonderforschungsbereich "Elektronische Sprachforschung" of the University of the Saar, can be divided into three major subsystems: the background, dictionary and kernel systems. The background system represents the interface to implementers, linguists and users. The dictionary system supports the construction and maintenance of the different dictionaries and provides the description of the dictionary entries. The translation processes proper are carried out by the kernel systems, which contain the linguistic knowledge in different representational schemes and allow for syntactico-semantic analysis and generation of texts. The most elaborate kernel system of SUSY-E is SUSY, which has been continuously developed and tested over the past ten years. Apart from SUSY there exist several new "prototypes" whose architectures differ considerably among themselves and especially from SUSY. These new approaches are called SUSY-II systems.