Post-editing (PE) is a necessary process in every MT deployment environment. The competences needed for PE are traditionally seen as a subset of a human translator's competence. Meanwhile, some companies accept that the PE process involves self-standing linguistic tasks that need their own training efforts and appropriate software tool support. To date, we still lack qualitative and quantitative records of PE user activity that adequately describe the tasks performed and, in particular, the human cognitive processes involved. Such data is needed to effectively model, design and implement supportive software systems which, on the one hand, efficiently guide the human post-editor and enhance her cognitive capabilities, and, on the other hand, influence the translation performance and competence of the employed MT system. In this paper we argue for a framework of practices that describes the PE process by correlating data obtained in laboratory experiments, augmented by additional data from resources such as interviews and mathematical prediction models, with the tasks fulfilled, and that models the identified process in a multi-faceted fashion as a basis for the implementation of a human PE-aware interactive software system.
Corpus-based MT systems that analyse and generalise texts beyond the surface forms of words require generation tools to turn the various internal representations back into valid target language (TL) sentences. While the generation of word forms from lemmas is probably the last step at the very end of every text generation process, token generation cannot be accomplished without structural and morpho-syntactic knowledge of the sentence to be generated. As in many other MT models, this knowledge is composed of a target language model and a bag of information transferred from the source language. In this paper we establish an abstracted, linguistically informed target language model. We use a tagger, a lemmatiser and a parser to infer a template grammar from the TL corpus. Given a linguistically informed TL model, the aim is to determine what needs to be provided by the transfer module for generation. While computing the template grammar, we simultaneously build up, for each TL sentence, the content of the bag such that the sentence can be deterministically reproduced. In this way we control the completeness of the approach and gain an idea of which pieces of information need to be coded in the TL bag.
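The template-plus-bag idea described in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical toy, not the paper's actual implementation: it abstracts open-class tokens of a tagged sentence into POS slots (the "template"), records the displaced material in a bag, and refills the slots deterministically to reproduce the original sentence. All function names and the POS inventory are illustrative assumptions.

```python
# Toy sketch of a template grammar: open-class tokens become POS slots,
# function words stay literal, and the transfer "bag" holds what is
# needed to refill each slot deterministically.

OPEN_CLASS = {"NOUN", "VERB", "ADJ", "ADV"}  # illustrative inventory

def extract(tagged_sentence):
    """Split a (token, lemma, pos) sequence into a template and a bag."""
    template, bag = [], []
    for token, lemma, pos in tagged_sentence:
        if pos in OPEN_CLASS:
            template.append(f"<{pos}>")
            bag.append((lemma, token))   # lemma plus its surface form
        else:
            template.append(token)       # function words stay literal
    return tuple(template), bag

def regenerate(template, bag):
    """Deterministically reproduce the sentence from template + bag."""
    fillers = iter(bag)
    return " ".join(next(fillers)[1] if slot.startswith("<") else slot
                    for slot in template)

sent = [("the", "the", "DET"), ("cats", "cat", "NOUN"),
        ("sleep", "sleep", "VERB")]
tpl, bag = extract(sent)
# tpl is ("the", "<NOUN>", "<VERB>"); regenerate(tpl, bag) round-trips.
```

Checking that every corpus sentence round-trips in this way is exactly the completeness control the abstract mentions: whatever the bag must contain for the round trip is what the transfer module has to deliver.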
In this paper, organized in essay style, I first assess the situation of Machine Translation, which is characterized, on the one hand, by unsatisfied user expectations and, on the other hand, by an ever increasing need for translation technology to fulfil the promises of the global knowledge society, which is promoted by almost all governments and industries worldwide. The assessment is followed by the outline of a blueprint that describes possible steps of an MT evolution over short-term, mid-term and long-term developments. Although some user communities might aim at an MT revolution, the evolutionary implementation of the different aspects of the blueprint fits seamlessly with the foundation that we are faced with in the assessment part. With the blueprint the thesis of this MT evolution essay is established, and the stage is opened for the antithesis, in which I develop the points for an MT revolution. Finally, in the synthesis part I develop a combined view which completes the discussion and the establishment of a blueprint for MT evolution.
This paper provides a nutshell description of how the recently published proposal of a translation quality metric for automotive service information is applicable in an evaluation scenario that deploys multilingual human language technology (mHLT). This proposal is the result of the J2450 task force group of the Society of Automotive Engineers (SAE). The main focus of the developed metric is on the syntactic level of a translation product. Since it is our belief that any evaluation of a translation (human and machine) should also take into account the semantic level of a human language product, we have slightly reshaped the SAE J2450 metric. In addition, we have embedded the whole evaluation process into an object-oriented quality model approach to account for the established business processes in the acquisition, production, translation and dissemination of automotive service information in SGML/XML environments. This scenario then provides the solid grounding for the setup of a quality assurance process for all dimensions related to the processing (human and machine) of automotive service information. The work reported here is one part of the ongoing European Multidoc project that has brought together several European automotive companies to tame the complexity of service information products in an integrated way. Within Multidoc, integration means first and foremost the coupling of advanced information technology and mHLT. These aspects are further motivated and detailed in the context of the specification of an evaluation scenario.
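A J2450-style evaluation assigns weighted penalty points to categorised, severity-graded errors and normalises by text length. The sketch below illustrates only the general scheme; the category names, the weights, and the added "semantic" category are placeholder assumptions, not the official SAE J2450 values or the metric reshaped in the paper.

```python
# Illustrative J2450-style scoring scheme: each error is a
# (category, severity) pair; weights and categories are placeholders,
# including the "semantic" category added to reflect the reshaped metric.

WEIGHTS = {
    ("wrong_term", "serious"): 5, ("wrong_term", "minor"): 2,
    ("syntactic",  "serious"): 4, ("syntactic",  "minor"): 2,
    ("omission",   "serious"): 4, ("omission",   "minor"): 2,
    ("semantic",   "serious"): 5, ("semantic",   "minor"): 2,
}

def quality_score(errors, n_words):
    """Weighted error points per word of the evaluated text (lower is better)."""
    points = sum(WEIGHTS[(cat, sev)] for cat, sev in errors)
    return points / n_words

errors = [("wrong_term", "serious"), ("semantic", "minor")]
score = quality_score(errors, 100)  # 5 + 2 = 7 points over 100 words
```

Such a per-word score makes translations of different lengths comparable, which is what lets the metric slot into a larger quality assurance process.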
Machine Translation Supported by Terminological Information
Jörg Schütz | Bärbel Ripplinger
Proceedings of the Fifth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages
This paper outlines a new architecture for an NLP/MT development environment for the EUROTRA project, which will be fully operational in the 1993-94 time frame. The proposed architecture provides a powerful and flexible platform for extensions and enhancements to the existing EUROTRA translation philosophy and the linguistic work done so far, thus allowing the reuse of existing grammatical and lexical resources, while ensuring the suitability of EUROTRA methods and tools for other NLP/MT system developers and researchers.