INTRODUCTION

to the workshop on Example-Based Machine Translation

This volume contains the papers for presentation at the Workshop on Example-Based Machine Translation, which is part of MT Summit VIII, held on September 18 in Santiago de Compostela, Spain.

In recent years, corpora of multilingual translated texts have become widely available for a number of languages. Notwithstanding the seminal paper by Nagao (1984), it is primarily since the early 1990s that such bilingual texts have been exploited in the area of Machine Translation (MT).

The two main paradigmatic approaches which have been proposed are Statistics-based Machine Translation (SBMT) and Example-Based Machine Translation (EBMT). A related variant of EBMT that we ignore here, despite its wide use in the localisation area, is that of Translation Memories (TM).

While translation memory systems are used in restricted domains, SBMT systems require training on huge, good-quality bilingual corpora. As a consequence, TMs can hardly be applied as a general-purpose solution to MT, and SBMT as yet cannot produce complex translations to the desired quality, even if such translations are given to the system in the training phase. EBMT seeks to exploit a number of knowledge resources, such as linguistics and statistics, together with symbolic and numerical techniques, and to integrate them into one framework. In this way, rule-based morphological, syntactic and/or semantic information is combined with knowledge extracted from bilingual texts, which is then re-used in the translation process.

However, it is unclear how one might combine the different knowledge resources and techniques in an optimal way. In EBMT, therefore, the question arises: what can be learned from a bilingual corpus, and what needs to be provided manually? Furthermore, we remain uncertain as to how far the EBMT methodology can be pushed with respect to translation quality and/or translation purpose. Finally, one wonders what the implications and consequences of such an approach are for the size and quality of the reference translations, and for the (computational) complexity, scalability and transportability of the system. Given this background, the contributions of the workshop seek to shed some light on these open questions, among others.

Instead of inviting a speaker, the organisers have decided to feature two outstanding papers submitted to the workshop, each of which is allocated a longer presentation and discussion time. We are glad that Harold Somers and Kevin McTait have accepted this invitation.

We are very grateful to the programme committee for the effort they put into reviewing the papers, and for their comments and suggestions for the workshop. Additionally, we would like to thank Ute Hauck for converting file formats and compiling these proceedings into printable form.

Michael Carl and Andy Way

July 2001