Verbmobil: foundations of speech-to-speech translation
Ed. Wolfgang Wahlster (
abstracts
Mobile Speech-To-Speech Translation of Spontaneous Dialogs: An Overview of The Final Verbmobil System
Wolfgang Wahlster
DFKI GmbH,
Abstract. Verbmobil is a
speaker-independent and bidirectional speech-to-speech translation system for
spontaneous dialogs in mobile situations. It recognizes spoken input, analyses
and translates it, and finally utters the translation. The multilingual system
handles dialogs in three business-oriented domains, with context-sensitive
translation between three languages (German, English, and Japanese). Since Verbmobil emphasizes the robust processing of spontaneous
dialogs, it poses difficult challenges to human language technology, which we
discuss in this paper. We present Verbmobil as a
hybrid system incorporating both deep and shallow processing schemes. We
describe the anatomy of Verbmobil and the
functionality of its main components. We discuss Verbmobil's
multi-blackboard architecture that is based on packed representations at all
processing stages. These packed representations together with formalisms for underspecification capture the non-determinism in each
processing phase, so that the remaining uncertainties can be reduced by
linguistic, discourse and domain constraints as soon as they become applicable.
We present Verbmobil's multi-engine approach, e.g., its use of five concurrent translation engines:
statistical translation, case-based translation, substring-based translation,
dialog-act based translation, and semantic transfer. Distinguishing features
like the multilingual prosody module and the generation of dialog summaries are
highlighted. We conclude that Verbmobil has
successfully met the project goals with more than 80% of approximately correct
translations and a 90% success rate for dialog tasks.
Facts and Figures about the Verbmobil Project
Reinhard Karger and Wolfgang Wahlster
DFKI GmbH,
Abstract. In this chapter the organizational and funding structure
of the Verbmobil project
is summarized and the major technical data about the final Verbmobil
system and the Verbmobil
archives are compiled.
Multilingual Speech Recognition
Alex Waibel,
Institut für Logik, Komplexität und Deduktionssysteme,
Abstract. The speech-to-speech translation system Verbmobil requires a multilingual setting. This consists of recognition engines in the three languages German, English and Japanese that run in one common framework together with a language identification component which is able to switch between these recognizers. This article describes the challenges of multilingual speech recognition and presents different solutions to the problem of the automatic language identification task. The combination of the described components results in a flexible and user-friendly multilingual spoken dialog system.
Robust Recognition of Spontaneous Speech
Udo Haiber1, Helmut Mangold1, Thilo Pfau2, Peter Regel-Brietzmann1, Günther Ruske2, and Volker Schleß1
1 DaimlerChrysler AG, Research and Technology,
2 Institute for Human-Machine-Communication, Technische Universität München,
Abstract. This contribution describes the challenges and the progress made in Verbmobil concerning the robustness of speech recognition under various types of adverse conditions, such as channel distortion, environmental noise, and varying speaker and speaking conditions. For the channel and noise problem, classical approaches like cepstral bias normalization and spectral subtraction have been improved, as have newer methods like parallel model combination. One major result is that an intelligent combination of various methods achieves the best results. Considerable progress has also been made in research on unsupervised speaker adaptation. Several approaches are presented to improve robustness against variations of speaking rate, speaking style and speaker characteristics. The methods described include a new estimation of the parameters for vocal tract length normalization, feature and codebook transformation methods using ML algorithms, and pronunciation adaptation of the words in the lexicon.
Fast Search for Large Vocabulary Speech Recognition
Stephan Kanthak, Achim Sixtus, Sirko Molau, Ralf Schlüter, and Hermann Ney
Lehrstuhl für Informatik VI, Computer Science Department, RWTH
Abstract. In this
article we describe methods for improving the RWTH German speech recognizer used within the Verbmobil project. In particular, we present acceleration
methods for the search
based on both within-word and across-word phoneme models. We also study
incremental methods to reduce the response time of the online speech
recognizer. Finally, we present experimental
off-line results for the three Verbmobil scenarios.
We report on word error rates
and real-time factors for both speaker independent and speaker
dependent recognition.
Capturing
Jochen Peters and Dietrich Klakow
Philips GmbH Forschungslaboratorien,
Abstract. Written and spoken texts show long-range correlations which are valuable for speech recognition systems. Unfortunately, these dependencies cannot be properly described by the widespread backing-off language models (LMs). This paper introduces basic concepts to exploit long-range correlations for the task of language modeling. Several approaches to obtain suitable LM structures are discussed and compared. The theoretical findings are fully confirmed by experiments performed on the spontaneous speech from the Verbmobil II domain and on written text from the Wall Street Journal corpus.
Among the tested techniques to integrate the different sources of information, log-linear interpolation and the maximum entropy approach proved very effective. Perplexity reductions of 8%, as compared to optimal backing-off LMs, are observed, and very compact models have been trained which still outperform the full, unpruned backing-off N-grams by 4%, while reducing the LM size by 50% for trigrams and by 70% for fourgrams.
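Log-linear interpolation can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the two component distributions, and the weights are invented for illustration. The combined probability is taken to be proportional to the product of the component probabilities, each raised to its interpolation weight, and is renormalized over the vocabulary for a fixed history.

```python
import math

def log_linear_interpolate(models, weights, vocab):
    """Combine component distributions P_i(w | h) for one fixed history h.

    P(w | h) = (1 / Z(h)) * prod_i P_i(w | h) ** lambda_i,
    where Z(h) renormalizes over the vocabulary.
    """
    scores = {
        w: math.exp(sum(lam * math.log(m[w]) for m, lam in zip(models, weights)))
        for w in vocab
    }
    z = sum(scores.values())  # normalization constant Z(h)
    return {w: s / z for w, s in scores.items()}

# Hypothetical component models for the same history (illustrative numbers only).
vocab = ["monday", "tuesday", "hotel"]
trigram = {"monday": 0.5, "tuesday": 0.3, "hotel": 0.2}
distance_bigram = {"monday": 0.2, "tuesday": 0.2, "hotel": 0.6}

combined = log_linear_interpolate([trigram, distance_bigram], [0.7, 0.3], vocab)
```

Unlike linear interpolation, the weighted product is not automatically normalized, which is why the explicit sum over the vocabulary is needed for every history.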
Data Driven Generation of Pronunciation Dictionaries
Matthias Eichner, Matthias Wolff, and Rüdiger Hoffmann
Laboratory of Acoustics and Speech Communication, Technische
Abstract. In the framework of the German Verbmobil project we developed a procedure for the automatic, data-driven generation of pronunciation dictionaries for speech recognition systems. In most recognizers only simple dictionaries containing the canonical pronunciation form are used. They represent the correct pronunciation, but in most cases the canonical pronunciation does not match the actual realization of the word. To solve this problem we chose an approach that derives pronunciation variants automatically from a speech database. The training algorithm is based on a canonical dictionary which is compiled into a graph representation in a first stage. Pronunciation variants are then learned from a training sample consisting of the speech signal and its orthographic transcription. In this paper we focus on the experimental results obtained in the Verbmobil framework and introduce methods to evaluate pronunciation dictionaries generated by the training procedure.
The Prosody Module
Anton Batliner, Jan Buckow, Heinrich Niemann, Elmar Nöth, and Volker Warnke
Lehrstuhl für Mustererkennung,
Abstract. We describe the acoustic-prosodic and syntactic-prosodic annotation and classification of boundaries, accents and sentence mood integrated in the Verbmobil system for the three languages German, English, and Japanese. For the acoustic-prosodic classification, a large feature vector with normalized prosodic features is used. For the three languages, a multilingual prosody module was developed that reduces memory requirements considerably compared to three monolingual modules. For classification, neural networks and statistical language models are used.
The Recognition of Emotion
Anton Batliner1, Richard Huber1, Heinrich Niemann1, Elmar Nöth1, Jörg Spilker2, and Kerstin Fischer3
1 Lehrstuhl für Mustererkennung,
2 Lehrstuhl für Künstliche Intelligenz,
3 Institut für Informatik, AB NatS,
Abstract. Detecting emotional user behavior, particularly anger, can be very useful for successful automatic dialog processing. We present databases and prosodic classifiers implemented for the recognition of emotion in Verbmobil. Using a prosodic feature vector alone is, however, not sufficient for the modelling of emotional user behavior. Therefore, a module is described that combines several knowledge sources within an integrated classification of trouble in communication.
Processing Self-Corrections in a Speech-to-Speech System
Jörg Spilker, Martin Klarner, and Günther Görz
Chair for Artificial Intelligence, Department of Computer Science,
Abstract. Self-repairs are a frequent phenomenon in spontaneous speech. The ability to detect and correct those repairs is therefore indispensable for any spoken language system. We present a framework for detection and correction of speech repairs where all relevant levels of information, i.e., acoustics, lexis, syntax and semantics can be integrated. The basic idea is to reduce the search space for repairs as soon as possible by cascading filters that involve more and more features. At first an acoustic module generates hypotheses about the existence of a repair. Afterwards a stochastic model suggests a correction for every hypothesis. Highly scored corrections are inserted as new paths in the word lattice. Finally, a lattice parser decides whether the repair should be accepted or not.
Integrated Shallow Linguistic Processing
Ulrich Block and Tobias Ruland
Siemens AG, Corporate Technology,
Abstract. This article gives an overview of the Integrated Processing module, which realises the multi-parsing-engine for the Verbmobil parsing modules: HPSG parsing, statistical parsing, and chunk parsing. The Integrated Processing module implements an A* search on a word hypotheses graph and provides interface functions to the different parsing approaches.
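To make the idea of an A* search over a word hypotheses graph concrete, here is a small sketch under stated assumptions. It is not the Integrated Processing module itself: the graph encoding (nodes as frame indices, edges carrying a word and a non-negative cost such as a negative log score), the function name, and the toy data are all hypothetical.

```python
import heapq

def a_star_best_path(edges, start, goal, heuristic):
    """Return (cost, words) of the cheapest start-to-goal path in a word graph.

    edges: dict mapping node -> list of (next_node, word, cost), cost >= 0.
    heuristic: function node -> lower bound on the remaining cost to goal.
    """
    # Queue entries: (cost so far + heuristic, cost so far, node, words so far)
    queue = [(heuristic(start), 0.0, start, [])]
    best_cost = {start: 0.0}
    while queue:
        _, cost, node, words = heapq.heappop(queue)
        if node == goal:
            return cost, words
        if cost > best_cost.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path to this node exists
        for nxt, word, edge_cost in edges.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    queue, (new_cost + heuristic(nxt), new_cost, nxt, words + [word])
                )
    return None  # goal not reachable

# Toy word graph: nodes are frame indices, costs are illustrative only.
edges = {
    0: [(2, "we", 1.0), (2, "why", 1.5)],
    2: [(4, "meet", 1.2), (6, "meeting", 3.5)],
    4: [(6, "monday", 1.1), (6, "sunday", 2.0)],
}
GOAL = 6
MIN_COST_PER_FRAME = 0.5  # no edge above costs less than this per frame

def remaining_frames_bound(node):
    # Admissible: the true remaining cost is never below this bound.
    return MIN_COST_PER_FRAME * (GOAL - node)

cost, words = a_star_best_path(edges, 0, GOAL, remaining_frames_bound)
```

The heuristic only has to be a lower bound on the remaining cost; with a bound of zero the same code degenerates to a uniform-cost search.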
Probabilistic LR-Parsing with Symbolic Postprocessing
Tobias Ruland
Siemens AG, Corporate Technology,
Abstract. This article describes a novel approach to probabilistic
LR-parsing of spontaneously spoken utterances developed in Verbmobil.
It extends the use of context knowledge within the probabilistic model of the
parser and improves its output by applying tree transformation rules learned
from corpora. The parser was developed for German, English and Japanese and
achieves more than 90% Labeled Recall/Precision on
parsed Verbmobil utterances.
Robust Chunk Parsing for Spontaneous Speech
Erhard W. Hinrichs, Sandra Kübler, Valia Kordoni, and Frank H. Müller
Seminar für Sprachwissenschaft, Abteilung Computerlinguistik, Eberhard-Karls-Universität Tübingen,
Abstract. Chunk parsing (see Abney, 1991, and Abney, 1996) offers a particularly promising approach for robust, partial parsing with the goal of broad data coverage. A chunk parser is particularly well suited for an application for spontaneous speech since it can deal robustly with fragmentary or ill-formed input.
In order to guarantee the functionality that the Verbmobil system requires, wide-coverage finite-state grammars for the Verbmobil scenarios had to be constructed. In addition, several extensions to the basic chunk parsing technology had to be implemented in the TüSBL (Tübingen Similarity Based Learning) system: the adaptation to input from the speech recognizers and to word incremental processing, and the construction of complete trees out of the chunk analysis.
TüSBL's tree
construction algorithm relies on techniques from memory-based learning that allow similarity-based classification
of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank.
Verbmobil Interface Terms (VITs)
Michael Schiehlen1, Johan Bos2, and Michael Dorna1
Institute for Natural
Language Processing (IMS),
Abstract. This article describes the concepts
and the contents of Verbmobil Interface Terms (VITs). In VITs all linguistic
information of an utterance relevant for translation is represented. They are
used to provide an interface representation between several linguistic and
dialog components of the
Verbmobil system. Information in VITs
is encoded in a record-like data structure.
The fields are variable-free lists of non-recursive terms, so-called
"flat" representations. They are filled with semantic, scopal, sortal, morpho-syntactic, prosodic, and discourse information. A labelling system is used to relate different
kinds of information to each other. A
library package realizing an abstract data type implements construction,
access, update, check, print, etc.
facilities for VITs.
Semantic Construction
Michael Schiehlen
IMS,
Abstract. This article describes the concepts and the implementation of the semantic construction module (SemCon) used in the Verbmobil system. SemCon maps trees to Verbmobil Interface Terms (VITs). A main focus lies on robustness and underspecification. A minimalistic syntax-semantics interface is defined to support modularity. Diverse repair strategies are discussed to enhance robustness. With SemCon, it is possible to process large amounts of data and build semantic representations for a good part of the Verbmobil corpus.
Deep Linguistic Analysis with HPSG
Hans Uszkoreit1, Dan Flickinger2, Walter Kasper1, and Ivan A. Sag2
1 DFKI GmbH,
2 Center for the Study of Language and Information (CSLI),
Abstract. Deep linguistic analysis is based on Head-Driven Phrase Structure Grammar (HPSG) which provides an integrated approach to syntactic and semantic analysis. We present the basic concepts and ideas of HPSG, as well as of the underlying semantic representation formalism and its interface to the Verbmobil system.
HPSG Analysis of German
Stefan Müller and Walter Kasper
DFKI GmbH,
Abstract. We present an overview of the HPSG grammar for the German deep analysis in Verbmobil. In particular, we discuss issues of using it for spontaneous speech processing in specific application domains. Extra-linguistic information such as prosody, which is absent in written language, also has to be taken into account. Finally, we present an empirical evaluation of the grammar with respect to the Verbmobil corpora.
HPSG Analysis of English
Dan Flickinger, Ann Copestake, and Ivan A. Sag
Center for the Study of Language and Information (CSLI),
Abstract. In this chapter we summarize the
results of the HPSG English grammar project for analysis and generation in Verbmobil,
housed at CSLI,
HPSG Analysis of Japanese
Melanie Siegel
Universität des Saarlandes,
Abstract. A Japanese HPSG for deep analysis and generation in the Verbmobil system was developed. The focus of the grammar is the processing of spontaneous Japanese dialogs. Therefore, the description of phenomena of spoken Japanese is central. We present an empirical evaluation of the grammar with Verbmobil corpora.
Efficient and Robust Parsing of Word Hypotheses Graphs
Bernd Kiefer, Hans-Ulrich Krieger, and Mark-Jan Nederhof
DFKI GmbH,
Abstract. This paper describes the successful metamorphosis of Page from a string-based grammar development system to an efficient run time system, operating on word hypotheses graphs (WHGs). In particular, we report on the techniques we have applied to Page, which have resulted in a speed-up in parsing time of more than an order of magnitude. We elaborate how the system is interfaced to other components: WHG search, prosody detector, and robust semantic processing. We also present measurements for string and WHG parsing. The system as described in the paper has been applied in the speech translation project Verbmobil with large HPSG grammars for English, German, and Japanese.
Speech Lexica and Consistent Multilingual Vocabularies
Dafydd Gibbon and Harald Lüngen
Abstract. This contribution describes the theoretical foundations and lexical engineering procedures used in developing a common, consistent, linguistically and formally well-defined lexical database for all components of the Verbmobil speech-to-speech translation system.
Combining Analyses from Various Parsers
C.J. Rupp1, Jörg Spilker2, Martin Klarner2, and Karsten L. Worm1
1 Department of Computational Linguistics, Universität des Saarlandes, Germany
2 Computer Science Institute,
Abstract. This chapter describes measures implemented in the semantics module to ensure that best use is made of the available linguistic analyses.
Robust Semantic Processing of Spoken Language
Manfred Pinkal, C.J. Rupp, and Karsten Worm
Department of Computational Linguistics, Universität des
Abstract. This chapter describes a novel strategy for the robust processing of spoken inputs in dialog translation systems. The implemented processor forms a major subcomponent of the semantics module in the Verbmobil system.
Discourse and Dialog Semantics for Translation
Johan Bos and Julia Heine
Department of Computational Linguistics, Universität des
Abstract. The Discourse and Dialog component in Verbmobil resolves non-local ambiguities, using knowledge provided by prosody and dialog acts, and the history of the ongoing dialog. It is a rule-based system, working on an ordered set of about 600 rules, dealing with phenomena found in English, German, and Japanese. The phenomena covered for English and German are lexical ambiguities, sentence mood determination, focus projection, and anaphora and ellipsis resolution. The disambiguation rules for Japanese include definiteness resolution, topic instantiation, and zero-anaphora resolution.
Multilingual Semantic Databases
Walter Kasper
DFKI GmbH,
Abstract. To define the possible content of semantic representations for each language, semantic databases were defined which provide the same types of information in a uniform way. A prerequisite is that the information types are meaningful across the different languages involved, which calls for a multilingual description system. We describe the use and structure of the databases. They not only provide interface specifications among the deep processing components in the system, but also offer a rich resource for describing semantic properties of lexical items in a theory- and implementation-independent way.
Semantic-Based Transfer
Martin C. Emele, Michael Dorna, Anke Lüdeling, Heike Zinsmeister, and Christian Rohrer
Abstract. This article presents the concepts and
the implementation of the semantic-based transfer approach used in the transfer component of the machine
translation system Verbmobil. The transfer component
acts as a rewriting system on enriched semantic representations. We show how
the transfer formalism handles translation mismatches, structural divergences
and other translation problems.
If necessary, ambiguities are resolved by using the local input information or
by inference results provided by other Verbmobil
components. A system of macros and
templates facilitates rule development. The transfer component consists of
different cascaded sub-modules.
The application of rules within the sub-modules is ordered automatically by
specificity. The efficiency of the transfer component is illustrated by
performance data.
Statistical Methods for Machine Translation
Stephan Vogel, Franz Josef Och, Christof Tillmann, Sonja Nießen, Hassan Sawaf, and Hermann Ney
Lehrstuhl für Informatik VI, Computer
Science Department, RWTH
Abstract. In this article we describe the statistical approach to machine translation as implemented in the stattrans module of the Verbmobil system. The statistical translation approach uses two types of information: a translation model and a language model. The language model used is an m-gram model. The translation model comprises a stochastic lexicon and word position parameters. To capture dependencies between word groups in each of the two languages, alignment templates are used. We describe the components of the system and report results on the Verbmobil task. The experience obtained in the Verbmobil project shows that the statistical approach is very competitive with other translation approaches.
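The interplay of a translation model and a language model can be sketched as follows. This is a deliberately simplified illustration, not the stattrans module: it ranks a fixed list of candidate translations instead of searching, uses a bigram as the m-gram language model, and reduces the translation model to a stochastic lexicon with uniform word positions (no alignment templates). All names and probabilities are invented.

```python
import math

FLOOR = 1e-6  # probability assigned to unseen events (an assumption of this sketch)

def lm_logprob(sentence, bigram):
    """Log probability of a target sentence under a bigram language model."""
    words = ["<s>"] + sentence + ["</s>"]
    return sum(math.log(bigram.get((a, b), FLOOR)) for a, b in zip(words, words[1:]))

def tm_logprob(source, target, lexicon):
    """Lexicon-only translation model: every source word is explained by
    an average over all target words (uniform position probabilities)."""
    total = 0.0
    for f in source:
        total += math.log(sum(lexicon.get((f, e), FLOOR) for e in target) / len(target))
    return total

def best_translation(source, candidates, bigram, lexicon):
    """Pick the candidate e that maximizes log P(e) + log P(f | e)."""
    return max(
        candidates,
        key=lambda e: lm_logprob(e, bigram) + tm_logprob(source, e, lexicon),
    )

# Toy parameters, illustrative only.
lexicon = {("guten", "good"): 0.9, ("tag", "day"): 0.8}
bigram = {("<s>", "good"): 0.5, ("good", "day"): 0.6, ("day", "</s>"): 0.7}

choice = best_translation(
    ["guten", "tag"], [["day", "good"], ["good", "day"]], bigram, lexicon
)
```

In this toy setting the lexicon scores both word orders identically, so the language model alone decides in favour of the fluent order.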
Adapting a Large
Hans Ulrich Block, Stefanie Schachtl, and Manfred Gehrke
Siemens AG, Corporate Technology,
Abstract. This paper describes an attempt to transform a general purpose machine translation system that had originally been designed for human aided computer translation of technical documentation into the linguistic component of a domain dependent spoken language translation system for remote PC maintenance. In the first part, the translation system is described. The second part describes the measures taken to adapt it to the spoken language task.
Example-Based Incremental Synchronous Interpretation
Hans Ulrich Block
Siemens AG, Corporate Technology,
Abstract. This article describes a new approach to example based incremental translation for automatic interpretation systems developed in Verbmobil. The translation module is completely learned from a bilingual corpus. The training phase combines statistical word alignment with precomputation of translation "chunks" and contextual clustering of syntactic equivalence classes (word classes). The system gives incremental output for every piece of input, be it words or sequences of words. It thus tries to mimic the behaviour of a human synchronous interpreter. If a larger context leads to the need for reformulation, the system utters a correction marker like "I mean" and restarts the output from the starting position of the reformulation. The system is currently effective for German ↔ English. German ↔ Chinese and German ↔ Japanese are under construction. In the Verbmobil evaluation, this approach reached 79% of approximately correct translations on speech recognition output.
Example-Based Machine Translation with Templates
Marko Auerswald
DFKI GmbH,
Abstract. This paper presents an approach for
template based machine translation, in which the templates are generated in a highly
automated way from large corpora of translation examples. The techniques described have
been successfully used in one of the alternative translation modules within the Verbmobil speech-to-speech translation system. A crucial
feature of this approach is the capability of processing word lattice input in
an efficient way.
Robust Content Extraction for Translation and Dialog Processing
Norbert Reithinger and Ralf Engel
DFKI GmbH,
Abstract. The design rationale guiding the development
of the reductionist dialog act based translation module in Verbmobil
was robustness. Even in case the speech recognition or the prosodic processing does not perform
perfectly, this module extracts and translates the main intentions and facts related to the
domain. In a three step approach, first the dialog act describing the intention is computed using a
statistical approach. The second step is the construction of the propositional
content with robust hierarchical finite state transducers. For the definition of the transducers, knowledge sources
available in Verbmobil are exploited. The resulting
representation of these two steps is used in a
template based finite state generator to realize the target language
expressions. The internal representation is also communicated to the dialog module where it plays an important part in maintaining the
dialog state.
Modeling Negotiation Dialogs
Jan Alexandersson1, Ralf Engel1, Michael Kipp1, Stephan Koch2, Uwe Küssner2, Norbert Reithinger1, and Manfred Stede2
1 DFKI GmbH,
2 Technische Universität
Abstract. For various purposes in the Verbmobil system it is necessary to build a full model of an unfolding dialog, on a suitably abstract level of representation. This model is based on representations of the individual utterances, and we capture their content by a combination of dialog act and propositional content. Our hierarchy of dialog acts was used to annotate 21 CD-ROMs from the Verbmobil corpus, and the experience gained with the framework influenced standardization efforts in the international scientific community. On the side of propositional content, particular attention was given to the representation of temporal expressions, due to the application domains of Verbmobil.
Dialog Processing
Michael Kipp, Jan Alexandersson, Ralf Engel, and Norbert Reithinger
DFKI GmbH,
Abstract. This chapter explains the major functionality of the dialog module in Verbmobil. Dialog knowledge is needed for context sensitive speech translation as well as for the automatic generation of dialog result summaries. Our component produces necessary structures for both purposes and stores them in a centrally accessible data repository—the dialog memory. The structures are based on robustly extracted shallow data which are corrected, extended and structured by our dialog processor. We use time and object completion algorithms to collect context data and compute inter-object relations to infer relevance for summarization. The resulting structures are used by the document generator for dialog minutes and summaries, and by the context evaluation module for translation disambiguation.
Contextual Disambiguation
Stephan Koch, Uwe Küssner, and Manfred Stede
Technische
Abstract. Resolving ambiguities is a necessary step for machine translation aiming at high quality. In Verbmobil, with its specific conditions of speech-to-speech translation, contextual reasoning for purposes of disambiguation has to respect the particular conditions of being situated in a near-realtime system, and has to take errors in the speech recognition phase into account. This chapter describes the context evaluation module of Verbmobil's "deep processing" translation path. We characterize the linguistic phenomena that require contextual reasoning, describe the shape of our context representation, and explain how this representation is constructed during utterance interpretation, which involves performing the required disambiguations.
The Verbmobil Generation Component VM-GECO
Tilman Becker, Anne Kilger, Patrice Lopez, and Peter Poller
DFKI GmbH,
Abstract. This chapter presents the Verbmobil generation component VM-GECO. The main modules of our component—microplanner and syntactic generator—are illuminated in detail focusing on the problems of real-time computation, multilinguality, dependencies among choices and the use of different representation formalisms. We discuss robustness as an important feature of large-scale systems with spontaneous and erroneous input.
The Application of HPSG-to-TAG Compilation Techniques
Tilman Becker and Patrice Lopez
DFKI GmbH,
Abstract. The HPSG-to-TAG compilation algorithm proposed in Kasper et al. (1995) has been the basis of large scale experiments in Verbmobil. The results presented here concentrate on the English HPSG grammar developed at CSLI. Several non-trivial theoretical problems have been discovered by the practical application of this algorithm. This paper presents these experiments, the main shortcomings of the initial algorithm and some of the solutions we have developed in order to use the resulting compiled LTAG (Lexicalized TAG) grammar in a real world system.
Generating Multilingual Dialog Summaries and Minutes
Jan Alexandersson, Peter Poller, and Michael Kipp
DFKI GmbH,
Abstract. This chapter describes the on-demand generation of dialog minutes and result summaries of dialogs. We focus on summary generation since the generation of minutes is performed using almost the same techniques. We describe how the relevant data are selected from the dialog memory, how the data are converted into sequences of VITs and, finally, we demonstrate how the existing generation module of Verbmobil was extended to generate textual documents. Multilinguality is achieved by utilizing the transfer module.
Speech Synthesis Using Multilevel Selection and Concatenation of Units from Large Speech Corpora
Karlheinz Stober1,
1 Institut für Kommunikationsforschung und Phonetik,
2 Institut für Akustik und Sprachkommunikation, Technische
3 Institut für Kommunikationsakustik, Ruhr-Universität Bochum, Germany
4 DaimlerChrysler AG, Research and Technology,
Abstract. This paper describes the Verbmobil speech synthesis: the segmental and prosodic transcription on the symbolic level, the construction of the synthesis corpus, the algorithm for selecting synthesis units out of this corpus, and the adaptation of the resulting synthetic speech to the relevant dialog situation and individual speaker.
Verbmobil Data Collection and Annotation
Susanne Burger1, Karl Weilhammer2, Florian Schiel3, and Hans G. Tillmann2
1 Interactive Systems Laboratories,
2 Department of Phonetics, LMU
3 Bavarian Archive for Speech Signals, LMU
Abstract. Verbmobil data collection had to satisfy the different requirements for data quality and annotation level for each project partner. This chapter describes the different user groups, their data demands and how the data collection group solved these issues.
The Tübingen Treebanks for Spoken German, English, and Japanese
Erhard W. Hinrichs, Julia Bartels, Yasuhiro Kawata, Valia Kordoni, and Heike Telljohann
Seminar für Sprachwissenschaft, Abt. Computerlinguistik, Eberhard-Karls-Universität Tübingen,
Abstract. The Tübingen treebanks for spoken German, English and Japanese provide linguistic annotations for the Verbmobil dialog corpus of spontaneous speech in the scenarios of appointment negotiations, travel arrangements and personal computer maintenance. The annotation schemes of the Tübingen treebanks have been developed taking into account the specific characteristics of spoken language dialogs: repetitions, hesitations, "false starts", etc.
Multilingual Verbmobil-Dialogs: Experiments, Data Collection and Data Analysis
Susanne J. Jekat and Walther v. Hahn
Computer Science Department, Natural Language Systems Division and SFB 538 Multilingualism,
Abstract. In this article we describe the collection and analysis of multilingual dialogs with a human or machine interpreter within the Verbmobil framework. As the dialogs represent very rare speech data with high acoustic quality, analysis is still in progress and further research is ongoing.
Speech Recognition Performance Assessment
Michael Malenke, Marcus Bäumler, and Erwin Paulus
Institute for Communications Technology, Technische Universität Braunschweig,
Abstract. From 1998 to 2000 the performance of the three speech recognition modules of the Verbmobil system was evaluated at regular intervals. The principal concepts and main results of the evaluations are presented, with some emphasis on the final evaluation in 2000.
Speech Synthesis Quality Assessment
Jochen Steffens and Erwin Paulus
Institute for Communications Technology,
Abstract. Category rating tests have been performed in order to compare the Verbmobil speech synthesis module to several commonly available speech synthesis techniques as well as to natural speech. The Verbmobil speech synthesis module applies a corpus-based selection and concatenation technique and, as regards the quality of synthesized utterances in German, appears to be superior to other synthesis techniques. For American English it is also among the best, but is not yet as dominant as it is for German. This seems to be due to the fact that considerable effort has gone into tuning the German part of the corpus to the Verbmobil domain, while the American English part of the corpus at the time of the evaluation had not yet reached a comparably mature state.
From Off-line Evaluation to On-line Selection
Damir Ćavar, Uwe Küssner, and Dan Tidhar
Technische
Abstract. In order to meet the challenges set by the innovative multi-engine translation architecture, an additional selection component is necessary. The selection component fulfills the task of integrating the various alternative translations that are produced for each input utterance, and comes up with exactly one optimal translation. At the center of this chapter is a learning method that was tailored to overcome the problem of incomparable confidence values delivered by the competing translation paths, thus enabling the selection component to rely on confidence values as the main selection criterion. By using off-line human feedback and applying a linear optimization heuristic, we determine a rescaling scheme that enables us to compare confidence values across modules. We also describe some additional information sources that further elaborate the selection procedure, and finally, outline some Quality of Service parameters that are supported by the selection module.
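The rescaling idea can be illustrated with a short sketch. It does not reproduce the chapter's linear optimization heuristic: as a stand-in, each module's raw confidence is mapped onto a common scale by an ordinary least-squares line fitted to off-line human ratings, and the candidate with the highest rescaled value is selected. The module name semantic_transfer, the rating scale, and all numbers are hypothetical.

```python
def fit_linear_rescaling(samples):
    """Least-squares fit rating ~ a * confidence + b for one module.

    samples: list of (raw_confidence, human_rating) pairs.
    """
    n = len(samples)
    mean_c = sum(c for c, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_c = sum((c - mean_c) ** 2 for c, _ in samples)
    if var_c == 0:
        return 0.0, mean_r  # confidence carries no information for this module
    a = sum((c - mean_c) * (r - mean_r) for c, r in samples) / var_c
    return a, mean_r - a * mean_c

def select_translation(candidates, rescaling):
    """candidates: list of (module, raw_confidence, translation).
    Returns the candidate whose rescaled confidence is highest."""
    def rescaled(item):
        module, confidence, _ = item
        a, b = rescaling[module]
        return a * confidence + b
    return max(candidates, key=rescaled)

# Hypothetical off-line feedback: the two modules report confidence on
# incomparable scales (0..1 versus 0..100).
feedback = {
    "stattrans": [(0.2, 0.1), (0.8, 0.7)],
    "semantic_transfer": [(50, 0.4), (90, 0.9)],
}
rescaling = {m: fit_linear_rescaling(s) for m, s in feedback.items()}

candidates = [
    ("stattrans", 0.9, "let us meet on Monday"),
    ("semantic_transfer", 60, "we meet Monday"),
]
winner = select_translation(candidates, rescaling)
```

Comparing the raw values directly would always favour the module with the larger numeric range; after rescaling, the comparison reflects how well each module's confidence tracked human judgments.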
Functional Validation of a Machine Interpretation System: Verbmobil
Lorenzo Tessiore and Walther v. Hahn
Department of Computer Science,
Abstract. Evaluation of NLP systems is on its way to a deep and detailed standardization. Methods and techniques are developed, but only for the evaluation of ready-to-sell products; the evaluation of a system that is still under development is not standardized and not even ad hoc tools are available for this purpose. The evaluation of Verbmobil required the development of an adequate evaluation technique and a tool that could deal both with the need to validate the system as a quasi-product and to produce useful feedback to the developers for further improvement of the system. This paper explains the methodological and technical choices that led to the implementation of a graphic evaluation tool (GET), discusses the GET and shows the results that have been gathered by its use. The paper includes a discussion of the complex problem of evaluating translations.
Verbmobil From a Software Engineering Point of View: System Design and Software Integration
Andreas Klüter, Alassane Ndiaye, and Heinz Kirchmann
DFKI GmbH,
Abstract. The distributed research and software development in Verbmobil resulted in an integrated speech-to-speech translation system. The size of the project, the heterogeneous environment at the various development sites and the constraint of software reuse required professional software engineering for successful integration. For this purpose, a software design and integration group was established. This article describes the software engineering strategies applied within Verbmobil. We discuss the prerequisites necessary for successful integration, describe the software framework provided by the system group, show how modules communicate and how integrations were performed. We also discuss design decisions and show that the concepts and the integration framework are not limited to speech-to-speech translation systems, but are also applicable to any large scale distributed software development project.
From a Stationary Prototype to Telephone Translation Services
Heinz Kirchmann, Alassane Ndiaye, and Andreas Klüter
DFKI GmbH,
Abstract. In addition to the face-to-face system, Verbmobil has been extended to offer translation services via telephone. The implementation of the telephone system required some prerequisites which influence the whole system design. Modeling the user guidance had to take into account the lack of visual feedback. This article describes the general differences between the stationary and the telephone scenario and how Verbmobil was adapted to the challenges of a telephone translation server. We also show configurations and possible applications of the speech-to-speech translation telephone server.