Evaluation of the domain adaptation of MT systems in ACCURAT
Gregor Thurmair
Proceedings of the 18th Annual Conference of the European Association for Machine Translation
Conceptual transfer: Using local classifiers for transfer selection
Gregor Thurmair
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
A key challenge for Machine Translation is transfer selection, i.e. finding the right translation for a given word from a set of alternatives (1:n). This problem becomes more important as the dictionary grows, because the number of alternatives increases. This contribution presents a novel approach to transfer selection, called conceptual transfer, in which selection is done using classifiers based on the conceptual context of a translation candidate on the source language side. Such classifiers are built automatically by parallel corpus analysis: creating subcorpora for each translation of a 1:n package, and identifying correlating concepts in these subcorpora as features of the classifier. The resulting resource can easily be linked to transfer components of MT systems, as it does not depend on internal analysis structures. Tests show that conceptual transfer outperforms the selection techniques currently used in operational MT systems.
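The classifier construction described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `build_profiles` and `select_translation` are hypothetical, plain source-side tokens stand in for "concepts", and relative frequency stands in for whatever correlation measure the paper actually uses.

```python
from collections import Counter, defaultdict


def build_profiles(parallel_pairs, source_word, translations):
    """Build one context profile per translation of source_word.

    parallel_pairs: iterable of (source_tokens, target_tokens) sentence pairs.
    Assumption: tokens stand in for concepts; the paper's concept
    identification step is not reproduced here.
    """
    # Step 1: one subcorpus of source sentences per observed translation.
    subcorpora = defaultdict(list)
    for src_tokens, tgt_tokens in parallel_pairs:
        if source_word not in src_tokens:
            continue
        for translation in translations:
            if translation in tgt_tokens:
                subcorpora[translation].append(src_tokens)

    # Step 2: count the source-side context words in each subcorpus.
    profiles = {}
    for translation, sentences in subcorpora.items():
        profiles[translation] = Counter(
            word
            for sentence in sentences
            for word in sentence
            if word != source_word
        )
    return profiles


def select_translation(profiles, context_tokens):
    """Pick the translation whose profile best matches the context."""
    def score(translation):
        counts = profiles[translation]
        total = sum(counts.values()) or 1
        # Sum of relative frequencies of the context words (a simplification).
        return sum(counts[word] / total for word in context_tokens)

    return max(profiles, key=score)
```

Because the profiles only need source-side context words, a resource of this shape can be queried without access to an MT system's internal analysis structures, which is the property the abstract emphasises.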
A modular open-source focused crawler for mining monolingual and bilingual corpora from the web
Vassilis Papavassiliou
Prokopis Prokopidis
Gregor Thurmair
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora
Efficiency-based evaluation of aligners for industrial applications
Antonio Toral
Marc Poch
Pavel Pecina
Gregor Thurmair
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
EASTIN-CL: A multilingual front-end to a database of Assistive Technology products
Gregor Thurmair
Andrea Agnoletto
Valerio Gower
Roberts Rozis
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
Creating Term and Lexicon Entries from Phrase Tables
Gregor Thurmair
Vera Aleksić
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
Large Scale Lexical Analysis
Gregor Thurmair
Vera Aleksić
Christoph Schwarz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents a lexical analysis component as implemented in the PANACEA project. The goal is to automatically extract lexicon entries from crawled corpora, in an attempt to use corpus-based methods for high-quality linguistic text processing, and to focus on the quality of data without neglecting quantitative aspects. Lexical analysis must assign linguistic information (such as part of speech, inflectional class, gender, subcategorisation frame, and semantic properties) to all parts of the input text. If tokens are ambiguous, lexical analysis must provide all possible sets of annotation for later (syntactic) disambiguation, be it tagging or full parsing. The paper presents an approach for assigning part-of-speech tags for German and English to large input corpora (more than 50 million tokens), providing a workflow which takes crawled corpora as input and provides POS-tagged lemmata ready for lexicon integration. Tools include sentence splitting, lexicon lookup, decomposition, and POS defaulting. Evaluation shows that the overall error rate can be brought down to about 2% if language resources are properly designed. The complete workflow is implemented as a sequence of web services integrated into the PANACEA platform.
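The tool sequence named in this abstract (sentence splitting, lexicon lookup, decomposition, POS defaulting) can be pictured as a fallback chain. The sketch below is only an illustration under stated assumptions: the toy `LEXICON`, the two-part compound splitter, and the capitalisation-based default rule are all hypothetical and are not taken from the PANACEA components.

```python
import re

# Toy lexicon (hypothetical): lower-cased form -> set of possible POS tags.
LEXICON = {
    "haus": {"NOUN"},
    "tür": {"NOUN"},
    "geht": {"VERB"},
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lookup(token):
    """Return all tags listed for the token, or None if it is unknown."""
    return LEXICON.get(token.lower())


def decompose(token):
    """Try to split an unknown token into two known parts (compound analysis).

    The tags of the right-hand part (the head) are returned.
    """
    low = token.lower()
    for i in range(2, len(low) - 1):
        modifier, head = low[:i], low[i:]
        if modifier in LEXICON and head in LEXICON:
            return LEXICON[head]
    return None


def default_pos(token):
    """Last resort (assumed rule): capitalised unknowns default to NOUN."""
    return {"NOUN"} if token[:1].isupper() else {"X"}


def analyse(text):
    """Run the fallback chain over every token of every sentence."""
    result = []
    for sentence in split_sentences(text):
        for token in re.findall(r"\w+", sentence):
            tags = lookup(token) or decompose(token) or default_pos(token)
            result.append((token, sorted(tags)))
    return result
```

Returning a set of tags rather than a single tag mirrors the abstract's requirement that ambiguous tokens keep all possible annotations for later disambiguation.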
Personal Translator at WMT2011
Vera Aleksić
Gregor Thurmair
Proceedings of the Sixth Workshop on Statistical Machine Translation
Comparing different architectures of hybrid Machine Translation systems
Gregor Thurmair
Proceedings of Machine Translation Summit XII: Posters
Generation issues in machine translation
Gregor Thurmair
Proceedings of the Workshop on Using corpora for natural language generation
Proceedings of the Workshop on Automatic procedures in MT evaluation
Gregor Thurmair
Khalid Choukri
Bente Maegaard
Proceedings of the Workshop on Automatic procedures in MT evaluation
Automatic evaluation in MT system production
Gregor Thurmair
Proceedings of the Workshop on Automatic procedures in MT evaluation
Improving Machine Translation Quality
Gregor Thurmair
Proceedings of Machine Translation Summit X: Invited papers
This paper reports on measures to improve the quality of MT systems, by using a hybrid system architecture which adds corpus-based and statistical components to an existing rule-based system backbone. The focus is on improving the accuracy of the dictionary resources.
Multilingual Content Processing
Gregor Thurmair
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Making term extraction tools usable
Gregor Thurmair
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT
The Comprendium Translator system
Juan A. Alonso
Gregor Thurmair
Proceedings of Machine Translation Summit IX: System Presentations
From Resources to Applications. Designing the Multilingual ISLE Lexical Entry
Sue Atkins
Nuria Bel
Francesca Bertagna
Pierrette Bouillon
Nicoletta Calzolari
Christiane Fellbaum
Ralph Grishman
Alessandro Lenci
Catherine MacLeod
Martha Palmer
Gregor Thurmair
Marta Villegas
Antonio Zampolli
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
The ISLE in the ocean. Transatlantic standards for multilingual lexicons (with an eye to machine translation)
Nicoletta Calzolari
Alessandro Lenci
Antonio Zampolli
Nuria Bel
Marta Villegas
Gregor Thurmair
Proceedings of Machine Translation Summit VIII
The ISLE project is a continuation of the long-standing EAGLES initiative, carried out under the Human Language Technology (HLT) programme in collaboration between American and European groups in the framework of the EU-US International Research Co-operation, supported by NSF and EC. In this paper we concentrate on the current position of the ISLE Computational Lexicon Working Group (CLWG), whose activities aim at defining a general schema for a multilingual lexical entry (MILE), as the basis for a standard framework for multilingual computational lexicons. The needs and features of existing Machine Translation systems provide the main reference points for the process of consensual definition of the MILE. The overall structure of the MILE will be illustrated with particular attention to some of the issues raised for multilingual lexicons by the need to express complex transfer conditions among translation equivalents.
The Open Lexicon Interchange Format (OLIF) comes of age
Christian Lieske
Susan McCormick
Gregor Thurmair
Proceedings of Machine Translation Summit VIII
This paper summarizes the current status of version 2 of the Open Lexicon Interchange Format (OLIF). As a natural extension of the OLIF prototype (OLIF version 1), version 2 has been modified with respect to content and formalization (e.g., it is now XML-compliant). These enhancements now make it possible to use OLIF in a variety of Natural Language Processing applications and general language technology environments (e.g., terminology management systems). At the time of writing, several industrial partners of the OLIF Consortium had already started work on implementing OLIF support. Details on OLIF can be found on www.olif.net.
TQPro: Quality Tools for the Translation Process
Gregor Thurmair
Proceedings of Translating and the Computer 22
The L&H approach to development of tools for new languages
Gregor Thurmair
Johannes Ritzke
EAMT Workshop: EU and the new languages
Exchange Interfaces for Translation Tools
Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers
The following paper presents an overview of current discussions of exchange interfaces in the area of multilingual processing. It first discusses the principles which are relevant for the definition of such interfaces; it then presents a state of the art and a proposal in the area of text interfaces, translation memory interfaces, and terminology exchange. The approach is bottom-up, i.e. it starts from existing interfaces and existing requirements, and intends to be of practical use. It reflects the discussions in current multilingual research projects of the EC, like OTELO and AVENTINUS.
From METAL to T1: Systems and Components for Machine Translation Applications
Ulrike Schwall
Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers
This paper describes the progress which has been made to make MT systems usable in professional environments. After many years of significant investment, it was decided that the time was ripe for the METAL machine translation system to be better positioned in the marketplace. Two lines of action were followed: introducing the system onto the PC market, using the GMS-T1 as a concrete example; and reusing system components in customized solutions, using the AVENTINUS project, a multilingual information processing application, as an example. Both lines of action have far-reaching consequences for system development, but they also create new opportunities to improve the system's capabilities and flexibility.
Multilingual information processing
Gregor Thurmair
Proceedings of Machine Translation Summit V
An Architecture Sketch of Eurotra-II
Jörg Schütz
Gregor Thurmair
Roberto Cencioni
Proceedings of Machine Translation Summit III: Papers
This paper outlines a new architecture for an NLP/MT development environment for the EUROTRA project, which will be fully operational in the 1993-94 time frame. The proposed architecture provides a powerful and flexible platform for extensions and enhancements to the existing EUROTRA translation philosophy and the linguistic work done so far, thus allowing the reusability of existing grammatical and lexical resources, while ensuring the suitability of EUROTRA methods and tools for other NLP/MT system developers and researchers.
Parsing for Grammar and Style Checking
Gregor Thurmair
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics