Gregor Thurmair

2015

Evaluation of the domain adaptation of MT systems in ACCURAT
Gregor Thurmair
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Conceptual transfer: Using local classifiers for transfer selection
Gregor Thurmair
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

A key challenge for Machine Translation is transfer selection, i.e. to find the right translation for a given word from a set of alternatives (1:n). This problem becomes the more important the larger the dictionary is, as the number of alternatives increases. The contribution presents a novel approach for transfer selection, called conceptual transfer, where selection is done using classifiers based on the conceptual context of a translation candidate on the source language side. Such classifiers are built automatically by parallel corpus analysis: Creating subcorpora for each translation of a 1:n package, and identifying correlating concepts in these subcorpora as features of the classifier. The resulting resource can easily be linked to transfer components of MT systems as it does not depend on internal analysis structures. Tests show that conceptual transfer outperforms the selection techniques currently used in operational MT systems.

2013

A modular open-source focused crawler for mining monolingual and bilingual corpora from the web
Vassilis Papavassiliou | Prokopis Prokopidis | Gregor Thurmair
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

2012

Creating Term and Lexicon Entries from Phrase Tables
Gregor Thurmair | Vera Aleksić
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

Large Scale Lexical Analysis
Gregor Thurmair | Vera Aleksić | Christoph Schwarz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The following paper presents a lexical analysis component as implemented in the PANACEA project. The goal is to automatically extract lexicon entries from crawled corpora, in an attempt to use corpus-based methods for high-quality linguistic text processing, and to focus on the quality of data without neglecting quantitative aspects. Lexical analysis has the task of assigning linguistic information (such as part of speech, inflectional class, gender, subcategorisation frame, semantic properties etc.) to all parts of the input text. If tokens are ambiguous, lexical analysis must provide all possible sets of annotation for later (syntactic) disambiguation, be it tagging or full parsing. The paper presents an approach for assigning part-of-speech tags for German and English to large input corpora (> 50 million tokens), providing a workflow which takes crawled corpora as input and provides POS-tagged lemmata ready for lexicon integration. Tools include sentence splitting, lexicon lookup, decomposition, and POS defaulting. Evaluation shows that the overall error rate can be brought down to about 2% if language resources are properly designed. The complete workflow is implemented as a sequence of web services integrated into the PANACEA platform.

Efficiency-based evaluation of aligners for industrial applications
Antonio Toral | Marc Poch | Pavel Pecina | Gregor Thurmair
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

EASTIN-CL: A multilingual front-end to a database of Assistive Technology products
Gregor Thurmair | Andrea Agnoletto | Valerio Gower | Roberts Rozis
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

2011

Personal Translator at WMT2011
Vera Aleksić | Gregor Thurmair
Proceedings of the Sixth Workshop on Statistical Machine Translation

2009

Comparing different architectures of hybrid Machine Translation systems
Gregor Thurmair
Proceedings of Machine Translation Summit XII: Posters

2007

Proceedings of the Workshop on Automatic procedures in MT evaluation
Gregor Thurmair | Khalid Choukri | Bente Maegaard
Proceedings of the Workshop on Automatic procedures in MT evaluation

Automatic evaluation in MT system production
Gregor Thurmair
Proceedings of the Workshop on Automatic procedures in MT evaluation

Generation issues in machine translation
Gregor Thurmair
Proceedings of the Workshop on Using corpora for natural language generation

2005

Improving Machine Translation Quality
Gregor Thurmair
Proceedings of Machine Translation Summit X: Invited papers

This paper reports on measures to improve the quality of MT systems, by using a hybrid system architecture which adds corpus-based and statistical components to an existing rule-based system backbone. The focus is on improving the accuracy of the dictionary resources.

This paper summarizes the current status of version 2 of the Open Lexicon Interchange Format (OLIF). As a natural extension of the OLIF prototype (OLIF version 1), version 2 has been modified with respect to content and formalization (e.g., it is now XML-compliant). These enhancements now make it possible to use OLIF in a variety of Natural Language Processing applications and general language technology environments (e.g., terminology management systems). At the time of writing, several industrial partners of the OLIF Consortium had already started work on implementing OLIF support. Details on OLIF can be found on www.olif.net.

2001

The ISLE in the ocean. Transatlantic standards for multilingual lexicons (with an eye to machine translation)
Nicoletta Calzolari | Alessandro Lenci | Antonio Zampolli | Nuria Bel | Marta Villegas | Gregor Thurmair
Proceedings of Machine Translation Summit VIII

The ISLE project is a continuation of the long-standing EAGLES initiative, carried out under the Human Language Technology (HLT) programme in collaboration between American and European groups in the framework of the EU-US International Research Co-operation, supported by NSF and EC. In this paper we concentrate on the current position of the ISLE Computational Lexicon Working Group (CLWG), whose activities aim at defining a general schema for a multilingual lexical entry (MILE), as the basis for a standard framework for multilingual computational lexicons. The needs and features of existing Machine Translation systems provide the main reference points for the process of consensual definition of the MILE. The overall structure of the MILE will be illustrated with particular attention to some of the issues raised for multilingual lexicons by the need to express complex transfer conditions among translation equivalents.

2000

TQPro: Quality Tools for the Translation Process
Gregor Thurmair
Proceedings of Translating and the Computer 22

1999

The L&H approach to development of tools for new languages
Gregor Thurmair | Johannes Ritzke
EAMT Workshop: EU and the new languages

1997

From METAL to T1: Systems and Components for Machine Translation Applications
Ulrike Schwall | Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers

This paper describes the progress which has been made to make MT systems usable in professional environments. After many years of significant investment, it was decided that the time was ripe for the METAL machine translation system to be better positioned in the market place. Two lines of action were followed: Introducing the system onto the PC market, using the GMS-T1 as a concrete example; Reusing system components in customized solutions, using the AVENTINUS project as an example, which is a multilingual information processing application. Both lines of action have far-reaching consequences for system development. But they also create new opportunities to improve the system's capabilities and flexibility.

Exchange Interfaces for Translation Tools
Gregor Thurmair
Proceedings of Machine Translation Summit VI: Papers

The following paper presents an overview of current discussions of exchange interfaces in the area of multilingual processing. It first discusses the principles which are relevant for the definition of such interfaces; it then presents a state of the art and a proposal in the area of text interfaces, translation memory interfaces, and terminology exchange. The approach is bottom-up, i.e. it starts from existing interfaces and existing requirements, and intends to be of practical use. It reflects the discussions in current multilingual research projects of the EC, like OTELO and AVENTINUS.

1995

Multilingual information processing
Gregor Thurmair
Proceedings of Machine Translation Summit V

1991

An Architecture Sketch of Eurotra-II
Jörg Schütz | Gregor Thurmair | Roberto Cencioni
Proceedings of Machine Translation Summit III: Papers

This paper outlines a new architecture for an NLP/MT development environment for the EUROTRA project, which will be fully operational in the 1993-94 time frame. The proposed architecture provides a powerful and flexible platform for extensions and enhancements to the existing EUROTRA translation philosophy and the linguistic work done so far, thus allowing the reusability of existing grammatical and lexical resources, while ensuring the suitability of EUROTRA methods and tools for other NLP/MT system developers and researchers.

1990

Parsing for Grammar and Style Checking
Gregor Thurmair
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics
