Aljoscha Burchardt


2020

Fine-grained linguistic evaluation for state-of-the-art Machine Translation
Eleftherios Avramidis | Vivien Macketanz | Ursula Strohriegel | Aljoscha Burchardt | Sebastian Möller
Proceedings of the Fifth Conference on Machine Translation

This paper describes a test suite submission providing detailed statistics of linguistic performance for the state-of-the-art German-English systems of the Fifth Conference on Machine Translation (WMT20). The analysis covers 107 phenomena organized in 14 categories, based on about 5,500 test items and including a manual annotation effort of 45 person-hours. Two systems (Tohoku and Huoshan) appear to have significantly better test suite accuracy than the others, although the best system of WMT20 is not significantly better than the one from WMT19 in a macro-average. Additionally, we identify some linguistic phenomena on which all systems suffer (such as idioms, resultative predicates and pluperfect), but we are also able to identify particular weaknesses of individual systems (such as quotation marks, lexical ambiguity and sluicing). Most of the WMT19 systems that submitted new versions this year show improvements.
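The macro-averaged test-suite accuracy mentioned in the abstract can be sketched as follows. This is a minimal illustration only; the category names and pass/total counts are invented placeholders, not WMT20 data.

```python
# Sketch of macro-averaged test-suite accuracy over linguistic categories.
# All category names and counts below are illustrative placeholders.
def macro_average(results):
    """results maps category -> (items passed, total items); returns the
    unweighted mean of per-category accuracies (the macro-average)."""
    per_category = [passed / total for passed, total in results.values()]
    return sum(per_category) / len(per_category)

results = {
    "ambiguity": (40, 50),                 # 0.80
    "verb tense/aspect/mood": (300, 400),  # 0.75
    "punctuation": (18, 20),               # 0.90
}
print(round(macro_average(results), 3))  # → 0.817
```

A macro-average weights each category equally regardless of how many test items it contains, which is why it can rank systems differently from plain item-level accuracy.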

2018

Fine-grained evaluation of German-English Machine Translation based on a Test Suite
Vivien Macketanz | Eleftherios Avramidis | Aljoscha Burchardt | Hans Uszkoreit
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.
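The semi-automatic, regex-based evaluation described above can be illustrated with a minimal sketch. The phenomenon, sentences and pattern below are invented examples, not actual test-suite items.

```python
import re

# Sketch of a phenomenon-targeted check: a regular expression tests only the
# part of the MT output relevant to the phenomenon, ignoring the rest of the
# sentence. The item below is an invented example, not a real test-suite entry.
test_item = {
    "phenomenon": "pluperfect",
    "source": "Sie hatte das Buch gelesen.",
    "pass_pattern": r"\bhad\s+read\b",  # expected English past perfect
}

def check(mt_output: str, item: dict) -> bool:
    """Return True if the MT output realizes the phenomenon correctly."""
    return re.search(item["pass_pattern"], mt_output, re.IGNORECASE) is not None

print(check("She had read the book.", test_item))  # True: pluperfect preserved
print(check("She read the book.", test_item))      # False: tense lost
```

Because the pattern inspects only the phenomenon-relevant span, unrelated translation choices elsewhere in the sentence do not affect the verdict; outputs matched by no pattern can then be left for manual review, which is what keeps such an evaluation semi-automatic.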

TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality
Vivien Macketanz | Renlong Ai | Aljoscha Burchardt | Hans Uszkoreit
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

2016

DFKI’s system for WMT16 IT-domain task, including analysis of systematic errors
Eleftherios Avramidis | Aljoscha Burchardt | Vivien Macketanz | Ankit Srivastava
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Deeper Machine Translation and Evaluation for German
Eleftherios Avramidis | Vivien Macketanz | Aljoscha Burchardt | Jindrich Helcl | Hans Uszkoreit
Proceedings of the 2nd Deep Machine Translation Workshop

Evaluating Machine Translation in a Usage Scenario
Rosa Gaudio | Aljoscha Burchardt | António Branco
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this document we report on a user-scenario-based evaluation aiming at assessing the performance of machine translation (MT) systems in a real context of use. We describe a series of experiments performed to estimate the usefulness of MT and to test whether improvements of MT technology lead to better performance in the usage scenario. One goal is to find the best methodology for evaluating the potential benefit of a machine translation system in an application. The evaluation is based on the QTLeap corpus, a novel multilingual language resource that was collected through a real-life support service via chat. It is composed of naturally occurring utterances produced by users while interacting with a human technician providing answers. The corpus is available in eight different languages: Basque, Bulgarian, Czech, Dutch, English, German, Portuguese and Spanish.

Tools and Guidelines for Principled Machine Translation Development
Nora Aranberri | Eleftherios Avramidis | Aljoscha Burchardt | Ondřej Klejch | Martin Popel | Maja Popović
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We report on novel tools and practices supporting various evaluation measures, developed to enable a principled and informed approach to MT development. Our toolkit for automatic evaluation offers quick and detailed comparison of MT system variants through automatic metrics and n-gram feedback, along with manual evaluation via edit distance, error annotation and task-based feedback.
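As one example of the manual-evaluation measures listed above, the edit distance between an MT hypothesis and its human post-edit is a common rough proxy for post-editing effort. The sketch below is illustrative only, not code from the toolkit itself.

```python
# Character-level Levenshtein distance between an MT hypothesis and its
# post-edited version: a rough proxy for post-editing effort.
# Illustrative sketch only, not the toolkit's actual implementation.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution/match
        prev = cur
    return prev[-1]

hypothesis = "the house is red"
post_edit = "the house was red"
print(levenshtein(hypothesis, post_edit))  # → 2
```

Word-level variants of the same recurrence underlie metrics such as TER; the character-level version shown here keeps the example self-contained.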

2015


DFKI’s experimental hybrid MT system for WMT 2015
Eleftherios Avramidis | Maja Popović | Aljoscha Burchardt
Proceedings of the Tenth Workshop on Statistical Machine Translation

Poor man’s lemmatisation for automatic error classification
Maja Popović | Mihael Arčan | Eleftherios Avramidis | Aljoscha Burchardt | Arle Lommel
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Towards Deeper MT - A Hybrid System for German
Eleftherios Avramidis | Aljoscha Burchardt | Maja Popović | Hans Uszkoreit
Proceedings of the 1st Deep Machine Translation Workshop

Evaluating a Machine Translation System in a Technical Support Scenario
Rosa Del Gaudio | Aljoscha Burchardt | Arle Lommel
Proceedings of the 1st Deep Machine Translation Workshop

2014

The taraXÜ corpus of human-annotated machine translations
Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Maja Popović | Cindy Tscherwinka | David Vilar | Hans Uszkoreit
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so-far unanswered question of when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed, large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing.

Using a new analytic measure for the annotation and analysis of MT errors on real data
Arle Lommel | Aljoscha Burchardt | Maja Popović | Kim Harris | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

Relations between different types of post-editing operations, cognitive effort and temporal effort
Maja Popović | Arle Lommel | Aljoscha Burchardt | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2013


Learning from Human Judgments of Machine Translation Output
Maja Popović | Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Sven Schmeier | Cindy Tscherwinka | David Vilar
Proceedings of Machine Translation Summit XIV: Posters


META - Multilingual Europe Technology Alliance
Georg Rehm | Aljoscha Burchardt | Felix Sasaki
Proceedings of Machine Translation Summit XIV: European projects

QTLaunchpad
Stephen Doherty | Declan Groves | Josef van Genabith | Arle Lommel | Aljoscha Burchardt | Hans Uszkoreit | Lucia Specia | Stelios Piperidis
Proceedings of Machine Translation Summit XIV: European projects

What can we learn about the selection mechanism for post-editing?
Maja Popović | Eleftherios Avramidis | Aljoscha Burchardt | David Vilar | Hans Uszkoreit
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

Multidimensional quality metrics: a flexible system for assessing translation quality
Aljoscha Burchardt
Proceedings of Translating and the Computer 35

2012

Involving Language Professionals in the Evaluation of Machine Translation
Eleftherios Avramidis | Aljoscha Burchardt | Christian Federmann | Maja Popović | Cindy Tscherwinka | David Vilar
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Significant breakthroughs in machine translation only seem possible if human translators are taken into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The taraXÜ project paves the way for wide usage of hybrid machine translation outputs through various feedback loops in system development. In a consortium of research and industry partners, the project integrates human translators into the development process for rating and post-editing of machine translation outputs, thus collecting feedback for possible improvements.

Towards the Integration of MT into a LSP Translation Workflow
David Vilar | Michael Schneider | Aljoscha Burchardt | Thomas Wedde
Proceedings of the 16th Annual conference of the European Association for Machine Translation

2011

From Human to Automatic Error Classification for Machine Translation Output
Maja Popović | Aljoscha Burchardt
Proceedings of the 15th Annual conference of the European Association for Machine Translation

Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features
Eleftherios Avramidis | Maja Popović | David Vilar | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

Evaluation without references: IBM1 scores as evaluation metrics
Maja Popović | David Vilar | Eleftherios Avramidis | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

2008

FATE: a FrameNet-Annotated Corpus for Textual Entailment
Aljoscha Burchardt | Marco Pennacchiotti
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Several studies indicate that the level of predicate-argument structure is relevant for modeling prevalent phenomena in current textual entailment corpora. Although large resources like FrameNet have recently become available, attempts to integrate this type of information into a system for textual entailment did not confirm the expected gain in performance. The reasons for this are not fully obvious; candidates include FrameNet’s restricted coverage, limitations of semantic parsers, or insufficient modeling of FrameNet information. To enable further insight on this issue, in this paper we present FATE (FrameNet-Annotated Textual Entailment), a manually crafted, fully reliable frame-annotated RTE corpus. The annotation has been carried out over the 800 pairs of the RTE-2 test set. This dataset offers a safe basis for RTE systems to experiment, and enables researchers to develop clearer ideas on how to effectively integrate frame knowledge in semantic inference tasks like recognizing textual entailment. We describe and present statistics over the adopted annotation, which introduces a new schema based on full-text annotation of so-called relevant frame-evoking elements.

Formalising Multi-layer Corpora in OWL DL - Lexicon Modelling, Querying and Consistency Control
Aljoscha Burchardt | Sebastian Padó | Dennis Spohr | Anette Frank | Ulrich Heid
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2007

A Semantic Approach To Textual Entailment: System Evaluation and Task Analysis
Aljoscha Burchardt | Nils Reiter | Stefan Thater | Anette Frank
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

The SALSA Corpus: a German Corpus Resource for Lexical Semantics
Aljoscha Burchardt | Katrin Erk | Anette Frank | Andrea Kowalski | Sebastian Padó | Manfred Pinkal
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the SALSA corpus, a large German corpus with manual role-semantic annotation, based on the syntactically annotated TIGER newspaper corpus. The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss the annotation framework (frame semantics) and its cross-lingual applicability, problems arising from exhaustive annotation, strategies for quality control, and possible applications.

SALTO - A Versatile Multi-Level Annotation Tool
Aljoscha Burchardt | Katrin Erk | Anette Frank | Andrea Kowalski | Sebastian Pado
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we describe the SALTO tool. It was originally developed for the annotation of semantic roles in the frame semantics paradigm, but can be used for graphical annotation of treebanks with general relational information in a simple drag-and-drop fashion. The tool additionally supports corpus management and quality control.