Eleftherios Avramidis


2021

Automatic generation of a 3D sign language avatar on AR glasses given 2D videos of human signers
Lan Thao Nguyen | Florian Schicktanz | Aeneas Stankowski | Eleftherios Avramidis
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)

In this paper we present a prototypical implementation of a pipeline that allows the automatic generation of a German Sign Language avatar from 2D video material. The presentation is accompanied by the source code. We record human pose movements during signing with computer vision models. The joint coordinates of hands and arms are imported as landmarks to control the skeleton of our avatar. From the anatomically independent landmarks, we create another skeleton based on the avatar’s skeletal bone architecture to calculate the bone rotation data. This data is then used to control our human 3D avatar. The avatar is displayed on AR glasses and can be placed virtually in the room so that it can be perceived at the same time as the verbal speaker. In future work, we aim to enhance it with speech recognition and machine translation methods so that it can serve as a sign language interpreter. The prototype has been shown to people of the deaf and hard-of-hearing community to assess its comprehensibility. Problems emerged with the transferred hand rotations: hand gestures were hard to recognize on the avatar due to deformations such as twisted finger meshes.

Observing the Learning Curve of NMT Systems With Regard to Linguistic Phenomena
Patrick Stadler | Vivien Macketanz | Eleftherios Avramidis
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

In this paper we present our observations and evaluations of the linguistic performance of various English-to-German Neural Machine Translation models at several steps of the training process. The linguistic performance is measured through a semi-automatic process using a test suite. Among several linguistic observations, we find that the translation quality of some linguistic categories decreased within the recorded iterations. Additionally, we notice some drops in the translation quality of certain categories when using a larger corpus.

2020

Fine-grained linguistic evaluation for state-of-the-art Machine Translation
Eleftherios Avramidis | Vivien Macketanz | Ursula Strohriegel | Aljoscha Burchardt | Sebastian Möller
Proceedings of the Fifth Conference on Machine Translation

This paper describes a test suite submission providing detailed statistics of linguistic performance for the state-of-the-art German-English systems of the Fifth Conference on Machine Translation (WMT20). The analysis covers 107 phenomena organized in 14 categories based on about 5,500 test items, including a manual annotation effort of 45 person hours. Two systems (Tohoku and Huoshan) appear to have significantly better test suite accuracy than the others, although the best system of WMT20 is not significantly better than the one from WMT19 in a macro-average. Additionally, we identify some linguistic phenomena where all systems suffer (such as idioms, resultative predicates and pluperfect), but we are also able to identify particular weaknesses for individual systems (such as quotation marks, lexical ambiguity and sluicing). Most of the systems of WMT19 which submitted new versions this year show improvements.

2019

Train, Sort, Explain: Learning to Diagnose Translation Models
Robert Schwarzenberg | David Harbecke | Vivien Macketanz | Eleftherios Avramidis | Sebastian Möller
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

Evaluating translation models is a trade-off between effort and detail. At one end of the spectrum there are automatic count-based methods such as BLEU, at the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach for automatically exposing systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from machine translations. A classifier that performs and generalizes well after training should recognize systematic differences between the two classes, which we uncover with neural explainability methods. Our proof-of-concept implementation, DiaMaT, is open source. Applied to a dataset translated by a state-of-the-art neural Transformer model, DiaMaT achieves a classification accuracy of 75% and exposes meaningful differences between humans and the Transformer, amidst the current discussion about human parity.

Linguistic Evaluation of German-English Machine Translation Using a Test Suite
Eleftherios Avramidis | Vivien Macketanz | Ursula Strohriegel | Hans Uszkoreit
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

We present the results of the application of a grammatical test suite for German-to-English MT on the systems submitted at WMT19, with a detailed analysis for 107 phenomena organized in 14 categories. On average, the systems still mistranslate one out of four test items. Low performance is indicated for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. When compared to last year, there has been an improvement in function words, non-verbal agreement and punctuation. More detailed conclusions about particular systems and phenomena are also presented.

2018

Fine-grained evaluation of Quality Estimation for Machine translation based on a linguistically motivated Test Suite
Eleftherios Avramidis | Vivien Macketanz | Arle Lommel | Hans Uszkoreit
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

Fine-grained evaluation of German-English Machine Translation based on a Test Suite
Vivien Macketanz | Eleftherios Avramidis | Aljoscha Burchardt | Hans Uszkoreit
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.

2017

Sentence-level quality estimation by predicting HTER as a multi-component metric
Eleftherios Avramidis
Proceedings of the Second Conference on Machine Translation

2016

Tools and Guidelines for Principled Machine Translation Development
Nora Aranberri | Eleftherios Avramidis | Aljoscha Burchardt | Ondřej Klejch | Martin Popel | Maja Popović
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods. Our aim is to assess, compare and improve MT system variants. We hereby report on novel tools and practices that support various measures, developed in order to support a principled and informed approach to MT development. Our toolkit for automatic evaluation showcases quick and detailed comparison of MT system variants through automatic metrics and n-gram feedback, along with manual evaluation via edit-distance, error annotation and task-based feedback.

DFKI’s system for WMT16 IT-domain task, including analysis of systematic errors
Eleftherios Avramidis | Aljoscha Burchardt | Vivien Macketanz | Ankit Srivastava
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Deeper Machine Translation and Evaluation for German
Eleftherios Avramidis | Vivien Macketanz | Aljoscha Burchardt | Jindrich Helcl | Hans Uszkoreit
Proceedings of the 2nd Deep Machine Translation Workshop

2015

DFKI’s experimental hybrid MT system for WMT 2015
Eleftherios Avramidis | Maja Popović | Aljoscha Burchardt
Proceedings of the Tenth Workshop on Statistical Machine Translation

Poor man’s lemmatisation for automatic error classification
Maja Popović | Mihael Arčan | Eleftherios Avramidis | Aljoscha Burchardt | Arle Lommel
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Towards Deeper MT - A Hybrid System for German
Eleftherios Avramidis | Aljoscha Burchardt | Maja Popović | Hans Uszkoreit
Proceedings of the 1st Deep Machine Translation Workshop

Poor man’s lemmatisation for automatic error classification
Maja Popović | Mihael Arčan | Eleftherios Avramidis | Aljoscha Burchardt
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Efforts on Machine Learning over Human-mediated Translation Edit Rate
Eleftherios Avramidis
Proceedings of the Ninth Workshop on Statistical Machine Translation

Correlating decoding events with errors in Statistical Machine Translation
Eleftherios Avramidis | Maja Popović
Proceedings of the 11th International Conference on Natural Language Processing

The tara corpus of human-annotated machine translations
Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Maja Popović | Cindy Tscherwinka | David Vilar | Hans Uszkoreit
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Human translators are the key to evaluating machine translation (MT) quality and also to addressing the so far unanswered question of when and how to use MT in professional translation workflows. This paper describes the corpus developed as a result of a detailed large-scale human evaluation consisting of three tightly connected tasks: ranking, error classification and post-editing.

Using a new analytic measure for the annotation and analysis of MT errors on real data
Arle Lommel | Aljoscha Burchardt | Maja Popović | Kim Harris | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

Relations between different types of post-editing operations, cognitive effort and temporal effort
Maja Popović | Arle Lommel | Aljoscha Burchardt | Eleftherios Avramidis | Hans Uszkoreit
Proceedings of the 17th Annual conference of the European Association for Machine Translation

2013

Selecting Feature Sets for Comparative and Time-Oriented Quality Estimation of Machine Translation Output
Eleftherios Avramidis | Maja Popović
Proceedings of the Eighth Workshop on Statistical Machine Translation

A CCG-based Quality Estimation Metric for Statistical Machine Translation Learning from Human Judgments of Machine Translation Output
Maja Popović | Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Sven Schmeier | Cindy Tscherwinka | David Vilar
Proceedings of Machine Translation Summit XIV: Posters

Learning from Human Judgments of Machine Translation Output
Maja Popović | Eleftherios Avramidis | Aljoscha Burchardt | Sabine Hunsicker | Sven Schmeier | Cindy Tscherwinka | David Vilar
Proceedings of Machine Translation Summit XIV: Posters

What can we learn about the selection mechanism for post-editing?
Maja Popović | Eleftherios Avramidis | Aljoscha Burchardt | David Vilar | Hans Uszkoreit
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

2012

Quality estimation for Machine Translation output using linguistic analysis and decoding features
Eleftherios Avramidis
Proceedings of the Seventh Workshop on Statistical Machine Translation

Involving Language Professionals in the Evaluation of Machine Translation
Eleftherios Avramidis | Aljoscha Burchardt | Christian Federmann | Maja Popović | Cindy Tscherwinka | David Vilar
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Significant breakthroughs in machine translation only seem possible if human translators are taken into the loop. While automatic evaluation and scoring mechanisms such as BLEU have enabled the fast development of systems, it is not clear how systems can meet real-world (quality) requirements in industrial translation scenarios today. The taraXÜ project paves the way for wide usage of hybrid machine translation outputs through various feedback loops in system development. In a consortium of research and industry partners, the project integrates human translators into the development process for rating and post-editing of machine translation outputs, thus collecting feedback for possible improvements.

A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation
Eleftherios Avramidis | Marta R. Costa-jussà | Christian Federmann | Josef van Genabith | Maite Melero | Pavel Pecina
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In recent years, machine translation (MT) research has focused on investigating how hybrid machine translation as well as system combination approaches can be designed so that the resulting hybrid translations show an improvement over the individual “component” translations. As a first step towards achieving this objective we have developed a parallel corpus with source text and the corresponding translation output from a number of machine translation engines, annotated with metadata information, capturing aspects of the translation process performed by the different MT systems. This corpus aims to serve as a basic resource for further research on whether hybrid machine translation algorithms and system combination techniques can benefit from additional (linguistically motivated, decoding, and runtime) information provided by the different systems involved. In this paper, we describe the annotated corpus we have created. We provide an overview of the component MT systems and the XLIFF-based annotation format we have developed. We also report on first experiments with the ML4HMT corpus data.

The ML4HMT Workshop on Optimising the Division of Labour in Hybrid Machine Translation
Christian Federmann | Eleftherios Avramidis | Marta R. Costa-jussà | Josef van Genabith | Maite Melero | Pavel Pecina
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe the “Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation” (ML4HMT) which aims to foster research on improved system combination approaches for machine translation (MT). Participants of the challenge are requested to build hybrid translations by combining the output of several MT systems of different types. We first describe the ML4HMT corpus used in the shared task, then explain the XLIFF-based annotation format we have designed for it, and briefly summarize the participating systems. Using both automated metrics scores and extensive manual evaluation, we discuss the individual performance of the various systems. An interesting result from the shared task is that the systems winning according to the automated metrics scores differ from those winning according to the manual evaluation. We conclude by summarising the first edition of the challenge and by giving an outlook on future work.

Comparative Quality Estimation: Automatic Sentence-Level Ranking of Multiple Machine Translation Outputs
Eleftherios Avramidis
Proceedings of COLING 2012

2011

Evaluate with Confidence Estimation: Machine ranking of translation outputs using grammatical features
Eleftherios Avramidis | Maja Popović | David Vilar | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

Evaluation without references: IBM1 scores as evaluation metrics
Maja Popović | David Vilar | Eleftherios Avramidis | Aljoscha Burchardt
Proceedings of the Sixth Workshop on Statistical Machine Translation

DFKI’s SC and MT submissions to IWSLT 2011
David Vilar | Eleftherios Avramidis | Maja Popović | Sabine Hunsicker
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

We describe DFKI’s submission to the System Combination and Machine Translation tracks of the 2011 IWSLT Evaluation Campaign. We focus on a sentence selection mechanism which chooses the (hopefully) best sentence among a set of candidates. The rationale behind it is to take advantage of the strengths of each system, especially given a heterogeneous dataset like the one in this evaluation campaign, composed of TED Talks on very different topics. We focus on using features that correlate well with human judgement and, while our primary system still focuses on optimizing the BLEU score on the development set, our goal is to move towards directly optimizing the correlation with human judgement. This kind of system is still under development and was used as a secondary submission.

2008

Enriching Morphologically Poor Languages for Statistical Machine Translation
Eleftherios Avramidis | Philipp Koehn
Proceedings of ACL-08: HLT