David Kolovratnik

Also published as: David Kolovratník


eTranslation’s Submissions to the WMT 2021 News Translation Task
Csaba Oravecz | Katina Bontcheva | David Kolovratník | Bhavani Bhaskar | Michael Jellinghaus | Andreas Eisele
Proceedings of the Sixth Conference on Machine Translation

The paper describes the 3 NMT models submitted by the eTranslation team to the WMT 2021 news translation shared task. We developed systems in language pairs that are actively used in the European Commission’s eTranslation service. In the WMT news task, recent years have seen a steady increase in the need for computational resources to train deep and complex architectures to produce competitive systems. We took a different approach and explored alternative strategies focusing on data selection and filtering to improve the performance of baseline systems. In the domain-constrained task for the French–German language pair, our approach resulted in the best system by a significant margin in BLEU. For the other two systems (English–German and English–Czech), we tried to build competitive models using standard best practices.


eTranslation’s Submissions to the WMT 2020 News Translation Task
Csaba Oravecz | Katina Bontcheva | László Tihanyi | David Kolovratnik | Bhavani Bhaskar | Adrien Lardilleux | Szymon Klocek | Andreas Eisele
Proceedings of the Fifth Conference on Machine Translation

The paper describes the submissions of the eTranslation team to the WMT 2020 news translation shared task. Leveraging the experience from the team’s participation last year, we developed systems for 5 language pairs with various strategies. Compared to last year, for some language pairs we dedicated a lot more resources to training, and tried to follow standard best practices to build competitive systems which can achieve good results in the rankings. By using deep and complex architectures, we sacrificed direct re-usability of our systems in production environments, but evaluation showed that this approach could result in better models that significantly outperform baseline architectures. We submitted two systems to the zero-shot robustness task. These submissions are described briefly in this paper as well.


DCEP - Digital Corpus of the European Parliament
Najeh Hajlaoui | David Kolovratnik | Jaakko Väyrynen | Ralf Steinberger | Daniel Varga
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We are presenting a new highly multilingual document-aligned parallel corpus called DCEP - Digital Corpus of the European Parliament. It consists of various document types covering a wide range of subject domains. With a total of 1.37 billion words in 23 languages (253 language pairs), gathered in the course of ten years, this is the largest single release of documents by a European Union institution. DCEP contains most of the content of the European Parliament’s official Website. It includes different document types produced between 2001 and 2012, excluding only the documents that already exist in the Europarl corpus, to avoid overlap. We are presenting the typical acquisition steps of the DCEP corpus: data access, document alignment, sentence splitting, normalisation and tokenisation, and sentence alignment efforts. The sentence-level alignment is still in progress, but based on some first experiments, we showed that DCEP is very useful for NLP applications, in particular for Statistical Machine Translation.


To post-edit or not to post-edit? Estimating the benefits of MT post-editing for a European organization
Alexandros Poulis | David Kolovratnik
Workshop on Post-Editing Technology and Practice

In the last few years the European Parliament has witnessed a significant increase in translation demand. Although Translation Memory (TM) tools, terminology databases and bilingual concordancers have provided significant leverage in terms of quality and productivity, the European Parliament is in need of advanced language technology to keep successfully meeting the challenge of multilingualism. This paper describes an ongoing large-scale machine translation post-editing evaluation campaign, the purpose of which is to estimate the business benefits from the use of machine translation for the European Parliament. This paper focuses mainly on the design, the methodology and the tools used by the evaluators, but it also presents some preliminary results for the following language pairs: Polish-English, Danish-English, Lithuanian-English, English-German and English-French.


Exodus - Exploring SMT for EU Institutions
Michael Jellinghaus | Alexandros Poulis | David Kolovratník
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR