Roman Grundkiewicz - ACL Anthology

Roman Grundkiewicz

2025

This paper presents the results of the General Machine Translation Task organized as part of the 2025 Conference on Machine Translation (WMT). Participants were invited to build systems for any of 30 language pairs. For half of these pairs, we conducted a human evaluation on test sets spanning four to five different domains.We evaluated 60 systems in total: 36 submitted by participants and 24 for which we collected translations from large language models (LLMs) and popular online translation providers.This year, we focused on creating challenging test sets by developing a difficulty sampling technique and using more complex source data. We evaluated system outputs with professional annotators using the Error Span Annotation (ESA) protocol, except for two language pairs, for which we used Multidimensional Quality Metrics (MQM) instead.We continued the trend of increasingly moving towards document-level translation, providing the source texts as whole documents containing multiple paragraphs.

The WMT25 Multilingual Instruction Shared Task (MIST) introduces a benchmark to evaluate large language models (LLMs) across 30 languages. The benchmark covers five types of problems: machine translation, linguistic reasoning, open-ended generation, cross-lingual summarization, and LLM-as-a-judge.We provide automatic evaluation and collect human annotations, which highlight the limitations of automatic evaluation and allow further research into metric meta-evaluation. We run on our benchmark a diverse set of open- and closed-weight LLMs, providing a broad assessment of the multilingual capabilities of current LLMs. Results highlight substantial variation across sub-tasks and languages, revealing persistent challenges in reasoning, cross-lingual generation, and evaluation reliability. This work establishes a standardized framework for measuring future progress in multilingual LLM development.

Findings of the WMT 2025 Shared Task on Model Compression: Early Insights on Compressing LLMs for Machine Translation
Marco Gaido | Roman Grundkiewicz | Thamme Gowda | Matteo Negri
Proceedings of the Tenth Conference on Machine Translation

We present the results of the first edition of the Model Compression shared task, organized as part of the 10th Conference on Machine Translation (WMT25). The task challenged participants to compress Large Language Models (LLMs) toward enabling practical deployment in resource-constrained scenarios, while minimizing loss in translation performance. In this edition, participants could choose to compete in either a constrained track, which required compressing a specific model (Aya Expanse 8B) evaluated on a limited set of language pairs (Czech→German, Japanese→Chinese, and English→Arabic), or an unconstrained track, which placed no restrictions on the model and allowed submissions for any of the 15 language directions covered by the General MT task (GenMT). We received 12 submissions from three teams, all in the constrained track. They proposed different compression solutions and covered various language combinations: all targeted Czech→German, while one covered all the language pairs. Evaluation was conducted separately for each language, measuring translation quality using COMET and MetricX, model size, and inference speed on an Nvidia A100 GPU.

2024

On Instruction-Finetuning Neural Machine Translation Models
Vikas Raunak | Roman Grundkiewicz | Marcin Junczys-Dowmunt
Proceedings of the Ninth Conference on Machine Translation

In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models. Our instruction-finetuning recipe for NMT models enables customization of translations for a limited but disparate set of translation-specific tasks.We show that NMT models are capable of following multiple instructions simultaneously and demonstrate capabilities of zero-shot composition of instructions.We also show that through instruction finetuning, traditionally disparate tasks such as formality-controlled machine translation, multi-domain adaptation as well as multi-modal translations can be tackled jointly by a single instruction finetuned NMT model, at a performance level comparable to LLMs such as GPT-3.5-Turbo.To the best of our knowledge, our work is among the first to demonstrate the instruction-following capabilities of traditional NMT models, which allows for faster, cheaper and more efficient serving of customized translations.

PyMarian: Fast Neural Machine Translation and Evaluation in Python
Thamme Gowda | Roman Grundkiewicz | Elijah Rippeth | Matt Post | Marcin Junczys-Dowmunt
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
Tom Kocmi | Vilém Zouhar | Eleftherios Avramidis | Roman Grundkiewicz | Marzena Karpinska | Maja Popović | Mrinmaya Sachan | Mariya Shmatova
Proceedings of the Ninth Conference on Machine Translation

High-quality Machine Translation (MT) evaluation relies heavily on human judgments.Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive as they are time-consuming and can only be done by experts, whose availability may be limited especially for low-resource languages.On the other hand, just assigning overall scores, like Direct Assessment (DA), is simpler and faster and can be done by translators of any level, but is less reliable.In this paper, we introduce Error Span Annotation (ESA), a human evaluation protocol which combines the continuous rating of DA with the high-level error severity span marking of MQM.We validate ESA by comparing it to MQM and DA for 12 MT systems and one human reference translation (English to German) from WMT23. The results show that ESA offers faster and cheaper annotations than MQM at the same quality level, without the requirement of expensive MQM experts.

This overview paper presents the results of the General Machine Translation Task organised as part of the 2024 Conference on Machine Translation (WMT). In the general MT task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting of three to five different domains. In addition to participating systems, we collected translations from 8 different large language models (LLMs) and 4 online translation providers. We evaluate system outputs with professional human annotators using a new protocol called Error Span Annotations (ESA).

2023

This paper is a brief summary of the First WMT Shared Task on Sign Language Translation (WMT-SLT22), a project partly funded by EAMT. The focus of this shared task is automatic translation between signed and spoken languages. Details can be found on our website (https://www.wmt-slt.com/) or in the findings paper (Müller et al., 2022).

Perplexity-Driven Case Encoding Needs Augmentation for CAPITALIZATION Robustness
Rohit Jain | Huda Khayrallah | Roman Grundkiewicz | Marcin Junczys-Dowmunt
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper presents the results of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23; https://www.wmt-slt.com/). This shared task is concerned with automatic translation between signed and spoken languages. The task is unusual in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task offers four tracks involving the following languages: Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), Italian Sign Language of Switzerland (LIS-CH), German, French and Italian. Four teams (including one working on a baseline submission) participated in this second edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora and reproducible baseline systems. Finally, the task also resulted in publicly available sets of system outputs and more human evaluation scores for sign language translation.

This paper presents the results of the General Machine Translation Task organised as part of the 2023 Conference on Machine Translation (WMT). In the general MT task, participants were asked to build machine translation systems for any of 8 language pairs (corresponding to 14 translation directions), to be evaluated on test sets consisting of up to four different domains. We evaluate system outputs with professional human annotators using a combination of source-based Direct Assessment and scalar quality metric (DA+SQM).

SOTASTREAM: A Streaming Approach to Machine Translation Training
Matt Post | Thamme Gowda | Roman Grundkiewicz | Huda Khayrallah | Rohit Jain | Marcin Junczys-Dowmunt
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer. This preparation step is increasingly at odds with modern research and development practices because this process produces a static, unchangeable version of the training data, making common training-time needs difficult (e.g., subword sampling), time-consuming (preprocessing with large data can take days), expensive (e.g., disk space), and cumbersome (managing experiment combinatorics). We propose an alternative approach that separates the generation of data from the consumption of that data. In this approach, there is no separate pre-processing step; data generation produces an infinite stream of permutations of the raw training data, which the trainer tensorizes and batches as it is consumed. Additionally, this data stream can be manipulated by a set of user-definable operators that provide on-the-fly modifications, such as data normalization, augmentation or filtering. We release an open-source toolkit, SOTASTREAM, that implements this approach: https://github.com/marian-nmt/sotastream. We show that it cuts training time, adds flexibility, reduces experiment management complexity, and reduces disk space, all without affecting the accuracy of the trained models.

2022

This paper presents the results of the First WMT Shared Task on Sign Language Translation (WMT-SLT22).This shared task is concerned with automatic translation between signed and spoken languages. The task is novel in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT).The task featured two tracks, translating from Swiss German Sign Language (DSGS) to German and vice versa. Seven teams participated in this first edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora, reproducible baseline systems and new protocols and software for human evaluation. Finally, the task also resulted in the first publicly available set of system outputs and human evaluation scores for sign language translation.

This paper presents the results of the General Machine Translation Task organised as part of the Conference on Machine Translation (WMT) 2022. In the general MT task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting of four different domains. We evaluate system outputs with human annotators using two different techniques: reference-based direct assessment and (DA) and a combination of DA and scalar quality metric (DA+SQM).

Revisiting Locality Sensitive Hashing for Vocabulary Selection in Fast Neural Machine Translation
Hieu Hoang | Marcin Junczys-dowmunt | Roman Grundkiewicz | Huda Khayrallah
Proceedings of the Seventh Conference on Machine Translation (WMT)

Neural machine translation models often contain large target vocabularies. The calculation of logits, softmax and beam search is computationally costly over so many classes. We investigate the use of locality sensitive hashing (LSH) to reduce the number of vocabulary items that must be evaluated and explore the relationship between the hashing algorithm, translation speed and quality. Compared to prior work, our LSH-based solution does not require additional augmentation via word-frequency lists or alignments. We propose a training procedure that produces models, which, when combined with our LSH inference algorithm increase translation speed by up to 87% over the baseline, while maintaining translation quality as measured by BLEU. Apart from just using BLEU, we focus on minimizing search errors compared to the full softmax, a much harsher quality criterion.

Findings of the IWSLT 2022 Evaluation Campaign
Antonios Anastasopoulos | Loïc Barrault | Luisa Bentivogli | Marcely Zanon Boito | Ondřej Bojar | Roldano Cattoni | Anna Currey | Georgiana Dinu | Kevin Duh | Maha Elbayad | Clara Emmanuel | Yannick Estève | Marcello Federico | Christian Federmann | Souhir Gahbiche | Hongyu Gong | Roman Grundkiewicz | Barry Haddow | Benjamin Hsu | Dávid Javorský | Vĕra Kloudová | Surafel Lakew | Xutai Ma | Prashant Mathur | Paul McNamee | Kenton Murray | Maria Nǎdejde | Satoshi Nakamura | Matteo Negri | Jan Niehues | Xing Niu | John Ortega | Juan Pino | Elizabeth Salesky | Jiatong Shi | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Yogesh Virkar | Alexander Waibel | Changhan Wang | Shinji Watanabe
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received and the results that were achieved.

2021

To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation
Tom Kocmi | Christian Federmann | Roman Grundkiewicz | Marcin Junczys-Dowmunt | Hitokazu Matsushita | Arul Menezes
Proceedings of the Sixth Conference on Machine Translation

Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system’s quality over another. The community choice of automatic metric guides research directions and industrial developments by deciding which models are deemed better. Evaluating metrics correlations with sets of human judgements has been limited by the size of these sets. In this paper, we corroborate how reliable metrics are in contrast to human judgements on – to the best of our knowledge – the largest collection of judgements reported in the literature. Arguably, pairwise rankings of two systems are the most common evaluation tasks in research or deployment scenarios. Taking human judgement as a gold standard, we investigate which metrics have the highest accuracy in predicting translation quality rankings for such system pairs. Furthermore, we evaluate the performance of various metrics across different language pairs and domains. Lastly, we show that the sole use of BLEU impeded the development of improved models leading to bad deployment decisions. We release the collection of 2.3M sentence-level human judgements for 4380 systems for further analysis and replication of our work.

This paper presents the results of the newstranslation task, the multilingual low-resourcetranslation for Indo-European languages, thetriangular translation task, and the automaticpost-editing task organised as part of the Con-ference on Machine Translation (WMT) 2021.In the news task, participants were asked tobuild machine translation systems for any of10 language pairs, to be evaluated on test setsconsisting mainly of news stories. The taskwas also opened up to additional test suites toprobe specific aspects of translation.

On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs
Roman Grundkiewicz | Marcin Junczys-Dowmunt | Christian Federmann | Tom Kocmi
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

Recent studies emphasize the need of document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments. In this work, we compare human assessment data from the last two WMT evaluation campaigns collected via two different methods for document-level evaluation. Our analysis shows that a document-centric approach to evaluation where the annotator is presented with the entire document context on a screen leads to higher quality segment and document level assessments. It improves the correlation between segment and document scores and increases inter-annotator agreement for document scores but is considerably more time consuming for annotators.

Efficient Machine Translation with Model Pruning and Quantization
Maximiliana Behnke | Nikolay Bogoychev | Alham Fikri Aji | Kenneth Heafield | Graeme Nail | Qianqian Zhu | Svetlana Tchistiakova | Jelmer van der Linde | Pinzhen Chen | Sidharth Kashyap | Roman Grundkiewicz
Proceedings of the Sixth Conference on Machine Translation

We participated in all tracks of the WMT 2021 efficient machine translation task: single-core CPU, multi-core CPU, and GPU hardware with throughput and latency conditions. Our submissions combine several efficiency strategies: knowledge distillation, a simpler simple recurrent unit (SSRU) decoder with one or two layers, lexical shortlists, smaller numerical formats, and pruning. For the CPU track, we used quantized 8-bit models. For the GPU track, we experimented with FP16 and 8-bit integers in tensorcores. Some of our submissions optimize for size via 4-bit log quantization and omitting a lexical shortlist. We have extended pruning to more parts of the network, emphasizing component- and block-level pruning that actually improves speed unlike coefficient-wise pruning.

Findings of the WMT 2021 Shared Task on Efficient Translation
Kenneth Heafield | Qianqian Zhu | Roman Grundkiewicz
Proceedings of the Sixth Conference on Machine Translation

The machine translation efficiency task challenges participants to make their systems faster and smaller with minimal impact on translation quality. How much quality to sacrifice for efficiency depends upon the application, so participants were encouraged to make multiple submissions covering the space of trade-offs. In total, there were 53 submissions by 4 teams. There were GPU, single-core CPU, and multi-core CPU hardware tracks as well as batched throughput or single-sentence latency conditions. Submissions showed hundreds of millions of words can be translated for a dollar, average latency is 5–17 ms, and models fit in 7.5–150 MB.

2020

This paper presents the results of the news translation task and the similar language translation task, both organised alongside the Conference on Machine Translation (WMT) 2020. In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories. The task was also opened up to additional test suites to probe specific aspects of translation. In the similar language translation task, participants built machine translation systems for translating between closely related pairs of languages.

A Crash Course in Automatic Grammatical Error Correction
Roman Grundkiewicz | Christopher Bryant | Mariano Felice
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting all types of errors in written text. Although most research has focused on correcting errors in the context of English as a Second Language (ESL), GEC can also be applied to other languages and native text. The main application of a GEC system is thus to assist humans with their writing. Academic and commercial interest in GEC has grown significantly since the Helping Our Own (HOO) and Conference on Natural Language Learning (CoNLL) shared tasks in 2011-14, and a record-breaking 24 teams took part in the recent Building Educational Applications (BEA) shared task. Given this interest, and the recent shift towards neural approaches, we believe the time is right to offer a tutorial on GEC for researchers who may be new to the field or who are interested in the current state of the art and future challenges. With this in mind, the main goal of this tutorial is not only to bring attendees up to speed with GEC in general, but also examine the development of neural-based GEC systems.

Speed-optimized, Compact Student Models that Distill Knowledge from a Larger Teacher Model: the UEDIN-CUNI Submission to the WMT 2020 News Translation Task
Ulrich Germann | Roman Grundkiewicz | Martin Popel | Radina Dobreva | Nikolay Bogoychev | Kenneth Heafield
Proceedings of the Fifth Conference on Machine Translation

We describe the joint submission of the University of Edinburgh and Charles University, Prague, to the Czech/English track in the WMT 2020 Shared Task on News Translation. Our fast and compact student models distill knowledge from a larger, slower teacher. They are designed to offer a good trade-off between translation quality and inference efficiency. On the WMT 2020 Czech ↔ English test sets, they achieve translation speeds of over 700 whitespace-delimited source words per second on a single CPU thread, thus making neural translation feasible on consumer hardware without a GPU.

Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task
Nikolay Bogoychev | Roman Grundkiewicz | Alham Fikri Aji | Maximiliana Behnke | Kenneth Heafield | Sidharth Kashyap | Emmanouil-Ioannis Farsarakis | Mateusz Chudyk
Proceedings of the Fourth Workshop on Neural Generation and Translation

We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multi-core setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect the trade-off between time and quality.

2019

Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data
Roman Grundkiewicz | Marcin Junczys-Dowmunt | Kenneth Heafield
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

Considerable effort has been made to address the data sparsity problem in neural grammatical error correction. In this work, we propose a simple and surprisingly effective unsupervised synthetic error generation method based on confusion sets extracted from a spellchecker to increase the amount of training data. Synthetic data is used to pre-train a Transformer sequence-to-sequence model, which not only improves over a strong baseline trained on authentic error-annotated data, but also enables the development of a practical GEC system in a scenario where little genuine error-annotated data is available. The developed systems placed first in the BEA19 shared task, achieving 69.47 and 64.24 F_0.5 in the restricted and low-resource tracks respectively, both on the W&I+LOCNESS test set. On the popular CoNLL 2014 test set, we report state-of-the-art results of 64.16 M² for the submitted system, and 61.30 M² for the constrained system trained on the NUCLE and Lang-8 data.

From Research to Production and Back: Ludicrously Fast Neural Machine Translation
Young Jin Kim | Marcin Junczys-Dowmunt | Hany Hassan | Alham Fikri Aji | Kenneth Heafield | Roman Grundkiewicz | Nikolay Bogoychev
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper describes the submissions of the “Marian” team to the WNGT 2019 efficiency shared task. Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models. For efficient CPU-based decoding, we propose pre-packed 8-bit matrix products, improved batched decoding, cache-friendly student architectures with parameter sharing and light-weight RNN-based decoder architectures. GPU-based decoding benefits from the same architecture changes, from pervasive 16-bit inference and concurrent streams. These modifications together with profiler-based C++ code optimization allow us to push the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values. Our fastest CPU model is more than 4x faster than last year’s fastest submission at more than 3 points higher BLEU. Our fastest GPU model at 1.5 seconds translation time is slightly faster than last year’s fastest RNN-based submissions, but outperforms them by more than 4 BLEU and 10 BLEU points respectively.

The University of Edinburgh’s Submissions to the WMT19 News Translation Task
Rachel Bawden | Nikolay Bogoychev | Ulrich Germann | Roman Grundkiewicz | Faheem Kirefu | Antonio Valerio Miceli Barone | Alexandra Birch
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English↔Gujarati, English↔Chinese, German→English, and English→Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English↔Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German→English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English→Czech, we compared different preprocessing and tokenisation regimes.

Minimally-Augmented Grammatical Error Correction
Roman Grundkiewicz | Marcin Junczys-Dowmunt
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

There has been an increased interest in low-resource approaches to automatic grammatical error correction. We introduce Minimally-Augmented Grammatical Error Correction (MAGEC) that does not require any error-labelled data. Our unsupervised approach is based on a simple but effective synthetic error generation method based on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique

Samsung and University of Edinburgh’s System for the IWSLT 2019
Joanna Wetesko | Marcin Chochowski | Pawel Przybysz | Philip Williams | Roman Grundkiewicz | Rico Sennrich | Barry Haddow | Barone | Valerio Miceli | Alexandra Birch
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes the joint submission to the IWSLT 2019 English to Czech task by Samsung RD Institute, Poland, and the University of Edinburgh. Our submission was ultimately produced by combining four Transformer systems through a mixture of ensembling and reranking.

2018

Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation
Roman Grundkiewicz | Marcin Junczys-Dowmunt
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT). The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks. This GEC system preserves the accuracy of SMT output and, at the same time, generates more fluent sentences as it typical for NMT. Our analysis shows that the created systems are closer to reaching human-level performance than any other GEC system reported so far.

Neural Machine Translation Techniques for Named Entity Transliteration
Roman Grundkiewicz | Kenneth Heafield
Proceedings of the Seventh Named Entities Workshop

Transliterating named entities from one language into another can be approached as neural machine translation (NMT) problem, for which we use deep attentional RNN encoder-decoder models. To build a strong transliteration system, we apply well-established techniques from NMT, such as dropout regularization, model ensembling, rescoring with right-to-left models, and back-translation. Our submission to the NEWS 2018 Shared Task on Named Entity Transliteration ranked first in several tracks.

MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018. Based on training data and systems from the WMT2017 shared task, we re-implement our own models from the last shared task and introduce improvements based on extensive parameter sharing. Next we experiment with our implementation of dual-source transformer models and data selection for the IT domain. Our submissions decisively wins the SMT post-editing sub-task establishing the new state-of-the-art and is a very close second (or equal, 16.46 vs 16.50 TER) in the NMT sub-task. Based on the rather weak results in the NMT sub-task, we hypothesize that neural-on-neural APE might not be actually useful.

Marian: Fast Neural Machine Translation in C++
Marcin Junczys-Dowmunt | Roman Grundkiewicz | Tomasz Dwojak | Hieu Hoang | Kenneth Heafield | Tom Neckermann | Frank Seide | Ulrich Germann | Alham Fikri Aji | Nikolay Bogoychev | André F. T. Martins | Alexandra Birch
Proceedings of ACL 2018, System Demonstrations

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

Marian: Cost-effective High-Quality Neural Machine Translation in C++
Marcin Junczys-Dowmunt | Kenneth Heafield | Hieu Hoang | Roman Grundkiewicz | Anthony Aue
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

This paper describes the submissions of the “Marian” team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we create a number of high-quality, high-performance models on the GPU and CPU, dominating the Pareto frontier for this shared task.

The University of Edinburgh’s Submissions to the WMT18 News Translation Task
Barry Haddow | Nikolay Bogoychev | Denis Emelin | Ulrich Germann | Roman Grundkiewicz | Kenneth Heafield | Antonio Valerio Miceli Barone | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce new RNN-variant, mixed RNN/Transformer ensembles, data selection and weighting, and extensions to back-translation.

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task
Marcin Junczys-Dowmunt | Roman Grundkiewicz | Shubha Guha | Kenneth Heafield
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines. We demonstrate parallels between neural GEC and low-resource neural MT and successfully adapt several methods from low-resource MT to neural GEC. We further establish guidelines for trustable results in neural GEC and propose a set of model-independent methods for neural GEC that can be easily applied in most GEC settings. Proposed methods include adding source-side noise, domain-adaptation techniques, a GEC-specific training-objective, transfer learning with monolingual data, and ensembling of independently trained GEC models and language models. The combined effects of these methods result in better than state-of-the-art neural GEC models that outperform previously best neural GEC systems by more than 10% M² on the CoNLL-2014 benchmark and 5.9% on the JFLEG test set. Non-neural state-of-the-art systems are outperformed by more than 2% on the CoNLL-2014 benchmark and by 4% on JFLEG.

2017

An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this work, we explore multiple neural architectures adapted for the task of automatic post-editing of machine translation output. We focus on neural end-to-end models that combine both inputs mt (raw MT output) and src (source language input) in a single neural architecture, modeling {mt, src} → pe directly. Apart from that, we investigate the influence of hard-attention models which seem to be well-suited for monolingual tasks, as well as combinations of both ideas. We report results on data sets provided during the WMT-2016 shared task on automatic post-editing and can demonstrate that dual-attention models that incorporate all available data in the APE scenario in a single model improve on the best shared task system and on all other published results after the shared task. Dual-attention models that are combined with hard attention remain competitive despite applying fewer changes to the input.

Pushing the Limits of Translation Quality Estimation
André F. T. Martins | Marcin Junczys-Dowmunt | Fabio N. Kepler | Ramón Astudillo | Chris Hokamp | Roman Grundkiewicz
Transactions of the Association for Computational Linguistics, Volume 5

Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways. However, this potential is currently limited by the relatively low accuracy of existing systems. In this paper, we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered, neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining striking results on WMT16: a word-level FMULT1 score of 57.47% (an absolute gain of +7.95% over the current state of the art), and a Pearson correlation score of 65.56% for sentence-level HTER prediction (an absolute gain of +13.36%).

2016

Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Human Evaluation of Grammatical Error Correction Systems
Roman Grundkiewicz | Marcin Junczys-Dowmunt | Edward Gillian
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

2011

How to Distinguish a Kidney Theft from a Death Car? Experiments in Clustering Urban-Legend Texts
Roman Grundkiewicz | Filip Graliński
Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition

Co-authors

Ondřej Bojar 9

Philipp Koehn 9

Eleftherios Avramidis 8

Christof Monz 8

Nikolay Bogoychev 7

Markus Freitag 7

Masaaki Nagata 7

Makoto Morishita 6

Rachel Bawden 5

Yvette Graham 5

Toshiaki Nakazawa 5

Maja Popović 5

Mariya Shmatova 5

Alham Fikri Aji 4

Loic Barrault 4

Marta R. Costa-jussà 4

Anton Dvorkovich 4

Cristina España-Bonet 4

Ulrich Germann 4

Matthias Huck 4

André F. T. Martins 4

Kenton Murray 4

Vilém Zouhar 4

Alexandra Birch 3

Richard Bowden 3

Annelies Braffort 3

Necati Cihan Camgöz 3

Rajen Chatterjee 3

Marzena Karpinska 3

Huda Khayrallah 3

Amit Moryossef 3

Mathias Müller 3

Annette Rios Gonzales 3

Dimitar Shterionov 3

Sandra Sidler-Miserez 3

Marcos Zampieri 3

Ekaterina Artemova 2

Alessia Battisti 2

Maximiliana Behnke 2

Michèle Berger 2

Magdalena Biesialska 2

Fethi Bougares 2

Alexander Fraser 2

Antonio Jimeno Yepes 2

Sidharth Kashyap 2

Davy Van Landuyt 2

Benjamin Marie 2

Antonio Valerio Miceli-Barone 2

Stefano Perrella 2

Regula Perrollaz 2

Lorenzo Proietti 2

Sabine Reinhard 2

Rico Sennrich 2

Steinþór Steingrímsson 2

Sweta Agrawal 1

Farhad Akhbardeh 1

Malihe Alikhani 1

Kwabena Amponsah-Kaakyire 1

Antonios Anastasopoulos 1

Arkady Arkhangorodsky 1

Luisa Bentivogli 1

Eleftheria Briakou 1

Christopher Bryant 1

Roldano Cattoni 1

Vishrav Chaudhary 1

Marcin Chochowski 1

Mateusz Chudyk 1

Georgiana Dinu 1

Radina Dobreva 1

Konstantin Dranch 1

Sergey Dukanov 1

Tomasz Dwojak 1

Clara Emmanuel 1

Yannick Estève 1

Marzieh Fadaee 1

Emmanouil-Ioannis Farsarakis 1

Marcello Federico 1

Mariano Felice 1

Ramón Fernandez Astudillo 1

Souhir Gahbiche 1

Edward Gillian 1

Filip Gralinski 1

Anne Göhring 1

Leonie Harter 1

Hany Hassan Awadalla 1

Christopher Homan 1

Dávid Javorský 1

Daniel Khashabi 1

Young Jin Kim 1

Faheem Kirefu 1

Věra Kloudová 1

Rebecca Knowles 1

Julia Kreutzer 1

Surafel Lakew 1

Howard Lakougna 1

Nikola Ljubešić 1

Nicholas Lourie 1

Jessica Lundin 1

Prashant Mathur 1

Hitokazu Matsushita 1

Valerio Miceli 1

Maria Nadejde 1

Satoshi Nakamura 1

Tom Neckermann 1

Aurelie Neveol 1

Mariana Neves 1

Michal Novák 1

Paweł Przybysz 1

Elijah Rippeth 1

Mrinmaya Sachan 1

Elizabeth Salesky 1

Patrícia Schmidtová 1

Matthias Sperber 1

Sebastian Stüker 1

Katsuhito Sudoh 1

Eduardo Sánchez 1

Allahsera Auguste Tapo 1

Svetlana Tchistiakova 1

Jelmer Van Der Linde 1

Yogesh Virkar 1

Valentin Vydrin 1

Changhan Wang 1

Shinji Watanabe 1

Joanna Wetesko 1

Philip Williams 1

Lisa Yankovskaya 1

Marcely Zanon Boito 1

Venues