Tomasz Dwojak

2024

The output structure of database-like tables, consisting of values structured in horizontal rows and vertical columns identifiable by name, can cover a wide range of NLP tasks. Following this constatation, we propose a framework for text-to-table neural models applicable to problems such as extraction of line items, joint entity and relation extraction, or knowledge base population. The permutation-based decoder of our proposal is a generalized sequential method that comprehends information from all cells in the table. The training maximizes the expected log-likelihood for a table’s content across all random permutations of the factorization order. During the content inference, we exploit the model’s ability to generate cells in any order by searching over possible orderings to maximize the model’s confidence and avoid substantial error accumulation, which other sequential models are prone to. Experiments demonstrate a high practical value of the framework, which establishes state-of-the-art results on several challenging datasets, outperforming previous solutions by up to 15\\%.

2022

pdf bib abs

nEYron: Implementation and Deployment of an MT System for a Large Audit & Consulting Corporation
Artur Nowakowski | Krzysztof Jassem | Maciej Lison | Rafał Jaworski | Tomasz Dwojak | Karolina Wiater | Olga Posesor
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This paper reports on the implementation and deployment of an MT system in the Polish branch of EY Global Limited. The system supports standard CAT and MT functionalities such as translation memory fuzzy search, document translation and post-editing, and meets less common, customer-specific expectations. The deployment began in August 2018 with a Proof of Concept, and ended with the signing of the Final Version acceptance certificate in October 2021. We present the challenges that were faced during the deployment, particularly in relation to the security check and installation processes in the production environment.

2021

pdf bib abs

Adam Mickiewicz University’s English-Hausa Submissions to the WMT 2021 News Translation Task
Artur Nowakowski | Tomasz Dwojak
Proceedings of the Sixth Conference on Machine Translation

This paper presents the Adam Mickiewicz University’s (AMU) submissions to the WMT 2021 News Translation Task. The submissions focus on the English↔Hausa translation directions, which is a low-resource translation scenario between distant languages. Our approach involves thorough data cleaning, transfer learning using a high-resource language pair, iterative training, and utilization of monolingual data via back-translation. We experiment with NMT and PB-SMT approaches alike, using the base Transformer architecture for all of the NMT models while utilizing PB-SMT systems as comparable baseline solutions.

2020

pdf bib abs

From Dataset Recycling to Multi-Property Extraction and Beyond
Tomasz Dwojak | Michał Pietruszka | Łukasz Borchmann | Jakub Chłędowski | Filip Graliński
Proceedings of the 24th Conference on Computational Natural Language Learning

This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled - a newly developed public dataset, and the task of multiple-property extraction. It uses the same data as WikiReading but does not inherit its predecessor’s identified disadvantages. In addition, we provide a human-annotated test set with diagnostic subsets for a detailed analysis of model performance.

2018

pdf bib abs

Fast Neural Machine Translation Implementation
Hieu Hoang | Tomasz Dwojak | Rihards Krislauks | Daniel Torregrosa | Kenneth Heafield
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

This paper describes the submissions to the efficiency track for GPUs at the Workshop for Neural Machine Translation and Generation by members of the University of Edinburgh, Adam Mickiewicz University, Tilde and University of Alicante. We focus on efficient implementation of the recurrent deep-learning model as implemented in Amun, the fast inference engine for neural machine translation. We improve the performance with an efficient mini-batching algorithm, and by fusing the softmax operation with the k-best extraction algorithm. Submissions using Amun were first, second and third fastest in the GPU efficiency track.

pdf bib abs

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

2017

pdf bib

pdf bib abs

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring. The platform contains a rich suite of low-level and high-level natural language processing technologies: automatic speech recognition of broadcast media, machine translation, automated tagging and classification of named entities, semantic parsing to detect relationships between entities, and automatic construction / augmentation of factual knowledge bases. Implemented on the Docker platform, it can easily be deployed, customised, and scaled to large volumes of incoming media streams.

2016

pdf bib abs

Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions
Marcin Junczys-Dowmunt | Tomasz Dwojak | Hieu Hoang
Proceedings of the 13th International Conference on Spoken Language Translation

In this paper we provide the largest published comparison of translation quality for phrase-based SMT and neural machine translation across 30 translation directions. For ten directions we also include hierarchical phrase-based MT. Experiments are performed for the recently published United Nations Parallel Corpus v1.0 and its large six-way sentence-aligned subcorpus. In the second part of the paper we investigate aspects of translation speed, introducing AmuNMT, our efficient neural machine translation decoder. We demonstrate that current neural machine translation could already be used for in-production systems when comparing words-persecond ratios.

pdf bib

The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT
Marcin Junczys-Dowmunt | Tomasz Dwojak | Rico Sennrich
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers