Conference of the European Association for Machine Translation (2022)

Volumes

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation 72 papers


Rethinking the Design of Sequence-to-Sequence Models for Efficient Machine Translation
Maha Elbayad

Neural Speech Translation: From Neural Machine Translation to Direct Speech Translation
Mattia Antonino Di Gangi

Domain Adaptation for Neural Machine Translation
Danielle Saunders

Multi-Domain Adaptation in Neural Machine Translation with Dynamic Sampling Strategies
Minh-Quang Pham | Josep Crego | François Yvon

Building effective Neural Machine Translation models often implies accommodating diverse sets of heterogeneous data so as to optimize performance for the domain(s) of interest. Such multi-source / multi-domain adaptation problems are typically approached through instance selection or reweighting strategies, based on a static assessment of the relevance of training instances with respect to the task at hand. In this paper, we study dynamic data selection strategies that are able to automatically re-evaluate the usefulness of data samples and to evolve a data selection policy in the course of training. Based on the results of multiple experiments, we show that such methods constitute a generic framework to automatically and effectively handle a variety of real-world situations, from multi-source domain adaptation to multi-domain learning and unsupervised domain adaptation.

The use of online translators by students not enrolled in a professional translation program: beyond copying and pasting for a professional use
Rudy Loock | Sophie Léchauguette | Benjamin Holt

In this paper, we discuss a use of machine translation (MT) that has been quite overlooked up to now, namely by students not enrolled in a professional translation program. A number of studies have reported massive use of free online translators (OTs), and it seems important to uncover such users’ abilities and difficulties when using MT output, whether to improve their understanding, writing, or translation skills. We report here a study on students enrolled in a French ‘applied languages program’ (where students study two languages, as well as law, economics, and management). The aim was to uncover how they use OTs, as well as their (in)ability to identify and correct MT errors. Obtained through two online surveys and several tests conducted with students from 2020 to 2022, our results show an unsurprising widespread use of OTs for many different tasks, but also some specific difficulties in identifying MT errors, in particular in relation to target language fluency.

Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario
Xabier Soto | Olatz Perez-De-Viñaspre | Gorka Labaka | Maite Oronoz

Back-translation is a well-established approach to improve the performance of Neural Machine Translation (NMT) systems when large monolingual corpora of the target language and domain are available. Recently, diverse approaches have been proposed to get better automatic evaluation results of NMT models using back-translation, including the use of sampling instead of beam search as the decoding algorithm for creating the synthetic corpus. Alternatively, it has been proposed to append a tag to the back-translated corpus to help the NMT system distinguish the synthetic bilingual corpus from the authentic one. However, not all the combinations of the previous approaches have been tested, and thus it is not clear which is the best approach for developing a given NMT system. In this work, we empirically compare and combine existing techniques for back-translation in a real low resource setting: the translation of clinical notes from Basque into Spanish. Apart from automatically evaluating the MT systems, we ask bilingual healthcare workers to perform a human evaluation, and analyze the different synthetic corpora by measuring their lexical diversity (LD). For reproducibility and generalizability, we repeat our experiments for German to English translation using public data. The results suggest that in lower resource scenarios tagging only helps when using sampling for decoding, in contradiction with the previous literature using bigger corpora from the news domain. When fine-tuning with a few thousand bilingual in-domain sentences, one of our proposed methods (tagged restricted sampling) obtains the best results both in terms of automatic and human evaluation. We will publish the code upon acceptance.

Passing Parser Uncertainty to the Transformer: Labeled Dependency Distributions for Neural Machine Translation
Dongqi Pu | Khalil Sima’an

Existing syntax-enriched neural machine translation (NMT) models work either with the single most-likely unlabeled parse or the set of n-best unlabeled parses coming out of an external parser. Passing a single or n-best parses to the NMT model risks propagating parse errors. Furthermore, unlabeled parses represent only syntactic groupings without their linguistically relevant categories. In this paper we explore the question: Does passing both parser uncertainty and labeled syntactic knowledge to the Transformer improve its translation performance? This paper contributes a novel method for infusing the whole labeled dependency distributions (LDD) of the source sentence’s dependency forest into the self-attention mechanism of the encoder of the Transformer. A range of experimental results on three language pairs demonstrate that the proposed approach outperforms both the vanilla Transformer and the single best-parse Transformer model across several evaluation metrics.

How well do real-time machine translation apps perform in practice? Insights from a literature review
Mark Pluymaekers

Although more and more professionals are using real-time machine translation during dialogues with interlocutors who speak a different language, the performance of real-time MT apps has received only limited attention in the academic literature. This study summarizes the findings of prior studies (N = 34) reporting an evaluation of one or more real-time MT apps in a professional setting. Our findings show that real-time MT apps are often tested in realistic circumstances and that users are more frequently employed as judges of performance than professional translators. Furthermore, most studies report overall positive results with regard to performance, particularly when apps are tested in real-life situations.

In recent years, several neural fine-tuned machine translation evaluation metrics such as COMET and BLEURT have been proposed. These metrics achieve much higher correlations with human judgments than lexical overlap metrics at the cost of computational efficiency and simplicity, limiting their applications to scenarios in which one has to score thousands of translation hypotheses (e.g. scoring multiple systems or Minimum Bayes Risk decoding). In this paper, we explore optimization techniques, pruning, and knowledge distillation to create more compact and faster COMET versions. Our results show that just by optimizing the code through the use of caching and length batching we can reduce inference time by between 39% and 65% when scoring multiple systems. Also, we show that pruning COMET can lead to a 21% model reduction without affecting the model’s accuracy beyond 0.01 Kendall tau correlation. Furthermore, we present DISTIL-COMET, a lightweight distilled version that is 80% smaller and 2.128x faster while attaining a performance close to the original model and above strong baselines such as BERTSCORE and PRISM.
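As a rough illustration of the two code-level optimizations named in the abstract above, the sketch below combines caching (each distinct sentence is encoded once) with length batching (sentences of similar length are grouped so little padding is needed). It is not the COMET implementation: the function names, the `encode_batch` callback and the whitespace-token length measure are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]


def length_batches(texts: List[str], batch_size: int) -> List[List[int]]:
    """Group indices of `texts` into batches of similar token length."""
    # Sorting by length puts similarly sized sentences in the same batch.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def encode_with_cache(
    texts: List[str],
    encode_batch: Callable[[List[str]], List[Vector]],  # hypothetical encoder
    batch_size: int = 2,
    cache: Optional[Dict[str, Vector]] = None,
) -> List[Vector]:
    """Encode `texts`, calling the encoder once per distinct sentence."""
    cache = {} if cache is None else cache
    # Distinct sentences not yet cached, in first-seen order.
    pending = list(dict.fromkeys(t for t in texts if t not in cache))
    for batch in length_batches(pending, batch_size):
        batch_texts = [pending[i] for i in batch]
        for text, vector in zip(batch_texts, encode_batch(batch_texts)):
            cache[text] = vector
    # Return vectors in the original input order.
    return [cache[t] for t in texts]
```

When several systems translate the same test set, the source and reference sentences repeat across systems, so passing one shared `cache` dictionary to every call is where the saving would come from in a setup like this.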

Studying Post-Editese in a Professional Context: A Pilot Study
Lise Volkart | Pierrette Bouillon

The past few years have seen the multiplication of studies on post-editese, following the massive adoption of post-editing in professional translation workflows. These studies mainly rely on the comparison of post-edited machine translation and human translation on artificial parallel corpora. By contrast, we investigate here post-editese on comparable corpora of authentic translation jobs for the language direction English into French. We explore commonly used scores and also propose the use of a novel metric. Our analysis shows that post-edited machine translation is not only lexically poorer than human translation, but also less dense and less varied in terms of translation solutions. It also tends to be more prolific than human translation for our language direction. Finally, our study highlights some of the challenges of working with comparable corpora in post-editese research.

Autoregressive (AR) and non-autoregressive (NAR) models each have their own advantages in performance and latency; combining them into one model may take advantage of both. Current combination frameworks focus more on the integration of multiple decoding paradigms with a unified generative model, e.g. a Masked Language Model. However, this generalization can be harmful to performance due to the gap between the training objective and inference. In this paper, we aim to close the gap by preserving the original objectives of AR and NAR under a unified framework. Specifically, we propose the Directional Transformer (Diformer), which jointly models AR and NAR as three generation directions (left-to-right, right-to-left and straight) with a newly introduced direction variable that controls the prediction of each token so that it has specific dependencies under that direction. The unification achieved by direction successfully preserves the original dependency assumptions used in AR and NAR, retaining both generalization and performance. Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current united-modelling works by more than 1.5 BLEU points for both AR and NAR decoding, and is also competitive with the state-of-the-art independent AR and NAR models.

Multilingual Neural Machine Translation With the Right Amount of Sharing
Taido Purason | Andre Tättar

Large multilingual Transformer-based machine translation models have had a pivotal role in making translation systems available for hundreds of languages with good zero-shot translation performance. One such example is the universal model with shared encoder-decoder architecture. Additionally, jointly trained language-specific encoder-decoder systems have been proposed for multilingual neural machine translation (NMT) models. This work investigates various knowledge-sharing approaches on the encoder side while keeping the decoder language- or language-group-specific. We propose a novel approach, where we use universal, language-group-specific and language-specific modules to solve the shortcomings of both the universal models and models with language-specific encoders-decoders. Experiments on a multilingual dataset set up to model real-world scenarios, including zero-shot and low-resource translation, show that our proposed models achieve higher translation quality compared to purely universal and language-specific approaches.

Literary translation as a three-stage process: machine translation, post-editing and revision
Lieve Macken | Bram Vanroy | Luca Desmet | Arda Tezcan

This study focuses on English-Dutch literary translations that were created in a professional environment using an MT-enhanced workflow consisting of a three-stage process of automatic translation followed by post-editing and (mainly) monolingual revision. We compare the three successive versions of the target texts. We used different automatic metrics to measure the (dis)similarity between the consecutive versions and analyzed the linguistic characteristics of the three translation variants. Additionally, on a subset of 200 segments, we manually annotated all errors in the machine translation output and classified the different editing actions that were carried out. The results show that more editing occurred during revision than during post-editing and that the types of editing actions were different.

On the Interaction of Regularization Factors in Low-resource Neural Machine Translation
Àlex R. Atrio | Andrei Popescu-Belis

We explore the roles and interactions of the hyper-parameters governing regularization, and propose a range of values applicable to low-resource neural machine translation. We demonstrate that default or recommended values for high-resource settings are not optimal for low-resource ones, and that more aggressive regularization is needed when resources are scarce, in proportion to their scarcity. We explain our observations by the generalization abilities of sharp vs. flat basins in the loss landscape of a neural network. Results for four regularization factors corroborate our claim: batch size, learning rate, dropout rate, and gradient clipping. Moreover, we show that optimal results are obtained when using several of these factors, and that our findings generalize across datasets of different sizes and languages.

Controlling Extra-Textual Attributes about Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation
Sebastian T. Vincent | Loïc Barrault | Carolina Scarton

Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-textual information is unavailable. We investigate this challenge in the English-to-Polish language direction. We focus on the underresearched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario. The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance. We additionally contribute a novel attribute-annotated dataset of Polish TV dialogue and a morphological analysis script used to evaluate attribute control in models.

Auxiliary Subword Segmentations as Related Languages for Low Resource Multilingual Translation
Nishant Kambhatla | Logan Born | Anoop Sarkar

We propose a novel technique that combines alternative subword tokenizations of a single source-target language pair, which allows us to leverage multilingual neural translation training methods. These alternate segmentations function like related languages in multilingual translation. Overall this improves translation accuracy for low-resource languages and produces translations that are lexically diverse and morphologically rich. We also introduce a cross-teaching technique which yields further improvements in translation accuracy and cross-lingual transfer between high- and low-resource language pairs. Compared to other strong multilingual baselines, our approach yields average gains of +1.7 BLEU across the four low-resource datasets from the multilingual TED-talks dataset. Our technique does not require additional training data and is a drop-in improvement for any existing neural translation system.

Fast-Paced Improvements to Named Entity Handling for Neural Machine Translation
Pedro Mota | Vera Cabarrão | Eduardo Farah

In this work, we propose a Named Entity (NE) handling approach to improve translation quality within an existing Natural Language Processing (NLP) pipeline without modifying the Neural Machine Translation (NMT) component. Our approach seeks to enable fast delivery of such improvements and alleviate user experience problems related to NE distortion. We implement separate NE recognition and translation steps. Then, a combination of a standard entity masking technique and a novel semantic equivalent placeholder guarantees both that NE translation is respected and that the best overall quality is obtained from NMT. The experiments show that translation quality improves in 38.6% of the test cases when compared to a version of the NLP pipeline with less-developed NE handling capability.
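The standard entity masking step mentioned in the abstract above can be sketched roughly as follows: recognized entities are swapped for opaque placeholders before translation and restored afterwards, with each entity translated separately. This covers only plain masking, not the paper's semantic equivalent placeholder, and every name here (`mask_entities`, `unmask_entities`, the `__NE0__` placeholder format, the toy translator in the usage note) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def mask_entities(text: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each recognized entity with an opaque placeholder."""
    mapping: Dict[str, str] = {}
    for i, entity in enumerate(entities):
        placeholder = f"__NE{i}__"  # assumed to pass through MT unchanged
        if entity in text:
            text = text.replace(entity, placeholder)
            mapping[placeholder] = entity
    return text, mapping


def unmask_entities(
    text: str,
    mapping: Dict[str, str],
    translate_entity: Callable[[str], str],
) -> str:
    """Put entities back, each translated by a separate NE translation step."""
    for placeholder, entity in mapping.items():
        text = text.replace(placeholder, translate_entity(entity))
    return text
```

A usage example with a toy word-by-word translator standing in for the NMT component:

```python
lexicon = {"Meet": "Rencontrez", "in": "à"}
toy_mt = lambda s: " ".join(lexicon.get(w, w) for w in s.split())
ne_lexicon = {"Lisbon": "Lisbonne"}

masked, mapping = mask_entities("Meet Ana in Lisbon", ["Ana", "Lisbon"])
restored = unmask_entities(toy_mt(masked), mapping, lambda e: ne_lexicon.get(e, e))
```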

Synthetic Data Generation for Multilingual Domain-Adaptable Question Answering Systems
Alina Kramchaninova | Arne Defauw

Deep learning models have significantly advanced the state of the art of question answering systems. However, the majority of datasets available for training such models have been annotated by humans, are open-domain, and are composed primarily in English. To deal with these limitations, we introduce a pipeline that creates synthetic data from natural text. To illustrate the domain-adaptability of our approach, as well as its multilingual potential, we use our pipeline to obtain synthetic data in English and Dutch. We combine the synthetic data with non-synthetic data (SQuAD 2.0) and evaluate multilingual BERT models on the question answering task. Models trained with synthetically augmented data demonstrate a clear improvement in performance when evaluated on the domain-specific test set, compared to the models trained exclusively on SQuAD 2.0. We expect our work to be beneficial for training domain-specific question-answering systems when the amount of available data is limited.

Automatic Discrimination of Human and Neural Machine Translation: A Study with Multiple Pre-Trained Models and Longer Context
Tobias van der Werff | Rik van Noord | Antonio Toral

We address the task of automatically distinguishing between human-translated (HT) and machine-translated (MT) texts. Following recent work, we fine-tune pre-trained language models (LMs) to perform this task. Our work differs in that we use state-of-the-art pre-trained LMs, as well as the test sets of the WMT news shared tasks as training data, to ensure the sentences were not seen during training of the MT system itself. Moreover, we analyse performance for a number of different experimental setups, such as adding translationese data, going beyond the sentence-level and normalizing punctuation. We show that (i) choosing a state-of-the-art LM can make quite a difference: our best baseline system (DeBERTa) outperforms both BERT and RoBERTa by over 3% accuracy, (ii) adding translationese data is only beneficial if there is not much data available, (iii) considerable improvements can be obtained by classifying at the document-level and (iv) normalizing punctuation and thus avoiding (some) shortcuts has no impact on model performance.

A Taxonomy and Study of Critical Errors in Machine Translation
Khetam Al Sharou | Lucia Specia

Not all machine mistranslations are equal. For example, mistranslating a date or time in an appointment, mistranslating the number or currency in a contract, or hallucinating profanity may lead to consequences for the users even when MT is just used for gisting. The severity of errors is an important, but overlooked, aspect of MT quality evaluation. In this paper, we present the result of our effort to bring awareness to the problem of critical translation errors. We study, validate and improve an initial taxonomy of critical errors with the view of providing guidance for critical error analysis, annotation and mitigation. We test the taxonomy for three different languages to examine to what extent it generalises across languages. We provide an account of factors that affect annotation tasks along with recommendations on how to improve the practice in future work. We also study the impact of the source text on generating critical errors in the translation and, based on this, propose a set of recommendations on aspects of the MT that need further scrutiny, especially for user-generated content, to avoid generating such errors, and hence improve online communication.

This paper reports on the implementation and deployment of an MT system in the Polish branch of EY Global Limited. The system supports standard CAT and MT functionalities such as translation memory fuzzy search, document translation and post-editing, and meets less common, customer-specific expectations. The deployment began in August 2018 with a Proof of Concept, and ended with the signing of the Final Version acceptance certificate in October 2021. We present the challenges that were faced during the deployment, particularly in relation to the security check and installation processes in the production environment.

“Hi, how can I help you?” Improving Machine Translation of Conversational Content in a Business Context
Bianka Buschbeck | Jennifer Mell | Miriam Exel | Matthias Huck

This paper addresses the automatic translation of conversational content in a business context, for example support chat dialogues. While such use cases share characteristics with other informal machine translation scenarios, translation requirements with respect to technical and business-related expressions are high. To succeed in such scenarios, we experimented with curating dedicated training and test data, injecting noise to improve robustness, and applying sentence weighting schemes to carefully manage the influence of the different corpora. We show that our approach improves the performance of our models on conversational content for all 18 investigated language pairs while preserving translation quality on other domains - an indispensable requirement to integrate these developments into our MT engines at SAP.

Agent and User-Generated Content and its Impact on Customer Support MT
Madalena Gonçalves | Marianna Buchicchio | Craig Stewart | Helena Moniz | Alon Lavie

This paper illustrates a new evaluation framework developed at Unbabel for measuring the quality of source language text and its effect on both Machine Translation (MT) and Human Post-Edition (PE) performed by non-professional post-editors. We examine both agent and user-generated content from the Customer Support domain and propose that differentiating the two is crucial to obtaining high quality translation output. Furthermore, we present results of initial experimentation with a new evaluation typology based on the Multidimensional Quality Metrics (MQM) Framework (Lommel et al., 2014), specifically tailored toward the evaluation of source language text. We show how the MQM Framework (Lommel et al., 2014) can be adapted to assess errors of monolingual source texts and demonstrate how very specific source errors propagate to the MT and PE targets. Finally, we illustrate how MT systems are not robust enough to handle very specific source noise in the context of Customer Support data.

A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content
Miguel Menezes | Vera Cabarrão | Pedro Mota | Helena Moniz | Alon Lavie

This paper describes the research developed at Unbabel, a Portuguese machine translation start-up that combines MT with human post-edition and focuses strictly on customer service content. We aim to contribute to furthering MT quality and good practices by exposing the importance of having a continuously-in-development robust Named Entity Recognition system compliant with the General Data Protection Regulation (GDPR). Moreover, we have tested semiautomatic strategies that support and enhance the creation of Named Entities gold standards to allow a more seamless implementation of Multilingual Named Entities Recognition Systems. The project described in this paper is the result of shared work between Unbabel's linguists and Unbabel's AI engineering team, matured over a year. The project should also be taken as a statement of multidisciplinarity, proving and validating the much-needed articulation between the different scientific fields that compose and characterize the area of Natural Language Processing (NLP).

Investigating automatic and manual filtering methods to produce MT-ready glossaries from existing ones
Maria Afara | Randy Scansani | Loïc Dugast

Commercial Machine Translation (MT) providers offer functionalities that allow users to leverage bilingual glossaries. This poses the question of how to turn glossaries that were intended to be used by a human translator into MT-ready ones, removing entries that could harm the MT output. We present two automatic filtering approaches - one based on rules and the second one relying on a translation memory - and a manual filtering procedure carried out by a linguist. The resulting glossaries are added to the MT model. The outputs are compared against a baseline where no glossary is used and an output produced using the original glossary. The present work aims at investigating if any of these filtering methods can bring a higher terminology accuracy without negative effects on the overall quality. Results are measured with terminology accuracy and Translation Edit Rate. We test our filters on two language pairs, En-Fr and De-En. Results show that some of the automatically filtered glossaries improve the output when compared to the baseline, and they may help reach a better balance between accuracy and overall quality, replacing the costly manual process without quality loss.

Comparing Multilingual NMT Models and Pivoting
Celia Soler Uguet | Fred Bane | Anna Zaretskaya | Tània Blanch Miró

Following recent advancements in multilingual machine translation at scale, our team carried out tests to compare the performance of multilingual models (M2M from Facebook and multilingual models from Helsinki-NLP) with a two-step translation process using English as a pivot language. Direct assessment by linguists rated translations produced by pivoting as consistently better than those obtained from multilingual models of similar size, while automated evaluation with COMET suggested relative performance was strongly impacted by domain and language family.

Pre-training Synthetic Cross-lingual Decoder for Multilingual Samples Adaptation in E-Commerce Neural Machine Translation
Kamal Kumar Gupta | Soumya Chennabasavraj | Nikesh Garera | Asif Ekbal

Availability of user reviews in vernacular languages helps users to get information regarding the products. Since most e-commerce websites allow reviews in the English language only, it is important to provide translated versions of the reviews to non-English speaking users. Translation of user reviews from English to vernacular languages is a challenging task, predominantly due to the lack of sufficient in-domain datasets. In this paper, we present an efficient pre-training based technique which is used to adapt and improve a single multilingual neural machine translation (NMT) model for low-resource language pairs. The pre-trained model contains a special synthetic cross-lingual decoder. The decoder for the pre-training is trained over cross-lingual target samples where the phrases are replaced with their translated counterparts. After pre-training, the model is adapted to multiple samples of the low-resource language pairs using incremental learning that does not require full training from scratch. We perform the experiments over eight low-resource and three high-resource language pairs from the generic domain, and two language pairs from the product review domains. Through our synthetic multilingual decoder based pre-training, we achieve improvements of up to 4.35 BLEU points compared to the baseline and 2.13 BLEU points compared to the previous code-switched pre-trained models. The review domain outputs from the proposed model are evaluated in real time by human evaluators in the e-commerce company Flipkart.

Error Annotation in Post-Editing Machine Translation: Investigating the Impact of Text-to-Speech Technology
Justus Brockmann | Claudia Wiesinger | Dragoș Ciobanu

As post-editing of machine translation (PEMT) is becoming one of the most dominant services offered by the language services industry (LSI), efforts are being made to support provision of this service with additional technology. We present text-to-speech (T2S) as a potential attention-raising technology for post-editors. Our study was conducted with university students and included both PEMT and error annotation of a creative text with and without T2S. Focusing on the error annotation data, our analysis finds that participants under-annotated fewer MT errors in the T2S condition compared to the silent condition. At the same time, more over-annotation was recorded. Finally, annotation performance corresponds to participants’ attitudes towards using T2S.

Post-editing in Automatic Subtitling: A Subtitlers’ perspective
Alina Karakanta | Luisa Bentivogli | Mauro Cettolo | Matteo Negri | Marco Turchi

Recent developments in machine translation and speech translation are opening up opportunities for computer-assisted translation tools with extended automation functions. Subtitling tools are recently being adapted for post-editing by providing automatically generated subtitles, and featuring not only machine translation, but also automatic segmentation and synchronisation. But what do professional subtitlers think of post-editing automatically generated subtitles? In this work, we conduct a survey to collect subtitlers’ impressions and feedback on the use of automatic subtitling in their workflows. Our findings show that, despite current limitations stemming mainly from speech processing errors, automatic subtitling is seen rather positively and has potential for the future.

Working with Pre-translated Texts: Preliminary Findings from a Survey on Post-editing and Revision Practices in Swiss Corporate In-house Language Services
Sabrina Girletti

With the arrival of neural machine translation, the boundaries between revision and post-editing (PE) have started to blur (Koponen et al., 2020). To shed light on current professional practices and provide new pedagogical perspectives, we set up a survey-based study to investigate how PE and revision are carried out in professional settings. We received 86 responses from corporate translators working at 26 different corporate in-house language services in Switzerland. Although the differences between the two activities seem to be clear for in-house linguists, our findings show that they tend to use the same reading strategies when working with human-translated and machine-translated texts.

Dynamic Adaptation of Neural Machine-Translation Systems Through Translation Exemplars
Arda Tezcan

This project aims to study the impact of adapting neural machine translation (NMT) systems through translation exemplars, determine the optimal similarity metric(s) for retrieving informative exemplars, and verify the usefulness of this approach for domain adaptation of NMT systems.

Language I/O Solution for Multilingual Customer Support
Diego Bartolome | Chris Jacob

We describe the multilingual customer solution by Language I/O in this paper. With data security and confidentiality ensured by the ISO 27001 certification, global corporations can turn monolingual customer support agents into efficient multilingual brand ambassadors in less than 24 hours. Our solution supports more than 100 languages.

Towards Readability-Controlled Machine Translation of COVID-19 Texts
Fernando Alva-Manchego | Matthew Shardlow

This project investigates the capabilities of Machine Translation models for generating translations at varying levels of readability, focusing on texts related to COVID-19. Whilst it is possible to automatically translate this information, the resulting text may contain specialised terminology, or may be written in a style that is difficult for lay readers to understand. So far, we have collected a new dataset with manual simplifications for English and Spanish sentences in the TICO-19 dataset, as well as implemented baseline pipelines combining Machine Translation and Text Simplification models.

pdf bib abs
DeBiasByUs: Raising Awareness and Creating a Database of MT Bias
Joke Daems | Janiça Hackenbuchner

This paper presents the project initiated by the BiasByUs team resulting from the 2021 Artificially Correct Hackathon. We briefly explain our winning participation in the hackathon, tackling the challenge on ‘Database and detection of gender bias in A.I. translations’; we highlight the importance of gender bias in Machine Translation (MT), and describe our proposed solution to the challenge, the current status of the project, and our envisioned future collaborations and research.

The MultitraiNMT Erasmus+ project has developed an open innovative syllabus in machine translation, focusing on neural machine translation (NMT) and targeting both language learners and translators. The training materials include an open access coursebook with more than 250 activities and a pedagogical NMT interface called MutNMT that allows users to learn how neural machine translation works. These materials will allow students to develop the technical and ethical skills and competences required to become informed, critical users of machine translation in their own language learning and translation practice. The project started in July 2019 and it will end in July 2022.

pdf bib abs
EMBEDDIA project: Cross-Lingual Embeddings for Less-Represented Languages in European News Media
Senja Pollak | Andraž Pelicon

The EMBEDDIA project developed a range of resources and methods for less-resourced EU languages, focusing on applications for the media industry, including keyword extraction, comment moderation and article generation.

pdf bib abs
Trados-to-Translog-II: Adding Gaze and Qualitivity data to the CRITT TPR-DB
Masaru Yamada | Takanori Mizowaki | Longhui Zou | Michael Carl

The CRITT (Center for Research and Innovation in Translation and Translation Technology) provides a Translation Process Research Database (TPR-DB) and a rich set of summary tables and tools that help to investigate translator behavior. In this paper, we describe a new tool in the TPR-DB that converts Trados Studio keylogging data (Qualitivity) into Translog-II format and adds the converted data to the CRITT TPR-DB. The tool is also able to synchronize with the output of various eye-trackers. We describe the components of the new TPR-DB tool and highlight some of the features that it produces in the TPR-DB tables.

pdf bib abs
Writing in a second Language with Machine translation (WiLMa)
Margot Fonteyne | Maribel Montero Perez | Joke Daems | Lieve Macken

The WiLMa project aims to assess the effects of using machine translation (MT) tools on the writing processes of second language (L2) learners of varying proficiency. Particular attention is given to individual variation in learners’ tool use.

Europeana Translate is a project funded under the Connecting Europe Facility with the objective of taking advantage of state-of-the-art machine translation in order to increase the multilinguality of resources in the cultural heritage domain.

pdf bib abs
The PASSAGE project: Standard German Subtitling of Swiss German TV content
Pierrette Bouillon | Johanna Gerlach | Jonathan Mutal | Marianne Starlander

We present the PASSAGE project, which aims at automatic Standard German subtitling of Swiss German TV content. This is achieved in a two-step process, beginning with ASR to produce a normalised transcription, followed by translation into Standard German. We focus on the second step, for which we explore different approaches and contribute aligned corpora for future research.

We introduce the project “MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages”, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages. The approach followed consists of crawling large amounts of textual data from carefully selected top-level domains of the Internet, and then applying a curation and enrichment pipeline. In addition to corpora, the project will release successive versions of the free/open-source web crawling and curation software used.

pdf bib abs
MT-Pese: Machine Translation and Post-Editese
Sheila Castilho | Natália Resende

This paper introduces the MT-Pese project, which aims at researching the post-editese phenomena in machine translated texts. We describe a range of experiments performed in order to gauge the effect of post-editese in different domains, backtranslation, and quality.

pdf bib abs
A Quality Estimation and Quality Evaluation Tool for the Translation Industry
Elena Murgolo | Javad Pourmostafa Roshan Sharami | Dimitar Shterionov

With the increase in machine translation (MT) quality in recent years, it has become common practice to integrate MT into the workflow of language service providers (LSPs) and other actors in the translation industry. With MT having a direct impact on the translation workflow, it is important not only to use high-quality MT systems, but also to understand the quality dimension so that the humans involved in the translation workflow can make informed decisions. The evaluation and monitoring of MT output quality has become one of the essential aspects of language technology management in LSPs’ workflows. First, a general practice is to carry out human tests to evaluate MT output quality before deployment. Second, a quality estimate of the translated text, thus after deployment, can inform post-editors or even represent post-editing effort. In the former case, based on the quality assessment of a candidate engine, an informed decision can be made whether the engine would be deployed for production or not. In the latter, a quality estimate of the translation output can guide the human post-editor or even make rough approximations of the post-editing effort. The quality of an MT engine can be assessed at document or sentence level. A tool to jointly provide all these functionalities does not exist yet. The overall objective of the project presented in this paper is to develop an MT quality assessment (MTQA) tool that simplifies the quality assessment of MT engines, combining quality evaluation and quality estimation at document and sentence level.

We present the MTee project, a research initiative funded via an Estonian public procurement to develop machine translation technology that is open-source and free of charge. The MTee project delivered an open-source platform serving state-of-the-art machine translation systems supporting four domains for six language pairs translating from Estonian into English, German, and Russian and vice versa. The platform also features grammatical error correction and speech translation for Estonian and allows for formatted document translation and automatic domain detection. The software, data and training workflows for machine translation engines are all made publicly available for further use and research.

pdf bib abs
Latest Development in the FoTran Project – Scaling Up Language Coverage in Neural Machine Translation Using Distributed Training with Language-Specific Components
Raúl Vázquez | Michele Boggia | Alessandro Raganato | Niki A. Loppi | Stig-Arne Grönroos | Jörg Tiedemann

We describe the enhancement of a multilingual NMT toolkit developed as part of the FoTran project. We devise our modular attention-bridge model, which connects language-specific components through a shared network layer. The system now supports distributed training over many nodes and GPUs in order to substantially scale up the number of languages that can be included in a modern neural translation architecture. The model enables the study of emerging language-agnostic representations and also provides a modular toolkit for efficient machine translation.

pdf bib abs
InDeep × NMT: Empowering Human Translators via Interpretable Neural Machine Translation
Gabriele Sarti | Arianna Bisazza

Neural machine translation (NMT) systems are nowadays essential components of professional translation workflows. Consequently, human translators are increasingly working as post-editors for machine-translated content. The NWO-funded InDeep project aims to empower users of Deep Learning models of text, speech, and music by improving their ability to interact with such models and interpret their behaviors. In the specific context of translation, we aim at developing new tools and methodologies to improve prediction attribution, error analysis, and controllable generation for NMT systems. These advances will be evaluated through field studies involving professional translators to assess gains in efficiency and overall enjoyability of the post-editing process.

pdf bib abs
QUARTZ: Quality-Aware Machine Translation
José G.C. de Souza | Ricardo Rei | Ana C. Farinha | Helena Moniz | André F. T. Martins

This paper presents QUARTZ, QUality-AwaRe machine Translation, a project led by Unbabel which aims at developing machine translation systems that are more robust and produce fewer critical errors. With QUARTZ we want to enable machine translation for user-generated conversational content types that do not tolerate critical errors in automatic translations.

pdf bib abs
POLENG MT: An Adaptive MT Platform
Artur Nowakowski | Krzysztof Jassem | Maciej Lison | Kamil Guttmann | Mikołaj Pokrywka

We introduce POLENG MT, an MT platform that may be used as a cloud web application or as an on-site solution. The platform is capable of providing accurate document translation, including the transfer of document formatting between the input document and the output document. The main feature of the on-site version is dedicated customer adaptation, which consists of training on specialized texts and applying forced terminology translation according to the user’s needs.

pdf bib abs
plain X - AI Supported Multilingual Video Workflow Platform
Carlos Amaral | Peggy van der Kreeft

The plain X platform is a toolbox for multilingual adaptation, for video, audio, and text content. The software is a 4-in-1 tool, combining several steps in the adaptation process, i.e., transcription, translation, subtitling, and voice-over, all automatically generated, but with a high level of editorial control. Users can choose which translation engine is used (e.g., MS Azure, Google, DeepL) depending on best performance. As a result, plain X enables a smooth semi-automated production of subtitles or voice-over, much faster than with older, manual workflows. The software was developed out of EU research projects and has recently been rolled out for professional use. It brings Artificial Intelligence (AI) into the multilingual media production process, while keeping the human in the loop.

pdf bib abs
DELA Project: Document-level Machine Translation Evaluation
Sheila Castilho

This paper presents the results of the DELA Project. We describe the testing of context span for document-level evaluation, construction of a document-level corpus, and context position, as well as the latest developments of the project when looking at human and automatic evaluation metrics for document-level evaluation.

pdf bib abs
Background Search for Terminology in STAR MT Translate
Giorgio Bernardinello | Judith Klein

When interested in an internal web application for MT, corporate customers always ask how reliable terminology will be in their translations. Coherent vocabulary is crucial in many aspects of corporate translations, such as documentation or marketing. The main goal every MT provider would like to achieve is to fully integrate the customer’s terminology into the model, so that the result does not need to be edited, but this is still not always guaranteed. Besides, a web application like STAR MT Translate allows our customers to use – integrated within the same page – different generic MT providers which were not trained with customer-specific data. So, as a pragmatic approach, we decided to increase the level of integration between WebTerm and STAR MT Translate, adding to the latter more terminological information, with which the user can post-edit the translation if needed.

The SignON project (www.signon-project.eu) focuses on the research and development of a Sign Language (SL) translation mobile application and an open communications framework. SignON rectifies the lack of technology and services for the automatic translation between signed and spoken languages, through an inclusive, human-centric solution which facilitates communication between deaf, hard of hearing (DHH) and hearing individuals. We present an overview of the current status of the project, describing the milestones reached to date and the approaches that are being developed to address the challenges and peculiarities of Sign Language Machine Translation (SLMT).

DeepSPIN is a research project funded by the European Research Council (ERC) whose goal is to develop new neural structured prediction methods, models, and algorithms for improving the quality, interpretability, and data-efficiency of natural language processing (NLP) systems, with special emphasis on machine translation and quality estimation. We describe in this paper the latest findings from this project.

This paper is about a multilingual chatbot developed for public administration within the CEF-funded project ENRICH4ALL. We argue for multilingual chatbots empowered through MT and discuss the integration of the CEF eTranslation service in a chatbot solution.

pdf bib abs
MTrill: Machine Translation Impact on Language Learning
Natalia Resende

This paper presents the MTrill project which aimed at investigating the impact of popular web-based machine translation (MT) tools on the cognitive processing of English as a second language. The methodological approach and main results are presented.

pdf bib abs
Connecting client infrastructure with Yamagata Europe machine translation using JSON-based data exchange
Jourik Ciesielski | Heidi Van Hiel

This document describes how Yamagata Europe enables organizations to connect seamlessly to its MT and TMS infrastructure using a JSON-based data exchange mechanism.

pdf bib abs
Towards a methodology for evaluating automatic subtitling
Alina Karakanta | Luisa Bentivogli | Mauro Cettolo | Matteo Negri | Marco Turchi

In response to the increasing interest in automatic subtitling, this EAMT-funded project aimed at collecting subtitle post-editing data in a real use case scenario where professional subtitlers edit automatically generated subtitles. The post-editing setting includes, for the first time, automatic generation of timestamps and segmentation, and focuses on the effect of timing and segmentation edits on the post-editing process. The collected data will serve as the basis for investigating how subtitlers interact with automatic subtitling and for devising evaluation methods geared to the multimodal nature and formal requirements of subtitling.

pdf bib abs
DiHuTra: a Parallel Corpus to Analyse Differences between Human Translations
Ekaterina Lapshinova-Koltunski | Maja Popović | Maarit Koponen

This project aimed to design a corpus of parallel human translations (HTs) of the same source texts by professionals and students. The resulting corpus consists of English news and reviews source texts, their translations into Russian and Croatian, and translations of the reviews into Finnish. The corpus will be valuable for both studying variation in translation and evaluating machine translation (MT) systems.

pdf bib abs
GoURMET – Machine Translation for Low-Resourced Languages
Peggy van der Kreeft | Alexandra Birch | Sevi Sariisik | Felipe Sánchez-Martínez | Wilker Aziz

The GoURMET project, funded by the European Commission’s H2020 program (under grant agreement 825299), develops models for machine translation, in particular for low-resourced languages. Data, models and software releases as well as the GoURMET Translate Tool are made available as open source.

The work in progress on the CEF Action CURLICAT is presented. The general aim of the Action is to compile curated datasets in seven languages of the consortium in domains of relevance to European Digital Service Infrastructures (DSIs) in order to enhance the eTranslation services.

The CEFAT4Cities project aims at creating a multilingual semantic interoperability layer for Smart Cities that allows users from all EU member States to interact with public services in their own language. The CEFAT4Cities processing pipeline transforms natural-language administrative procedures into machine-readable data using various multilingual Natural Language Processing techniques, such as semantic networks and machine translation, thus allowing for the development of more sophisticated and more user-friendly public services applications.

The work in progress on the CEF Action National Language Technology Platform (NLTP) is presented. The Action aims at combining the most advanced Language Technology (LT) tools and solutions in a new state-of-the-art, Artificial Intelligence (AI) driven, National Language Technology Platform (NLTP).

This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This “meta-paper” will serve as reference for citations of the Action in future publications. It presents the objectives, challenges and the links for the achieved outcomes.

This paper provides an overview of the main achievements of the completed PRINCIPLE project, a 2-year action funded by the European Commission under the Connecting Europe Facility (CEF) programme. PRINCIPLE focused on collecting high-quality language resources for Croatian, Icelandic, Irish and Norwegian, which are severely low-resource languages, especially for building effective machine translation (MT) systems. We report the achievements of the project, primarily, in terms of the large amounts of data collected for all four low-resource languages and of promoting the uptake of neural MT (NMT) for these languages.

Video dubbing is the activity of revoicing a video while offering a viewing experience equivalent to the original video. The revoicing usually comes with a changed script, mostly in a different language, and the revoicing should reproduce the original emotions, be coherent with the body language, and be lip-synchronized. In this project, we aim to build an automatic dubbing (AD) system in three phases: (1) voice-over; (2) emotional voice-over; (3) full dubbing, while enhancing the system with human-in-the-loop capabilities for a higher quality.

This paper provides an overview of the ongoing European Language Equality (ELE) project, an 18-month action funded by the European Commission which involves 52 partners. The primary goal of ELE is to prepare the European Language Equality Programme, in the form of a strategic research, innovation and implementation agenda and a roadmap for achieving full digital language equality (DLE) in Europe by 2030.

pdf bib abs
LITHME: Language in the Human-Machine Era
Maarit Koponen | Kais Allkivi-Metsoja | Antonio Pareja-Lora | Dave Sayers | Márta Seresi

The LITHME COST Action brings together researchers from various fields of study focusing on language and technology. We present the overall goals of LITHME and the network’s working groups focusing on diverse questions related to language and technology. As an example of the work of the LITHME network, we discuss the working group on language work and language professionals.

pdf bib abs
CREAMT: Creativity and narrative engagement of literary texts translated by translators and NMT
Ana Guerberof Arenas | Antonio Toral

We present here the EU-funded project CREAMT that seeks to understand what is meant by creativity in different translation modalities, e.g. machine translation, post-editing or professional translation. Focusing on the textual elements that determine creativity in translated literary texts and the reader experience, CREAMT uses a novel, interdisciplinary approach to assess how effective MT is in literary translation considering creativity in translation and the ultimate user: the reader.

pdf bib abs
Developing Machine Translation Engines for Multilingual Participatory Spaces
Pintu Lohar | Guodong Xie | Andy Way

It is often a challenging task to build Machine Translation (MT) engines for a specific domain due to the lack of parallel data in that area. In this project, we develop a range of MT systems for 6 European languages (English, German, Italian, French, Polish and Irish) in all directions and in two domains (environment and economics).

This project aimed at extending the test sets of the MuST-C speech translation (ST) corpus with new reference translations. The new references were collected from professional post-editors working on the output of different ST systems for three language pairs: English-German/Italian/Spanish. In this paper, we briefly describe how the data were collected and how they are distributed. As evidence of their usefulness, we also summarise the findings of the first comparative evaluation of cascade and direct ST approaches, which was carried out relying on the collected data. The project was partially funded by the European Association for Machine Translation (EAMT) through its 2020 Sponsorship of Activities programme.

pdf bib abs
Monitio - Large Scale MT for Multilingual Media Monitoring
Carlos Amaral | Sebastião Miranda

Monitio is a real-time crosslingual global media monitoring platform which delivers actionable insights beyond human scale and capabilities. Our system continuously ingests a massive number of multilingual data sources that are automatically translated, filtered and categorized to generate intelligence reports specially geared towards media monitoring professionals’ needs.