Arda Tezcan

2025

We present key interim findings from the ongoing MaTIAS project, which focuses on developing a multilingual notification system for asylum reception centres in Belgium. This system integrates machine translation (MT) to enable staff to provide practical information to residents in their native language, thus fostering more effective communication. We discuss three key aspects: the development of the multilingual messaging platform, the types of messages the system is designed to handle, and the evaluation of potential MT systems for integration.

Tailoring Machine Translation for Scientific Literature through Topic Filtering and Fuzzy Match Augmentation
Thomas Moerman | Tom Vanallemeersch | Sara Szoc | Arda Tezcan
Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025)

To enhance the accessibility of scientific literature in multiple languages and facilitate the exchange of information among scholars and a wider audience, high-performing specialized machine translation (MT) engines are needed. However, building such engines requires efficient filtering and the use of domain-specific data. In this study, we investigate whether increasing the training data through topic filtering, and using such data more efficiently by exploiting fuzzy matches (FMs, i.e. existing translations whose source sentences are similar to a given input), improves translation quality. We apply these techniques both to sequence-to-sequence MT models and to off-the-shelf multilingual large language models (LLMs) in three scientific disciplines. Our results suggest that combining topic filtering and FM augmentation is an effective strategy for training neural machine translation (NMT) models from scratch: the resulting models not only surpass baseline NMT models but also outperform smaller LLMs (in terms of parameter count). Furthermore, we find that although FM augmentation through in-context learning generally improves LLM translation performance, limited domain-specific datasets can yield results comparable to those achieved with additional multi-domain datasets.

2024

This project aims to develop a multilingual notification system for asylum reception centres in Belgium using machine translation. The system will allow staff to communicate practical messages to residents in their own language. Ethnographically inspired fieldwork is being conducted in reception centres to understand current communication practices and ensure that the technology meets user needs. The quality and suitability of machine translation will be evaluated for three MT systems supporting all target languages. Automatic and manual evaluation methods will be used to assess translation quality, and terms of use, privacy and data protection conditions will be analysed.

Automatic detection of (potential) factors in the source text leading to gender bias in machine translation
Janiça Hackenbuchner | Arda Tezcan | Joke Daems
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)

This research project aims to develop a comprehensive methodology to help make machine translation (MT) systems more gender-inclusive for society. The goal is the creation of a detection system, a machine learning (ML) model trained on manual annotations, that can automatically analyse source data and detect and highlight words and phrases that influence the gender bias inflection in target translations. The main research outputs will be (1) a manually annotated dataset, (2) a taxonomy, and (3) a fine-tuned model.

You Shall Know a Word’s Gender by the Company it Keeps: Comparing the Role of Context in Human Gender Assumptions with MT
Janiça Hackenbuchner | Joke Daems | Arda Tezcan | Aaron Maladry
Proceedings of the 2nd International Workshop on Gender-Inclusive Translation Technologies

In this paper, we analyse to what extent machine translation (MT) systems and humans base their gender translations and associations on role names and on stereotypicality in the absence of (generic) grammatical gender cues in language. We compare an MT system’s choice of gender for a certain word when translating from a notional gender language, English, into a grammatical gender language, German, with the gender associations of humans. We outline a comparative case study of gender translation and annotation of words in isolation (out of context) and of words in sentence contexts. The analysis reveals patterns of gender (bias) by MT and gender associations by humans for certain (1) out-of-context words and (2) words in context. Our findings reveal the impact of context on gender choice and translation and show that word-level analyses fall short in such studies.

Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation
Arda Tezcan | Víctor M. Sánchez-Cartagena | Miquel Esplà-Gomis
Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation

Leveraging Synthetic Monolingual Data for Fuzzy-Match Augmentation in Neural Machine Translation: A Preliminary Study
Thomas Moerman | Arda Tezcan
Proceedings of the First International Workshop on Knowledge-Enhanced Machine Translation

2023

Adapting Machine Translation Education to the Neural Era: A Case Study of MT Quality Assessment
Lieve Macken | Bram Vanroy | Arda Tezcan
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

The use of automatic evaluation metrics to assess Machine Translation (MT) quality is well established in the translation industry. Whereas it is relatively easy to cover the word- and character-based metrics in an MT course, it is less obvious how to integrate the newer neural metrics. In this paper we discuss how we introduced the topic of MT quality assessment in a course for translation students. We selected three English source texts, each having a different difficulty level and style, and let the students translate the texts into their L1 and reflect upon translation difficulty. Afterwards, the students were asked to assess MT quality for the same texts using different methods and to critically reflect upon the obtained results. The students had access to the MATEO web interface, which contains word- and character-based metrics as well as neural metrics. The students used two different reference translations: their own translations and professional translations of the three texts. We not only synthesise the comments of the students, but also present the results of some cross-lingual analyses on nine different language pairs.

MATEO: MAchine Translation Evaluation Online
Bram Vanroy | Arda Tezcan | Lieve Macken
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

We present MAchine Translation Evaluation Online (MATEO), a project that aims to facilitate machine translation (MT) evaluation by means of an easy-to-use interface that can evaluate given machine translations with a battery of automatic metrics. It caters to both experienced and novice users who are working with MT, such as MT system builders, teachers and students of (machine) translation, and researchers.

2022

Literary translation as a three-stage process: machine translation, post-editing and revision
Lieve Macken | Bram Vanroy | Luca Desmet | Arda Tezcan
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This study focuses on English-Dutch literary translations that were created in a professional environment using an MT-enhanced workflow consisting of a three-stage process of automatic translation followed by post-editing and (mainly) monolingual revision. We compare the three successive versions of the target texts. We used different automatic metrics to measure the (dis)similarity between the consecutive versions and analyzed the linguistic characteristics of the three translation variants. Additionally, on a subset of 200 segments, we manually annotated all errors in the machine translation output and classified the different editing actions that were carried out. The results show that more editing occurred during revision than during post-editing and that the types of editing actions were different.

Dynamic Adaptation of Neural Machine-Translation Systems Through Translation Exemplars
Arda Tezcan
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This project aims to study the impact of adapting neural machine translation (NMT) systems through translation exemplars, determine the optimal similarity metric(s) for retrieving informative exemplars, and verify the usefulness of this approach for domain adaptation of NMT systems.

2020

Assessing the Comprehensibility of Automatic Translations (ArisToCAT)
Lieve Macken | Margot Fonteyne | Arda Tezcan | Joke Daems
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The ArisToCAT project aims to assess the comprehensibility of ‘raw’ (unedited) MT output for readers who can only rely on the MT output. In this project description, we summarize the main results of the project and present future work.

Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level
Margot Fonteyne | Arda Tezcan | Lieve Macken
Proceedings of the Twelfth Language Resources and Evaluation Conference

Several studies (covering many language pairs and translation tasks) have demonstrated that translation quality has improved enormously since the emergence of neural machine translation systems. This raises the question whether such systems are able to produce high-quality translations for more creative text types such as literature and whether they are able to generate coherent translations on document level. Our study aimed to investigate these two questions by carrying out a document-level evaluation of the raw NMT output of an entire novel. We translated Agatha Christie’s novel The Mysterious Affair at Styles with Google’s NMT system from English into Dutch and annotated it in two steps: first all fluency errors, then all accuracy errors. We report on the overall quality, determine the remaining issues, compare the most frequent error types to those in general-domain MT, and investigate whether any accuracy and fluency errors co-occur regularly. Additionally, we assess the inter-annotator agreement on the first chapter of the novel.

2019

Neural Fuzzy Repair: Integrating Fuzzy Matches into Neural Machine Translation
Bram Bulte | Arda Tezcan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present a simple yet powerful data augmentation method for boosting Neural Machine Translation (NMT) performance by leveraging information retrieved from a Translation Memory (TM). We propose and test two methods for augmenting NMT training data with fuzzy TM matches. Tests on the DGT-TM data set for two language pairs show consistent and substantial improvements over a range of baseline systems. The results suggest that this method is promising for any translation environment in which a sizeable TM is available and a certain amount of repetition across translations is to be expected, especially considering its ease of implementation.
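The general idea of augmenting NMT input with fuzzy TM matches can be illustrated with a minimal Python sketch. It is not the method as published: the similarity function (difflib's token-level `ratio`, standing in for an edit-distance-based fuzzy-match score), the hypothetical `@@@` separator, the 0.5 threshold and all function names are illustrative assumptions. For each source sentence, the sketch retrieves the most similar TM entry and, if the match is good enough, appends that entry's target side to the source, so that a model trained on such input could learn to reuse it.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical separator token marking where the fuzzy-match target begins (assumption).
SEP = "@@@"


def fuzzy_score(a: str, b: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in for an edit-distance FM score."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_fuzzy_match(
    source: str,
    tm: List[Tuple[str, str]],
    threshold: float = 0.5,
    exclude_identical: bool = False,
) -> Optional[Tuple[float, str, str]]:
    """Return (score, tm_source, tm_target) of the most similar TM entry, or None."""
    best: Optional[Tuple[float, str, str]] = None
    for tm_src, tm_tgt in tm:
        # When the TM is the training data itself, skip the sentence's own entry.
        if exclude_identical and tm_src == source:
            continue
        score = fuzzy_score(source, tm_src)
        if best is None or score > best[0]:
            best = (score, tm_src, tm_tgt)
    if best is None or best[0] < threshold:
        return None
    return best


def augment(
    source: str,
    tm: List[Tuple[str, str]],
    threshold: float = 0.5,
    exclude_identical: bool = False,
) -> str:
    """Append the target side of the best fuzzy match to the source, if any."""
    match = best_fuzzy_match(source, tm, threshold, exclude_identical)
    if match is None:
        return source  # no sufficiently similar TM entry: keep the plain source
    return f"{source} {SEP} {match[2]}"


if __name__ == "__main__":
    tm = [
        ("the committee approved the report", "de commissie keurde het verslag goed"),
        ("the weather is nice", "het weer is mooi"),
    ]
    print(augment("the committee approved the budget", tm))
```

With the toy TM above, "the committee approved the budget" shares four of five tokens with the first TM source (score 0.8), so it becomes "the committee approved the budget @@@ de commissie keurde het verslag goed"; an input with no similar TM entry is returned unchanged. Setting `exclude_identical=True` is meant for the case where the TM is the training corpus, so that a sentence is not matched with its own translation.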

When a ‘sport’ is a person and other issues for NMT of novels
Arda Tezcan | Joke Daems | Lieve Macken
Proceedings of the Qualities of Literary Machine Translation

2018

We present the highlights of the now finished 4-year SCATE project. It was completed in February 2018 and funded by the Flemish Government IWT-SBO, project No. 130041.1

A fine-grained error analysis of NMT, SMT and RBMT output for English-to-Dutch
Laura Van Brussel | Arda Tezcan | Lieve Macken
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)