Lieve Macken

2023

Adapting Machine Translation Education to the Neural Era: A Case Study of MT Quality Assessment
Lieve Macken | Bram Vanroy | Arda Tezcan
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

The use of automatic evaluation metrics to assess Machine Translation (MT) quality is well established in the translation industry. Whereas it is relatively easy to cover the word- and character-based metrics in an MT course, it is less obvious how to integrate the newer neural metrics. In this paper we discuss how we introduced the topic of MT quality assessment in a course for translation students. We selected three English source texts, each having a different difficulty level and style, and let the students translate the texts into their L1 and reflect upon translation difficulty. Afterwards, the students were asked to assess MT quality for the same texts using different methods and to critically reflect upon the results obtained. The students had access to the MATEO web interface, which contains word- and character-based metrics as well as neural metrics. The students used two different reference translations: their own translations and professional translations of the three texts. We not only synthesise the comments of the students, but also present the results of some cross-lingual analyses on nine different language pairs.

Developing User-centred Approaches to Technological Innovation in Literary Translation (DUAL-T)
Paola Ruffo | Joke Daems | Lieve Macken
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

DUAL-T is an EU-funded project which aims at involving literary translators in the testing of technology-inclusive workflows. Participants will be asked to translate three short stories using, respectively, (1) a text editor combined with online resources, (2) a Computer-Aided Translation (CAT) tool, and (3) a Machine Translation Post-editing (MTPE) tool.

MATEO: MAchine Translation Evaluation Online
Bram Vanroy | Arda Tezcan | Lieve Macken
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

We present MAchine Translation Evaluation Online (MATEO), a project that aims to facilitate machine translation (MT) evaluation by means of an easy-to-use interface that can evaluate given machine translations with a battery of automatic metrics. It caters to both experienced and novice users who are working with MT, such as MT system builders, teachers and students of (machine) translation, and researchers.

2022

Literary translation as a three-stage process: machine translation, post-editing and revision
Lieve Macken | Bram Vanroy | Luca Desmet | Arda Tezcan
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This study focuses on English-Dutch literary translations that were created in a professional environment using an MT-enhanced workflow consisting of a three-stage process of automatic translation followed by post-editing and (mainly) monolingual revision. We compare the three successive versions of the target texts. We used different automatic metrics to measure the (dis)similarity between the consecutive versions and analyzed the linguistic characteristics of the three translation variants. Additionally, on a subset of 200 segments, we manually annotated all errors in the machine translation output and classified the different editing actions that were carried out. The results show that more editing occurred during revision than during post-editing and that the types of editing actions were different.

Writing in a second Language with Machine translation (WiLMa)
Margot Fonteyne | Maribel Montero Perez | Joke Daems | Lieve Macken
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The WiLMa project aims to assess the effects of using machine translation (MT) tools on the writing processes of second language (L2) learners of varying proficiency. Particular attention is given to individual variation in learners’ tool use.

GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation
Toon Colman | Margot Fonteyne | Joke Daems | Nicolas Dirix | Lieve Macken
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In the present paper, we describe a large corpus of eye movement data, collected during natural reading of a human translation and a machine translation of a full novel. This data set, called GECO-MT (Ghent Eye-tracking Corpus of Machine Translation), expands upon an earlier corpus called GECO (Ghent Eye-tracking Corpus) by Cop et al. (2017). The eye movement data in GECO-MT will be used in future research to investigate the effect of machine translation on the reading process and the effects of various error types on reading. In this article, we describe in detail the materials and data collection procedure of GECO-MT. Extensive information on the language proficiency of our participants is given, as well as a comparison with the participants of the original GECO. We investigate the distribution of a selection of important eye movement variables and explore the possibilities for future analyses of the data. GECO-MT is freely available at https://www.lt3.ugent.be/resources/geco-mt.

LeConTra: A Learner Corpus of English-to-Dutch News Translation
Bram Vanroy | Lieve Macken
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present LeConTra, a learner corpus consisting of English-to-Dutch news translations enriched with translation process data. Three students of a Master’s programme in Translation were asked to translate 50 different English journalistic texts of approximately 250 tokens each. Because we also collected translation process data in the form of keystroke logging, our dataset can be used as part of different research strands such as translation process research, learner corpus research, and corpus-based translation studies. Reference translations, without process data, are also included. The data has been manually segmented and tokenized, and manually aligned at both segment and word level, leading to a high-quality corpus with token-level process data. The data is freely accessible via the Translation Process Research DataBase, which underlines our commitment to distributing our dataset. The tool that was built for manual sentence segmentation and tokenization, Mantis, is also available as an open-source aid for data processing.

2020

Literary Machine Translation under the Magnifying Glass: Assessing the Quality of an NMT-Translated Detective Novel on Document Level
Margot Fonteyne | Arda Tezcan | Lieve Macken
Proceedings of the Twelfth Language Resources and Evaluation Conference

Several studies (covering many language pairs and translation tasks) have demonstrated that translation quality has improved enormously since the emergence of neural machine translation systems. This raises the question whether such systems are able to produce high-quality translations for more creative text types such as literature and whether they are able to generate coherent translations on document level. Our study aimed to investigate these two questions by carrying out a document-level evaluation of the raw NMT output of an entire novel. We translated Agatha Christie’s novel The Mysterious Affair at Styles with Google’s NMT system from English into Dutch and annotated it in two steps: first all fluency errors, then all accuracy errors. We report on the overall quality, determine the remaining issues, compare the most frequent error types to those in general-domain MT, and investigate whether any accuracy and fluency errors co-occur regularly. Additionally, we assess the inter-annotator agreement on the first chapter of the novel.

Assessing the Comprehensibility of Automatic Translations (ArisToCAT)
Lieve Macken | Margot Fonteyne | Arda Tezcan | Joke Daems
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The ArisToCAT project aims to assess the comprehensibility of ‘raw’ (unedited) MT output for readers who can only rely on the MT output. In this project description, we summarize the main results of the project and present future work.

2019

Modelling word translation entropy and syntactic equivalence with machine learning
Bram Vanroy | Orphée De Clercq | Lieve Macken
Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production

When a ‘sport’ is a person and other issues for NMT of novels
Arda Tezcan | Joke Daems | Lieve Macken
Proceedings of the Qualities of Literary Machine Translation

2018

A fine-grained error analysis of NMT, SMT and RBMT output for English-to-Dutch
Laura Van Brussel | Arda Tezcan | Lieve Macken
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

We present the highlights of the now finished 4-year SCATE project. It was completed in February 2018 and funded by the Flemish Government IWT-SBO, project No. 130041.

In order to improve the symbiosis between the machine translation (MT) system and the post-editor, it is not enough to know that the output of one system is better than the output of another system. A fine-grained error analysis is needed to provide information on the type and location of errors occurring in MT and the corresponding errors occurring after post-editing (PE). This article reports on a fine-grained translation quality assessment approach which was applied to machine-translated texts and the post-edited versions of these texts, made by student post-editors. By linking each error to the corresponding source-text passage, it is possible to identify passages that were problematic in MT, but not after PE, or passages that were problematic even after PE. This method provides rich data on the origin and impact of errors, which can be used to improve post-editor training as well as machine translation systems. We present the results of a pilot experiment on the post-editing of newspaper articles and highlight the advantages of our approach.

2013

Quality as the sum of its parts: a two-step approach for the identification of translation problems and translation quality assessment for HT and MT+PE
Joke Daems | Lieve Macken | Sonia Vandepitte
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

2012

From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information
Lieve Macken | Veronique Hoste | Mariëlle Leijten | Luuk Van Waes
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Keystroke logging tools are a valuable aid to monitor written language production. These tools record all keystrokes, including backspaces and deletions together with timing information. In this paper we report on an extension to the keystroke logging program Inputlog in which we aggregate the logged process data from the keystroke (character) level to the word level. The logged process data are further enriched with different kinds of linguistic information: part-of-speech tags, lemmata, chunk boundaries, syllable boundaries and word frequency. A dedicated parser has been developed that distils from the logged process data word-level revisions, deleted fragments and final product data. The linguistically-annotated output will facilitate the linguistic analysis of the logged data and will provide a valuable basis for more linguistically-oriented writing process research. The set-up of the extension to Inputlog is largely language-independent. As proof-of-concept, the extension has been developed for English and Dutch. Inputlog is freely available for research purposes.

From Character to Word Level: Enabling the Linguistic Analyses of Inputlog Process Data
Mariëlle Leijten | Lieve Macken | Veronique Hoste | Eric Van Horenbeeck | Luuk Van Waes
Proceedings of the Second Workshop on Computational Linguistics and Writing (CL&W 2012): Linguistic and Cognitive Aspects of Document Creation and Document Engineering

2010

An Annotation Scheme and Gold Standard for Dutch-English Word Alignment
Lieve Macken
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The importance of sentence-aligned parallel corpora has been widely acknowledged. Reference corpora in which sub-sentential translational correspondences are indicated manually are more labour-intensive to create, and hence less widespread. Such manually created reference alignments, also called Gold Standards, have been used in research projects to develop or test automatic word alignment systems. In most translations, translational correspondences are rather complex; for example, word-by-word correspondences can be found only for a limited number of words. A reference corpus in which those complex translational correspondences are aligned manually is therefore also a useful resource for the development of translation tools and for translation studies. In this paper, we describe how we created a Gold Standard for the Dutch-English language pair. We present the annotation scheme, annotation guidelines, annotation tool and inter-annotator results. To cover a wide range of syntactic and stylistic phenomena that emerge from different writing and translation styles, our Gold Standard data set contains texts from different text types. The Gold Standard will be publicly available as part of the Dutch Parallel Corpus.

2009

Language-Independent Bilingual Terminology Extraction from a Multilingual Parallel Corpus
Els Lefever | Lieve Macken | Veronique Hoste
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Linguistically-Based Sub-Sentential Alignment for Terminology Extraction from a Bilingual Automotive Corpus
Lieve Macken | Els Lefever | Veronique Hoste
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Sentence Alignment in DPC: Maximizing Precision, Minimizing Human Effort
Julia Trushkina | Lieve Macken | Hans Paulussen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

A wide spectrum of multilingual applications requires aligned parallel corpora. The aim of the project described in this paper is to build a multilingual corpus in which all sentences are aligned at very high precision with minimal human effort. Experiments on a combination of sentence aligners with different underlying algorithms showed that, by manually verifying only those links that were not recognized by at least two aligners, the error rate can be reduced by 93.76% compared to the performance of the best aligner. This manual verification concerned only a small portion of all data (6%), which significantly reduces the manual work necessary to achieve nearly 100% alignment accuracy.
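
The agreement-based selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `split_links_by_agreement`, the representation of a link as a `(source_index, target_index)` pair, and the toy aligner outputs are all assumptions made for illustration. Links proposed by at least two aligners are accepted automatically, and all remaining links are set aside for manual verification.

```python
from collections import Counter


def split_links_by_agreement(aligner_outputs, min_votes=2):
    """Split proposed links into auto-accepted and to-be-verified sets.

    aligner_outputs: iterable of link collections, one per aligner.
    A link is a hypothetical (source_index, target_index) pair.
    """
    votes = Counter()
    for links in aligner_outputs:
        # Count each link at most once per aligner.
        votes.update(set(links))
    accepted = {link for link, n in votes.items() if n >= min_votes}
    to_verify = set(votes) - accepted
    return accepted, to_verify


# Toy outputs of three hypothetical sentence aligners.
aligner_a = {(0, 0), (1, 1), (2, 2)}
aligner_b = {(0, 0), (1, 1), (2, 3)}
aligner_c = {(0, 0), (1, 2), (2, 2)}

accepted, to_verify = split_links_by_agreement([aligner_a, aligner_b, aligner_c])
print(sorted(accepted))   # links agreed on by at least two aligners
print(sorted(to_verify))  # links that would be checked by a human
```

In this toy example only the two links proposed by a single aligner would be passed to a human annotator, which mirrors the idea of restricting manual work to a small share of the data.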