Heike Przybyl


2024

Mitigating Translationese with GPT-4: Strategies and Performance
Maria Kunilovskaya | Koel Dutta Chowdhury | Heike Przybyl | Cristina España-Bonet | Josef Genabith
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

Translations differ in systematic ways from texts originally authored in the same language. These differences, collectively known as translationese, can pose challenges in cross-lingual natural language processing: models trained or tested on translated input might struggle when presented with non-translated language. Translationese mitigation can alleviate this problem. This study investigates the generative capacities of GPT-4 to reduce translationese in human-translated texts. The task is framed as a rewriting process aimed at producing modified translations that are indistinguishable from original texts in the target language. Our focus is on prompt engineering that tests the utility of linguistic knowledge as part of the instruction for GPT-4. Through a series of prompt design experiments, we show that GPT-4-generated revisions are more similar to originals in the target language when the prompts incorporate specific linguistic instructions instead of relying solely on the model’s internal knowledge. Furthermore, we release the segment-aligned bidirectional German-English data built from the Europarl corpus that underpins this study.
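To illustrate what adding linguistic knowledge to the instruction might look like, here is a minimal Python sketch that assembles a baseline prompt and a linguistically informed variant. The function `build_prompt`, the prompt wording, the example segment and the hints are all hypothetical; the prompts actually used in the study are not reproduced here, and the sketch does not call any model API.

```python
def build_prompt(translation, target_lang="English", linguistic_hints=None):
    """Assemble a rewriting instruction, optionally adding explicit linguistic hints."""
    lines = [
        f"Rewrite the following translated text so that it reads like a text "
        f"originally written in {target_lang}. Preserve the meaning."
    ]
    if linguistic_hints:
        # Optional block: explicit linguistic guidance instead of relying
        # only on the model's own knowledge
        lines.append("Pay particular attention to the following:")
        lines.extend(f"- {hint}" for hint in linguistic_hints)
    lines.append(f"Text: {translation}")
    return "\n".join(lines)


# Hypothetical example segment and hints, not taken from the paper
segment = "It is for this reason that we have voted against the report."
hints = [
    "Avoid sentence structures copied from the source language.",
    "Prefer connectives that are typical of original texts in the target language.",
]

baseline_prompt = build_prompt(segment)
informed_prompt = build_prompt(segment, linguistic_hints=hints)
print(informed_prompt)
```

Keeping the hints as a separate argument makes it easy to compare the two prompt conditions while holding the rest of the instruction constant.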

2023

Simultaneous Interpreting as a Noisy Channel: How Much Information Gets Through
Maria Kunilovskaya | Heike Przybyl | Ekaterina Lapshinova-Koltunski | Elke Teich
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

We explore the relationship between information density/surprisal of source and target texts in translation and interpreting in the language pair English-German, looking at the specific properties of translation (“translationese”). Our data come from two bidirectional English-German subcorpora representing written and spoken mediation modes, collected from European Parliament proceedings. Within each language, we (a) compare original speeches to their translated or interpreted counterparts, and (b) explore the association between segment-aligned sources and targets in each translation direction. As additional variables, we consider source delivery mode (read-out, impromptu) and source speech rate in interpreting. We use language modelling to measure the information rendered by words in a segment and to characterise the cross-lingual transfer of information under various conditions. Our approach is based on statistical analyses of surprisal values extracted from n-gram models of our dataset. The analysis reveals that while there is a considerable positive correlation between the average surprisal of source and target segments in both modes, information output in interpreting is lower than in translation, given the same amount of input. The significantly lower information density of spoken mediated production compared with non-mediated speech in the same language may indicate a simplification effect in interpreting.
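The core measurement can be sketched as follows: a bigram model with add-one (Laplace) smoothing assigns each word a surprisal of -log2 P(w | previous word), surprisal is averaged per segment, and the averages for aligned source and target segments are correlated with Pearson's r. This is a minimal illustration under simplifying assumptions (invented toy segments, bigram order, Laplace smoothing, training and scoring on the same segments); the n-gram order, smoothing and toolkit used in the paper may differ, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenised sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens  # <s> marks the sentence start
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
        vocab.update(tokens)
    return contexts, bigrams, len(vocab) + 1  # +1 slot for unseen words


def mean_surprisal(tokens, model):
    """Average surprisal in bits per word, -log2 P(w | previous word)."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen bigrams above zero probability
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        total += -math.log2(p)
    return total / len(tokens)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy segment pairs (English source, German target)
src = [["we", "support", "the", "report"],
       ["the", "commission", "must", "act", "now"],
       ["thank", "you"]]
tgt = [["wir", "unterstützen", "den", "bericht"],
       ["die", "kommission", "muss", "jetzt", "handeln"],
       ["danke"]]

src_model = train_bigram(src)
tgt_model = train_bigram(tgt)
src_s = [mean_surprisal(s, src_model) for s in src]
tgt_s = [mean_surprisal(t, tgt_model) for t in tgt]
r = pearson(src_s, tgt_s)
print(f"source-target correlation of mean surprisal: r = {r:.3f}")
```

Averaging per word rather than summing keeps segments of different lengths comparable, which matters when interpreted output is systematically shorter than its source.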

2022

EPIC UdS - Creation and Applications of a Simultaneous Interpreting Corpus
Heike Przybyl | Ekaterina Lapshinova-Koltunski | Katrin Menzel | Stefan Fischer | Elke Teich
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that include transcripts suitable for research on more than one language pair and on interpreting involving German. It contains not only transcribed speeches but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.

2021

Found in translation/interpreting: combining data-driven and supervised methods to analyse cross-linguistically mediated communication
Ekaterina Lapshinova-Koltunski | Yuri Bizzoni | Heike Przybyl | Elke Teich
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age

Tracing variation in discourse connectives in translation and interpreting through neural semantic spaces
Ekaterina Lapshinova-Koltunski | Heike Przybyl | Yuri Bizzoni
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

In the present paper, we explore lexical contexts of discourse markers in translation and interpreting on the basis of word embeddings. We are particularly interested in the contextual variation of the same discourse markers in (written) translation vs. (simultaneous) interpreting. To explore this variation at the lexical level, we use a data-driven approach: we compare bilingual neural word embeddings trained on source-to-translation and source-to-interpreting aligned corpora. Our results show more variation of semantically related items in translation spaces than in interpreting ones, and a more consistent use of fewer connectives in interpreting. We also observe different trends with regard to the discourse relation types.
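The kind of neighbourhood comparison described here can be illustrated with a small sketch: retrieve the nearest neighbours of a connective by cosine similarity in each embedding space, then measure how much the two neighbour sets overlap (Jaccard index). The vectors below are invented toy values standing in for trained bilingual embeddings, the function names are hypothetical, and the overlap measure is an illustrative choice rather than necessarily the one used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbours(space, word, k=2):
    """Return the k words most similar to `word` in an embedding space (dict)."""
    target = space[word]
    scored = [(cosine(target, vec), other)
              for other, vec in space.items() if other != word]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


def neighbour_overlap(space_a, space_b, word, k=2):
    """Jaccard overlap of the k-nearest-neighbour sets of `word` in two spaces."""
    a = set(nearest_neighbours(space_a, word, k))
    b = set(nearest_neighbours(space_b, word, k))
    return len(a & b) / len(a | b)


# Hypothetical 3-dimensional bilingual vectors (English source, German target words)
translation_space = {
    "however": [0.9, 0.1, 0.0], "aber": [0.85, 0.2, 0.05],
    "jedoch": [0.8, 0.15, 0.1], "allerdings": [0.7, 0.3, 0.1],
    "und": [0.1, 0.9, 0.2],
}
interpreting_space = {
    "however": [0.9, 0.1, 0.0], "aber": [0.88, 0.12, 0.02],
    "jedoch": [0.2, 0.8, 0.3], "allerdings": [0.1, 0.7, 0.4],
    "und": [0.6, 0.5, 0.1],
}

print(nearest_neighbours(translation_space, "however"))
print(nearest_neighbours(interpreting_space, "however"))
print(neighbour_overlap(translation_space, interpreting_space, "however"))
```

A low overlap signals that the same connective sits in different lexical neighbourhoods in the two spaces; with real embeddings one would compute this over many connectives and larger k.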