Bianka Buschbeck

Also published as: Bianka Buschbeck-Wolf, B. Buschbeck

2025

Structured Document Translation via Format Reinforcement Learning
Haiyue Song | Johannes Eschbach-Dymanus | Hour Kaing | Sumire Honda | Hideki Tanaka | Bianka Buschbeck | Masao Utiyama
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Recent works on structured text translation remain limited to the sentence level, as they struggle to effectively handle the complex document-level XML or HTML structures. To address this, we propose Format Reinforcement Learning (FormatRL), which employs Group Relative Policy Optimization on top of a supervised fine-tuning model to directly optimize novel structure-aware rewards: 1) TreeSim, which measures structural similarity between predicted and reference XML trees and 2) Node-chrF, which measures translation quality at the level of XML nodes. Additionally, we propose StrucAUC, a fine-grained metric distinguishing between minor errors and major structural failures. Experiments on the SAP software-documentation benchmark demonstrate improvements across six metrics and an analysis further shows how different reward functions contribute to improvements in both structural and translation quality.

2024

How Effective is Synthetic Data and Instruction Fine-tuning for Translation with Markup using LLMs?
Raj Dabre | Haiyue Song | Miriam Exel | Bianka Buschbeck | Johannes Eschbach-Dymanus | Hideki Tanaka
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Recent works have shown that prompting large language models (LLMs) is effective for translation with markup where LLMs can simultaneously transfer markup tags while ensuring that the content, both inside and outside tag pairs, is correctly translated. However, these works make a rather unrealistic assumption of the existence of high-quality parallel sentences with markup for prompting. Furthermore, the impact of instruction fine-tuning (IFT) in this setting is unknown. In this paper, we provide a study, the first of its kind, focusing on the effectiveness of synthetically created markup data and IFT for translation with markup using LLMs. We focus on translation from English to five European languages, German, French, Dutch, Finnish and Russian, where we show that regardless of few-shot prompting or IFT, synthetic data created via word alignments, while leading to inferior markup transfer compared to using original data with markups, does not negatively impact the translation quality. Furthermore, IFT mainly impacts the translation quality compared to few-shot prompting and has slightly better markup transfer capabilities than the latter. We hope our work will help practitioners make effective decisions on modeling choices for LLM-based translation with markup.

Exploring the Effectiveness of LLM Domain Adaptation for Business IT Machine Translation
Johannes Eschbach-Dymanus | Frank Essenberger | Bianka Buschbeck | Miriam Exel
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

In this paper, we study the translation abilities of Large Language Models (LLMs) for business IT texts. We are strongly interested in domain adaptation of translation systems, which is essential for accurate and lexically appropriate translation of such texts. Among the open-source models evaluated in a zero- and few-shot setting, we find Llama-2 13B the most promising for domain-specific translation fine-tuning. We investigate the full range of adaptation techniques for LLMs: from prompting, over parameter-efficient fine-tuning to full fine-tuning, and compare to classic neural machine translation (MT) models trained internally at SAP. We provide guidance on how to use training budget most effectively for different fine-tuning approaches. We observe that while LLMs can translate on par with SAP’s MT models on general domain data, it is difficult to close the gap on SAP’s domain-specific data, even with extensive training and carefully curated data.

2023

A Study on the Effectiveness of Large Language Models for Translation with Markup
Raj Dabre | Bianka Buschbeck | Miriam Exel | Hideki Tanaka
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

In this paper we evaluate the utility of large language models (LLMs) for translation of text with markup in which the most important and challenging aspect is to correctly transfer markup tags while ensuring that the content, both inside and outside tags, is correctly translated. While LLMs have been shown to be effective for plain text translation, their effectiveness for structured document translation is not well understood. To this end, we experiment with BLOOM and BLOOMZ, which are open-source multilingual LLMs, using zero, one and few-shot prompting, and compare with a domain-specific in-house NMT system using a detag-and-project approach for markup tags. We observe that LLMs with in-context learning exhibit poorer translation quality compared to the domain-specific NMT system; however, they are effective in transferring markup tags, especially the large BLOOM model (176 billion parameters). This is further confirmed by our human evaluation which also reveals the types of errors of the different tag transfer techniques. While LLM-based approaches come with the risk of losing, hallucinating and corrupting tags, they excel at placing them correctly in the translation.

2022

Translation of structured content is an important application of machine translation, but the scarcity of evaluation data sets, especially for Asian languages, limits progress. In this paper we present a novel multilingual multiway evaluation data set for the translation of structured documents of the Asian languages Japanese, Korean and Chinese. We describe the data set, its creation process and important characteristics, followed by establishing and evaluating baselines using the direct translation as well as detag-project approaches. Our data set is well suited for multilingual evaluation, and it contains richer annotation tag sets than existing data sets. Our results show that massively multilingual translation models like M2M-100 and mBART-50 perform surprisingly well despite not being explicitly trained to handle structured content. The data set described in this paper and used in our experiments is released publicly.

“Hi, how can I help you?” Improving Machine Translation of Conversational Content in a Business Context
Bianka Buschbeck | Jennifer Mell | Miriam Exel | Matthias Huck
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This paper addresses the automatic translation of conversational content in a business context, for example support chat dialogues. While such use cases share characteristics with other informal machine translation scenarios, translation requirements with respect to technical and business-related expressions are high. To succeed in such scenarios, we experimented with curating dedicated training and test data, injecting noise to improve robustness, and applying sentence weighting schemes to carefully manage the influence of the different corpora. We show that our approach improves the performance of our models on conversational content for all 18 investigated language pairs while preserving translation quality on other domains - an indispensable requirement to integrate these developments into our MT engines at SAP.

2020

A Parallel Evaluation Data Set of Software Documentation with Document Structure Annotation
Bianka Buschbeck | Miriam Exel
Proceedings of the 7th Workshop on Asian Translation

This paper accompanies the software documentation data set for machine translation, a parallel evaluation data set of data originating from the SAP Help Portal, that we released to the machine translation community for research purposes. It offers the possibility to tune and evaluate machine translation systems in the domain of corporate software documentation and contributes to the availability of a wider range of evaluation scenarios. The data set comprises the language pairs English to Hindi, Indonesian, Malay and Thai, and thus also increases the test coverage for the many low-resource language pairs. Unlike most evaluation data sets that consist of plain parallel text, the segments in this data set come with additional metadata that describes structural information of the document context. We provide insights into the origin and creation, the particularities and characteristics of the data set as well as machine translation results.

Terminology-Constrained Neural Machine Translation at SAP
Miriam Exel | Bianka Buschbeck | Lauritz Brandt | Simona Doneva
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper examines approaches to bias a neural machine translation model to adhere to terminology constraints in an industrial setup. In particular, we investigate variations of the approach by Dinu et al. (2019), which uses inline annotation of the target terms in the source segment plus source factor embeddings during training and inference, and compare them to constrained decoding. We describe the challenges with respect to terminology in our usage scenario at SAP and show how far the investigated methods can help to overcome them. We extend the original study to a new language pair and provide an in-depth evaluation including an error classification and a human evaluation.

Incorporating External Annotation to improve Named Entity Translation in NMT
Maciej Modrzejewski | Miriam Exel | Bianka Buschbeck | Thanh-Le Ha | Alexander Waibel
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

The correct translation of named entities (NEs) still poses a challenge for conventional neural machine translation (NMT) systems. This study explores methods incorporating named entity recognition (NER) into NMT with the aim to improve named entity translation. It proposes an annotation method that integrates named entities and inside–outside–beginning (IOB) tagging into the neural network input with the use of source factors. Our experiments on English→German and English→Chinese show that just by including different NE classes and IOB tagging, we can increase the BLEU score by around 1 point using the standard test set from WMT2019 and achieve up to 12% increase in NE translation rates over a strong baseline.

The Quaero program is an international project promoting research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Within the program framework, research organizations and industrial partners collaborate to develop prototypes of innovating applications and services for access and usage of multimedia data. One of the topics addressed is the translation of spoken language. Each year, a project-internal evaluation is conducted by DGA to monitor the technological advances. This work describes the design and results of the 2011 evaluation campaign. The participating partners were RWTH, KIT, LIMSI and SYSTRAN. Their approaches are compared on both ASR output and reference transcripts of speech data for the translation between French and German. The results show that the developed techniques further the state of the art and improve translation quality.

1998

Managing information at linguistic interfaces
Johan Bos | C.J. Rupp | Bianka Buschbeck-Wolf | Michael Dorna
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Managing Information at Linguistic Interfaces
Johan Bos | C.J. Rupp | Bianka Buschbeck-Wolf | Michael Dorna
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Quality and robustness in MT—A balancing act
Bianka Buschbeck-Wolf | Michael Dorna
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

The speech-to-speech translation system Verbmobil integrates deep and shallow analysis modules that produce linguistic representations in parallel. Thus, the input representations for the transfer module differ with respect to their depth and quality. This gives rise to two problems: (i) the transfer database has to be adjusted according to input quality, and (ii) translations produced have to be ranked with respect to their quality in order to select the most appropriate result. This paper presents an operationalized solution to both problems.

1996

Abstraction and underspecification in semantic transfer
Bernd Abb | Bianka Buschbeck-Wolf | Christel Tschernitschek
Conference of the Association for Machine Translation in the Americas

1991

Limits of a Sentence Based Procedural Approach for Aspect Choice in German-Russian MT
Bianka Buschbeck | Renate Henschel | Iris Höser | Gerda Klimonow | Andreas Küstner | Ingrid Starke
Fifth Conference of the European Chapter of the Association for Computational Linguistics

In this paper we discuss some problems arising in German-Russian Machine Translation with regard to tense and aspect. Since the formal category of aspect is missing in German the information required for generating Russian aspect forms has to be extracted from different representation levels. A sentence based procedure for aspect choice in the MT system VIRTEX is presented which takes lexical, morphological and semantic criteria into account. The limits of this approach are shown. To overcome these difficulties a human interaction component is proposed.

1990

Venues

ACL (1)

WAT (1)