Martin Volk

2025

Name Consistency in LLM-based Machine Translation of Historical Texts
Dominic P. Fischer | Martin Volk
Proceedings of Machine Translation Summit XX: Volume 1

Large Language Models (LLMs) excel at translating 16th-century letters from Latin and Early New High German to modern English and German. While they perform well at translating well-known historical city names (e.g., Lutetia –> Paris), their ability to handle person names (e.g., Theodor Bibliander) or lesser-known toponyms (e.g., Augusta Vindelicorum –> Augsburg) remains unclear. This study investigates LLM-based translations of person and place names across various frequency bands in a corpus of 16th-century letters. Our results show that LLMs struggle with person names, achieving accuracies around 60%, but perform better with place names, reaching accuracies around 90%. We further demonstrate that including a translation suggestion for the proper noun in the prompt substantially boosts accuracy, yielding highly reliable results.

2024

LLM-based Translation Across 500 Years. The Case for Early New High German
Martin Volk | Dominic P. Fischer | Patricia Scheurer | Raphael Schwitter | Phillip B. Ströbel
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

LLM-based Machine Translation and Summarization for Latin
Martin Volk | Dominic Philipp Fischer | Lukas Fischer | Patricia Scheurer | Phillip Benjamin Ströbel
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024

This paper presents an evaluation of machine translation for Latin. We tested multilingual Large Language Models, in particular GPT-4, on letters from the 16th century that are in Latin and Early New High German. Our experiments include translation and cross-language summarization for the two historical languages into modern English and German. We show that LLM-based translation for Latin is clearly superior to previous approaches. We also show that LLM-based paraphrasing of Latin paragraphs from the historical letters produces English and German summaries that are close to human summaries published in the edition.

Tracing Linguistic Footprints of ChatGPT Across Tasks, Domains and Personas in English and German
Anastassia Shaitarova | Nikolaj Bauer | Jannis Vamvas | Martin Volk
Proceedings of the 9th edition of the Swiss Text Analytics Conference

SwissText 2024 Shared Task: Automatic Classification of the United Nations’ Sustainable Development Goals (SDGs) and Their Targets in English Scientific Abstracts
Simon Clematide | Martin Volk | Tobias Fankhauser | Lorenz Hilty | Jürgen Bernard
Proceedings of the 9th edition of the Swiss Text Analytics Conference

Offensiveness, Hate, Emotion and GPT: Benchmarking GPT3.5 and GPT4 as Classifiers on Twitter-specific Datasets
Nikolaj Bauer | Moritz Preisig | Martin Volk
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024

In this paper, we extend the work of benchmarking GPT by turning GPT models into classifiers and applying them on three different Twitter datasets on Hate-Speech Detection, Offensive Language Detection, and Emotion Classification. We use a Zero-Shot and Few-Shot approach to evaluate the classification capabilities of the GPT models. Our results show that GPT models do not always beat fine-tuned models on the tested benchmarks. However, in Hate-Speech and Emotion Detection, using a Few-Shot approach, state-of-the-art performance can be achieved. The results also reveal that GPT-4 is more sensitive to the examples given in a Few-Shot prompt, highlighting the importance of choosing fitting examples for inference and prompt formulation.

2023

Machine vs. Human: Exploring Syntax and Lexicon in German Translations, with a Spotlight on Anglicisms
Anastassia Shaitarova | Anne Göhring | Martin Volk
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Machine Translation (MT) has become an integral part of daily life for millions of people, with its output being so fluent that users often cannot distinguish it from human translation. However, these fluid texts often harbor algorithmic traces, from limited lexical choices to societal misrepresentations. This raises concerns about the possible effects of MT on natural language and human communication and calls for regular evaluations of machine-generated translations for different languages. Our paper explores the output of three widely used engines (Google, DeepL, Microsoft Azure) and one smaller commercial system. We translate the English and French source texts of seven diverse parallel corpora into German and compare MT-produced texts to human references in terms of lexical, syntactic, and morphological features. Additionally, we investigate how MT leverages lexical borrowings and analyse the distribution of anglicisms across the German translations.

2022

Improving Specificity in Review Response Generation with Data-Driven Data Filtering
Tannon Kew | Martin Volk
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

Responding to online customer reviews has become an essential part of successfully managing and growing a business both in e-commerce and the hospitality and tourism sectors. Recently, neural text generation methods intended to assist authors in composing responses have been shown to deliver highly fluent and natural looking texts. However, they also tend to learn a strong, undesirable bias towards generating overly generic, one-size-fits-all outputs to a wide range of inputs. While this often results in ‘safe’, high-probability responses, there are many practical settings in which greater specificity is preferable. In this work we examine the task of generating more specific responses for online reviews in the hospitality domain by identifying generic responses in the training data, filtering them and fine-tuning the generation model. We experiment with a range of data-driven filtering methods and show through automatic and human evaluation that, despite a 60% reduction in the amount of training data, filtering helps to derive models that are capable of generating more specific, useful responses.

Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters
Martin Volk | Lukas Fischer | Patricia Scheurer | Bernard Silvan Schroffenegger | Raphael Schwitter | Phillip Ströbel | Benjamin Suter
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper is based on a collection of 16th century letters from and to the Zurich reformer Heinrich Bullinger. Around 12,000 letters of this exchange have been preserved, out of which 3100 have been professionally edited, and another 5500 are available as provisional transcriptions. We have investigated code-switching in these 8600 letters, first on the sentence-level and then on the word-level. In this paper we give an overview of the corpus and its language mix (mostly Early New High German and Latin, but also French, Greek, Italian and Hebrew). We report on our experiences with a popular language identifier and present our results when training an alternative identifier on a very small training corpus of only 150 sentences per language. We use the automatically labeled sentences in order to bootstrap a word-based language classifier which works with high accuracy. Our research around the corpus building and annotation involves automatic handwritten text recognition, text normalisation for ENH German, and machine translation from medieval Latin into modern German.

The evaluation of Handwritten Text Recognition (HTR) models during their development is straightforward: because HTR is a supervised problem, the usual data split into training, validation, and test data sets allows the evaluation of models in terms of accuracy or error rates. However, the evaluation process becomes tricky as soon as we switch from development to application. A compilation of a new (and forcibly smaller) ground truth (GT) from a sample of the data that we want to apply the model on and the subsequent evaluation of models thereon only provides hints about the quality of the recognised text, as do confidence scores (if available) the models return. Moreover, if we have several models at hand, we face a model selection problem since we want to obtain the best possible result during the application phase. This calls for GT-free metrics to select the best model, which is why we (re-)introduce and compare different metrics, from simple, lexicon-based to more elaborate ones using standard language models and masked language models (MLM). We show that MLM-based evaluation can compete with lexicon-based methods, with the advantage that large and multilingual transformers are readily available, thus making compiling lexical resources for other metrics superfluous.

Machine Translation of 16th Century Letters from Latin to German
Lukas Fischer | Patricia Scheurer | Raphael Schwitter | Martin Volk
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This paper outlines our work in collecting training data for and developing a Latin–German Neural Machine Translation (NMT) system for translating 16th century letters. While Latin–German is a low-resource language pair in terms of NMT, the domain of 16th century epistolary Latin is even more limited in this regard. Through our efforts in data collection and data generation, we are able to train an NMT model that provides good translations for short to medium sentences and outperforms Google Translate overall. We focus on the correspondence of the Swiss reformer Heinrich Bullinger, but our parallel corpus and our NMT system will be of use for many other texts of the time.

A Multilingual Simplified Language News Corpus
Renate Hauser | Jannis Vamvas | Sarah Ebling | Martin Volk
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

Simplified language news articles are being offered by specialized web portals in several countries. The thousands of articles that have been published over the years are a valuable resource for natural language processing, especially for efforts towards automatic text simplification. In this paper, we present SNIML, a large multilingual corpus of news in simplified language. The corpus contains 13k simplified news articles written in one of six languages: Finnish, French, Italian, Swedish, English, and German. All articles are shared under open licenses that permit academic use. The level of text simplification varies depending on the news portal. We believe that even though SNIML is not a parallel corpus, it can be useful as a complement to the more homogeneous but often smaller corpora of news in the simplified variety of one language that are currently in use.

2020

How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR
Phillip Benjamin Ströbel | Simon Clematide | Martin Volk
Proceedings of the Twelfth Language Resources and Evaluation Conference

Recent advances in Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) have led to more accurate text recognition of historical documents. The Digital Humanities heavily profit from these developments, but they still struggle when choosing from the plethora of OCR systems available on the one hand and when defining workflows for their projects on the other hand. In this work, we present our approach to build a ground truth for a historical German-language newspaper published in black letter. We also report how we used it to systematically evaluate the performance of different OCR engines. Additionally, we used this ground truth to make an informed estimate as to how much data is necessary to achieve high-quality OCR results. The outcomes of our experiments show that HTR architectures can successfully recognise black letter text and that a ground truth size of 50 newspaper pages suffices to achieve good OCR accuracy. Moreover, our models perform equally well on data they have not seen during training, which means that additional manual correction for diverging data is superfluous.

Benchmarking Data-driven Automatic Text Simplification for German
Andreas Säuberli | Sarah Ebling | Martin Volk
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)

Automatic text simplification is an active research area, and there are first systems for English, Spanish, Portuguese, and Italian. For German, no data-driven approach exists to this date, due to a lack of training data. In this paper, we present a parallel corpus of news items in German with corresponding simplifications on two complexity levels. The simplifications have been produced according to a well-documented set of guidelines. We then report on experiments in automatically simplifying the German news items using state-of-the-art neural machine translation techniques. We demonstrate that despite our small parallel corpus, our neural models were able to learn essential features of simplified language, such as lexical substitutions, deletion of less relevant words and phrases, and sentence shortening.

2019

An Empirical Analysis of Linguistic, Typographic, and Structural Features in Simplified German Texts
Alessia Battisti | Sarah Ebling | Martin Volk
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)

Post-editing Productivity with Neural Machine Translation: An Empirical Assessment of Speed and Quality in the Banking and Finance Domain
Samuel Läubli | Chantal Amrhein | Patrick Düggelin | Beatriz Gonzalez | Alena Zwahlen | Martin Volk
Proceedings of Machine Translation Summit XVII: Research Track

Geotagging a Diachronic Corpus of Alpine Texts: Comparing Distinct Approaches to Toponym Recognition
Tannon Kew | Anastassia Shaitarova | Isabel Meraner | Janis Goldzycher | Simon Clematide | Martin Volk
Proceedings of the Workshop on Language Technology for Digital Historical Archives

Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-based searching and new geographically related discoveries. In this paper, we describe two distinct approaches to geotagging a variety of fine-grained toponyms in a diachronic corpus of alpine texts. By applying a traditional gazetteer-based approach, aided by a few simple heuristics, we attain strong high-precision annotations. Using the output of this earlier system, we adopt a state-of-the-art neural approach in order to facilitate the detection of new toponyms on the basis of context. Additionally, we present the results of preliminary experiments on integrating a small amount of crowdsourced annotations to improve overall performance of toponym recognition in our heritage corpus.

2018

mtrain: A Convenience Tool for Machine Translation
Samuel Läubli | Mathias Müller | Beat Horat | Martin Volk
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

We present mtrain, a convenience tool for machine translation. It wraps existing machine translation libraries and scripts to ease their use. mtrain is written purely in Python 3, well-documented, and freely available.

Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
Samuel Läubli | Rico Sennrich | Martin Volk
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs.

2017

Multilingwis² – Explore Your Parallel Corpus
Johannes Graën | Dominique Sandoz | Martin Volk
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
Simon Clematide | Lenz Furrer | Martin Volk
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Crowdsourcing approaches for post-correction of OCR output (Optical Character Recognition) have been successfully applied to several historic text collections. We report on our crowd-correction platform Kokos, which we built to improve the OCR quality of the digitized yearbooks of the Swiss Alpine Club (SAC) from the 19th century. This multilingual heritage corpus consists of Alpine texts mainly written in German and French, all typeset in Antiqua font. Finding and engaging volunteers for correcting large amounts of pages into high quality text requires a carefully designed user interface, an easy-to-use workflow, and continuous efforts for keeping the participants motivated. More than 180,000 characters on about 21,000 pages were corrected by volunteers in about 7 months, achieving an OCR gold standard with a systematically evaluated accuracy of 99.7% on the word level. The crowdsourced OCR gold standard and the corresponding original OCR recognition results from ABBYY FineReader 7 for each page are available as a resource. Additionally, the scanned images (300dpi) of all pages are included in order to facilitate tests with other OCR software.

2015

Pre-reordering for Statistical Machine Translation of Non-fictional Subtitles
Magdalena Plamadă | Gion Linder | Phillip Ströbel | Martin Volk
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Detecting Document-level Context Triggers to Resolve Translation Ambiguity
Laura Mascarell | Mark Fishel | Martin Volk
Proceedings of the Second Workshop on Discourse in Machine Translation

2014

This article describes a large-scale evaluation of the use of Statistical Machine Translation for professional subtitling. The work was carried out within the FP7 EU-funded project SUMAT and involved two rounds of evaluation: a quality evaluation and a measure of productivity gain/loss. We present the SMT systems built for the project and the corpora they were trained on, which combine professionally created and crowd-sourced data. Evaluation goals, methodology and results are presented for the eleven translation pairs that were evaluated by professional subtitlers. Overall, a majority of the machine translated subtitles received good quality ratings. The results were also positive in terms of productivity, with a global gain approaching 40%. We also evaluated the impact of applying quality estimation and filtering of poor MT output, which resulted in higher productivity gains for filtered files as opposed to fully machine-translated files. Finally, we present and discuss feedback from the subtitlers who participated in the evaluation, a key aspect for any eventual adoption of machine translation technology in professional subtitling.

Innovations in Parallel Corpus Search Tools
Martin Volk | Johannes Graën | Elena Callegaro
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Recent years have seen an increased interest in and availability of parallel corpora. Large corpora from international organizations (e.g. European Union, United Nations, European Patent Office), or from multilingual Internet sites (e.g. OpenSubtitles) are now easily available and are used for statistical machine translation but also for online search by different user groups. This paper gives an overview of different usages and different types of search systems. In the past, parallel corpus search systems were based on sentence-aligned corpora. We argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but none supports the full query functionality that has been developed for parallel treebanks. We propose to develop such a system for efficiently searching large parallel corpora with a powerful query language.

Detecting Code-Switching in a Multilingual Alpine Heritage Corpus
Martin Volk | Simon Clematide
Proceedings of the First Workshop on Computational Approaches to Code Switching

2013

Statistical Machine Translation for Automobile Marketing Texts
Samuel Läubli | Mark Fishel | Manuela Weibel | Martin Volk
Proceedings of Machine Translation Summit XIV: Posters

Assessing post-editing efficiency in a realistic translation environment
Samuel Läubli | Mark Fishel | Gary Massey | Maureen Ehrensberger-Dow | Martin Volk
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

Exploiting Synergies Between Open Resources for German Dependency Parsing, POS-tagging, and Morphological Analysis
Rico Sennrich | Martin Volk | Gerold Schneider
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Mining for Domain-specific Parallel Text from Wikipedia
Magdalena Plamadă | Martin Volk
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

Building a German/Simple German Parallel Corpus for Automatic Text Simplification
David Klaper | Sarah Ebling | Martin Volk
Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations

Combining Statistical Machine Translation and Translation Memories with Domain Adaptation
Samuel Läubli | Mark Fishel | Martin Volk | Manuela Weibel
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Subtitling and audiovisual translation have been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing, in order to increase the efficiency of the subtitle production process. The FP7 European project SUMAT (An Online Service for SUbtitling by MAchine Translation: http://www.sumat-project.eu) aims to develop an online subtitle translation service for nine European languages, combined into 14 different language pairs, in order to semi-automate the subtitle translation processes of both freelance translators and subtitling companies on a large scale. In this paper we discuss the data collection and parallel corpus compilation for training SMT systems, which includes several procedures such as data partition, conversion, formatting, normalization and alignment. We discuss in detail each data pre-processing step using various approaches. Apart from the quantity (around 1 million subtitles per language pair), the SUMAT corpus has a number of very important characteristics. First of all, high quality both in terms of translation and in terms of high-precision alignment of parallel documents and their contents has been achieved. Secondly, the contents are provided in one consistent format and encoding. Finally, additional information such as type of content in terms of genres and domain is available.

2011

Combining Semantic and Syntactic Generalization in Example-Based Machine Translation
Sarah Ebling | Andy Way | Martin Volk | Sudip Kumar Naskar
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

Le corpus Text+Berg Une ressource parallèle alpin français-allemand (The Text+Berg Corpus An Alpine French-German Parallel Resource)
Anne Göhring | Martin Volk
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article presents a French-German parallel corpus of more than 4 million words, derived from the digitization of a multilingual Alpine corpus. The corpus is a valuable resource for many studies in comparative linguistics and cultural heritage, as well as for the development of a domain-specific statistical machine translation system. We annotated a sample of this parallel corpus and aligned the tree structures at the word, constituent, and sentence levels. This “alpine treebank” is the first high-quality (manually checked), freely accessible French-German parallel treebank in a new domain and genre: mountaineering reports.

Reducing OCR Errors in Gothic-Script Documents
Lenz Furrer | Martin Volk
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage

Iterative, MT-based Sentence Alignment of Parallel Texts
Rico Sennrich | Martin Volk
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

Disambiguation of English Contractions for Machine Translation of TV Subtitles
Martin Volk | Rico Sennrich
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

MT-based Sentence Alignment for OCR-generated Parallel Texts
Rico Sennrich | Martin Volk
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

The performance of current sentence alignment tools varies according to the to-be-aligned texts. We have found existing tools unsuitable for hard-to-align parallel texts and describe an alternative alignment algorithm. The basic idea is to use machine translations of a text and BLEU as a similarity score to find reliable alignments which are used as anchor points. The gaps between these anchor points are then filled using BLEU-based and length-based heuristics. We show that this approach outperforms state-of-the-art algorithms in our alignment task, and that this improvement in alignment quality translates into better SMT performance. Furthermore, we show that even length-based alignment algorithms profit from having a machine translation as a point of comparison.

Machine Translation of TV Subtitles for Large Scale Production
Martin Volk | Rico Sennrich | Christian Hardmeier | Frida Tidström
Proceedings of the Second Joint EM+/CNGL Workshop: Bringing MT to the User: Research on Integrating MT in the Translation Industry

This paper describes our work on building and employing Statistical Machine Translation systems for TV subtitles in Scandinavia. We have built translation systems for Danish, English, Norwegian and Swedish. They are used in daily subtitle production and translate large volumes. As an example we report on our evaluation results for three TV genres. We discuss our lessons learned in the system development process which shed interesting light on the practical use of Machine Translation technology.

Challenges in Building a Multilingual Alpine Heritage Corpus
Martin Volk | Noah Bubenhofer | Adrian Althaus | Maya Bangerter | Lenz Furrer | Beni Ruef
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes our efforts to build a multilingual heritage corpus of alpine texts. Currently we digitize the yearbooks of the Swiss Alpine Club which contain articles in French, German, Italian and Romansch. Articles comprise mountaineering reports from all corners of the earth, but also scientific topics such as topography, geology or glaciology as well as occasional poetry and lyrics. We have already scanned close to 70,000 pages which has resulted in a corpus of 25 million words, 10% of which is a parallel French-German corpus. We have solved a number of challenges in automatic language identification and text structure recognition. Our next goal is to identify the great variety of toponyms (e.g. names of mountains and valleys, glaciers and rivers, trails and cabins) in this corpus, and we sketch how a large gazetteer of Swiss topographical names can be exploited for this purpose. Despite the size of the resource, exact matching leads to a low recall because of spelling variations, language mixtures and partial repetitions.

Combining Parallel Treebanks and Geo-Tagging
Martin Volk | Anne Goehring | Torsten Marek
Proceedings of the Fourth Linguistic Annotation Workshop
