2024
LLM-based Machine Translation and Summarization for Latin
Martin Volk | Dominic Philipp Fischer | Lukas Fischer | Patricia Scheurer | Phillip Benjamin Ströbel
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
This paper presents an evaluation of machine translation for Latin. We tested multilingual Large Language Models, in particular GPT-4, on 16th-century letters written in Latin and Early New High German. Our experiments include translation and cross-language summarization from the two historical languages into modern English and German. We show that LLM-based translation for Latin is clearly superior to previous approaches. We also show that LLM-based paraphrasing of Latin paragraphs from the historical letters produces English and German summaries that are close to the human summaries published in the edition.
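As a rough illustration of the kind of LLM-based setup the paper evaluates, the sketch below calls a chat-completion model once for Latin-to-German translation and once for cross-language summarization into English. It is a minimal sketch assuming the OpenAI Python client (v1+); the model name, prompts, and sample paragraph are illustrative and not taken from the paper.

```python
# Minimal sketch, assuming the OpenAI Python client (>= 1.0).
# Prompts, model choice and the Latin sample are placeholders, not the paper's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

latin_paragraph = (
    "Gratiam et pacem a Domino. Litteras tuas, Bullingere doctissime, "
    "magna cum voluptate legi."
)

# Translation of a Latin paragraph into modern German.
translation = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You translate 16th-century Latin letters into modern German."},
        {"role": "user", "content": latin_paragraph},
    ],
)
print(translation.choices[0].message.content)

# Cross-language summarization: a one-sentence English summary of the same paragraph.
summary = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Summarize the following Latin letter paragraph in one English sentence."},
        {"role": "user", "content": latin_paragraph},
    ],
)
print(summary.choices[0].message.content)
```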
2022
Machine Translation of 16th Century Letters from Latin to German
Lukas Fischer | Patricia Scheurer | Raphael Schwitter | Martin Volk
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
This paper outlines our work in collecting training data for, and developing, a Latin–German Neural Machine Translation (NMT) system for translating 16th-century letters. While Latin–German is a low-resource language pair in terms of NMT, the domain of 16th-century epistolary Latin is even more limited in this regard. Through our efforts in data collection and data generation, we are able to train an NMT model that provides good translations for short to medium-length sentences and outperforms Google Translate overall. We focus on the correspondence of the Swiss reformer Heinrich Bullinger, but our parallel corpus and our NMT system will be of use for many other texts of the time.
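For readers who want a concrete starting point, the following sketch fine-tunes a pretrained Marian NMT checkpoint on a toy Latin–German parallel set with Hugging Face transformers. It is not the authors' pipeline: the checkpoint name, the hyperparameters, and the two example sentence pairs are placeholders.

```python
# Minimal fine-tuning sketch with Hugging Face transformers.
# The checkpoint name and the toy parallel data are assumed placeholders.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "Helsinki-NLP/opus-mt-la-de"  # placeholder: any Latin->German seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy stand-in for a 16th-century Latin-German parallel corpus.
pairs = [
    {"la": "Gratiam et pacem a Deo.", "de": "Gnade und Friede von Gott."},
    {"la": "Litteras tuas accepi.", "de": "Ich habe deinen Brief erhalten."},
]
dataset = Dataset.from_list(pairs)

def preprocess(example):
    # Tokenize source (Latin) and target (German) sides.
    model_inputs = tokenizer(example["la"], truncation=True, max_length=128)
    labels = tokenizer(text_target=example["de"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=["la", "de"])

args = Seq2SeqTrainingArguments(
    output_dir="latin-german-nmt",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```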
Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters
Martin Volk | Lukas Fischer | Patricia Scheurer | Bernard Silvan Schroffenegger | Raphael Schwitter | Phillip Ströbel | Benjamin Suter
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper is based on a collection of 16th-century letters from and to the Zurich reformer Heinrich Bullinger. Around 12,000 letters of this exchange have been preserved, of which 3,100 have been professionally edited and another 5,500 are available as provisional transcriptions. We have investigated code-switching in these 8,600 letters, first at the sentence level and then at the word level. In this paper we give an overview of the corpus and its language mix (mostly Early New High German and Latin, but also French, Greek, Italian and Hebrew). We report on our experiences with a popular language identifier and present our results when training an alternative identifier on a very small training corpus of only 150 sentences per language. We use the automatically labeled sentences to bootstrap a word-based language classifier that works with high accuracy. Our research around corpus building and annotation involves automatic handwritten text recognition, text normalisation for Early New High German, and machine translation from medieval Latin into modern German.
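The word-level bootstrapping idea can be illustrated with a small, self-contained sketch: train a sentence-level identifier on a tiny seed set, project its predicted sentence labels onto the words of unlabeled sentences, and train a word-level classifier on those projected labels. The features, models, and example data below are simplified stand-ins, not the authors' setup.

```python
# Minimal sketch of bootstrapping a word-level language classifier
# from sentence-level predictions. Data and models are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Stage 1: sentence-level identifier trained on a small seed corpus.
seed_sentences = ["Gratia et pax a Deo patre nostro.", "Ich habe deinen Brief erhalten."]
seed_labels = ["la", "de"]
sent_clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),
    MultinomialNB(),
)
sent_clf.fit(seed_sentences, seed_labels)

# Stage 2: label further sentences automatically and project labels onto their words.
unlabeled = ["Nunc profana tractemus.", "Der Herr sei mit dir."]
words, word_labels = [], []
for sentence, label in zip(unlabeled, sent_clf.predict(unlabeled)):
    for word in sentence.split():
        words.append(word)
        word_labels.append(label)

# Stage 3: word-level classifier bootstrapped from the projected labels.
word_clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),
    MultinomialNB(),
)
word_clf.fit(words, word_labels)
print(word_clf.predict(["tractemus", "Brief"]))  # expected: Latin, German
```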
2020
What’s the Difference Between Professional Human and Machine Translation? A Blind Multi-language Study on Domain-specific MT
Lukas Fischer | Samuel Läubli
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
Machine translation (MT) has been shown to produce a number of errors that require human post-editing, but the extent to which professional human translation (HT) contains such errors has not yet been compared to MT. We compile pre-translated documents in which MT and HT are interleaved, and ask professional translators to flag errors and post-edit these documents in a blind evaluation. We find that the post-editing effort for MT segments is higher in only two out of three language pairs, and that the number of segments with wrong terminology, omissions, and typographical problems is similar in HT.