Charlie Cowen-Breen


2023

Logion: Machine-Learning Based Detection and Correction of Textual Errors in Greek Philology
Charlie Cowen-Breen | Creston Brooks | Barbara Graziosi | Johannes Haubold
Proceedings of the Ancient Language Processing Workshop

We present statistical and machine-learning-based techniques for detecting and correcting errors in text and apply them to the challenge of textual corruption in Greek philology. Most ancient Greek texts reach us through a long process of copying, in relay, from earlier manuscripts that are now lost. Copying errors tend to accrue over the course of this textual transmission. After training a BERT model on the largest premodern Greek dataset used for this purpose to date, we identify and correct previously undetected scribal errors, in what is, to our knowledge, the first successful identification of such errors via machine learning. The premodern Greek BERT model we train is available for use at https://huggingface.co/cabrooks/LOGION-base.
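The abstract does not spell out the detection procedure, so the following is only an illustrative sketch of one general way a masked language model can be used to flag and correct suspect words: hide each word in turn, ask the model what it expects in that position, and report words the model finds improbable together with its preferred alternative. To keep the example self-contained, a toy count-based context model stands in for BERT; the function names, the threshold, and the English toy corpus are all assumptions made for illustration and are not taken from the paper or from the LOGION-base model.

```python
from collections import Counter, defaultdict

# Toy stand-in for a masked language model (an assumption for illustration):
# it predicts a word only from its immediate left and right neighbours.

def train_context_model(corpus):
    """Count how often each word appears between a given (left, right) pair."""
    contexts = defaultdict(Counter)
    for sentence in corpus:
        padded = ["<s>"] + list(sentence) + ["</s>"]
        for i in range(1, len(padded) - 1):
            contexts[(padded[i - 1], padded[i + 1])][padded[i]] += 1
    return contexts


def masked_probs(contexts, tokens, i):
    """Return the model's distribution over words for the masked position i."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    # Token i sits at padded[i + 1]; its neighbours are padded[i] and padded[i + 2].
    counts = contexts.get((padded[i], padded[i + 2]), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def flag_suspects(contexts, tokens, threshold=0.1):
    """List (position, word, suggestion, probability) for improbable words.

    The threshold is a hypothetical value chosen for this example.
    """
    suspects = []
    for i, token in enumerate(tokens):
        probs = masked_probs(contexts, tokens, i)
        if not probs:
            continue  # unseen context: the toy model has no evidence either way
        p = probs.get(token, 0.0)
        if p < threshold:
            suggestion = max(probs, key=probs.get)
            suspects.append((i, token, suggestion, p))
    return suspects


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "cat", "sat"], ["the", "cat", "ran"]]
    model = train_context_model(corpus)
    for position, word, suggestion, p in flag_suspects(model, ["the", "cot", "sat"]):
        print(f"position {position}: {word!r} (p={p:.2f}); suggested {suggestion!r}")
```

With a real masked language model, `masked_probs` would be replaced by a call that masks position `i` and reads off the model's predicted distribution; the flagging loop would stay the same in outline. Any flagged word is only a candidate for review by a philologist, not an automatic correction.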