Logion: Machine-Learning Based Detection and Correction of Textual Errors in Greek Philology

Charlie Cowen-Breen, Creston Brooks, Barbara Graziosi, Johannes Haubold


Abstract
We present statistical and machine-learning based techniques for detecting and correcting errors in text and apply them to the challenge of textual corruption in Greek philology. Most ancient Greek texts reach us through a long process of copying, in relay, from earlier manuscripts (now lost). In this process of textual transmission, copying errors tend to accrue. After training a BERT model on the largest premodern Greek dataset used for this purpose to date, we identify and correct previously undetected errors made by scribes in the process of textual transmission, in what is, to our knowledge, the first successful identification of such errors via machine learning. The premodern Greek BERT model we train is available for use at https://huggingface.co/cabrooks/LOGION-base.
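The abstract does not spell out the detection procedure, so the following is only a minimal sketch of one generic way a masked language model could be used to flag possible copying errors. It hides each word in turn, asks a predictor how likely the transmitted word is in context, and reports words that receive low probability together with the predictor's preferred alternative. The names `flag_suspicious`, `toy_predict`, the 0.05 threshold, and the transliterated example are all hypothetical; this is not presented as the authors' method.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A predictor maps (tokens, masked_index) to {candidate_token: probability}.
# In practice this would wrap a masked language model; here it is an assumption.
Predictor = Callable[[List[str], int], Dict[str, float]]


def flag_suspicious(
    tokens: List[str], predict: Predictor, threshold: float = 0.05
) -> List[Tuple[int, str, float, str]]:
    """Return (index, token, probability, suggestion) for low-probability tokens."""
    flagged = []
    for i, tok in enumerate(tokens):
        dist = predict(tokens, i)           # distribution for position i
        p = dist.get(tok, 0.0)              # probability of the transmitted token
        if dist and p < threshold:
            best = max(dist, key=dist.get)  # most probable candidate
            if best != tok:
                flagged.append((i, tok, p, best))
    return flagged


def toy_predict(tokens: List[str], i: int) -> Dict[str, float]:
    """Hypothetical stand-in for a masked LM: looks only at the previous token."""
    table = {
        "menin": {"aeide": 0.9, "aeido": 0.02},
        "aeide": {"thea": 0.8, "theos": 0.1},
    }
    prev: Optional[str] = tokens[i - 1] if i > 0 else None
    # With no entry for the context, treat the transmitted token as certain.
    return table.get(prev, {tokens[i]: 1.0})


if __name__ == "__main__":
    # Toy transliterated line with one deliberately altered word.
    print(flag_suspicious(["menin", "aeido", "thea"], toy_predict))
```

A real predictor would replace `toy_predict` with calls to a trained masked language model, such as the one linked in the abstract. Any flagged word would still need philological judgment before being treated as a scribal error.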
Anthology ID:
2023.alp-1.20
Volume:
Proceedings of the Ancient Language Processing Workshop
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti
Venues:
ALP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
170–178
URL:
https://aclanthology.org/2023.alp-1.20
Cite (ACL):
Charlie Cowen-Breen, Creston Brooks, Barbara Graziosi, and Johannes Haubold. 2023. Logion: Machine-Learning Based Detection and Correction of Textual Errors in Greek Philology. In Proceedings of the Ancient Language Processing Workshop, pages 170–178, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Logion: Machine-Learning Based Detection and Correction of Textual Errors in Greek Philology (Cowen-Breen et al., ALP-WS 2023)
PDF:
https://aclanthology.org/2023.alp-1.20.pdf