Evaluating Latin and Ancient Greek Sentence Alignment through Parallel Sentence Mining

Sebastian Reichbauer; Shu Okabe; Alexander Fraser

Abstract

Cross-lingual detection of intertextuality and translation in Latin and Ancient Greek through computational approaches is of great interest for classical studies. While several systems exist for parallel sentence detection, based on general multilingual or specific models for Latin–Ancient Greek, they have not been compared against each other. Therefore, we present a synthetic benchmark to evaluate the performance of language models regarding cross-lingual Ancient Greek and Latin parallel sentence mining. We first compare six language models to encode sentences and then further improve the cross-lingual alignment through post-processing, fine-tuning, and knowledge distillation. We find that the whitening transformation in combination with knowledge distillation provides excellent results. Specifically, SPhilBERTa, a trilingual language model for Ancient Greek and Latin, benefits the most from the improvements and achieves a substantial mining score of 97.6 on our benchmark.
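
To make the idea of parallel sentence mining concrete, the following is a minimal sketch, not the authors' pipeline. It assumes sentences have already been encoded as fixed-size vectors by some model, and it pairs each source sentence with its nearest target sentence by cosine similarity. The function names, the toy vectors, and the threshold value are all hypothetical; the abstract does not specify the retrieval criterion or how the mining score is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def mine_pairs(src_vecs, tgt_vecs, threshold=0.0):
    """Pair each source vector with its most similar target vector.

    Returns (src_index, tgt_index, score) tuples whose score is at least
    `threshold`. Nearest-neighbour retrieval is an assumption made for
    illustration, not the paper's stated method.
    """
    if not tgt_vecs:
        return []
    pairs = []
    for i, u in enumerate(src_vecs):
        scores = [cosine(u, v) for v in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy stand-ins for sentence embeddings (hypothetical values).
latin = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
greek = [[0.1, 0.9, 0.1], [0.9, 0.0, 0.1], [0.0, 0.1, 1.0]]

pairs = mine_pairs(latin, greek, threshold=0.5)
for src_idx, tgt_idx, score in pairs:
    print(f"latin[{src_idx}] <-> greek[{tgt_idx}] (cosine {score:.3f})")
```

In this sketch, post-processing steps such as the whitening transformation mentioned in the abstract would be applied to the vectors before calling `mine_pairs`; they are left out here because the abstract gives no details on them.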

Anthology ID: 2026.nlp4dh-1.11
Volume: Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
Month: July
Year: 2026
Address: San Diego, USA
Editors: Sil Hamilton, Emily Öhman, Rebecca M. M. Hicke, Yuri Bizzoni, Axel Bax, Jacob A. Matthews, Mika Hämäläinen
Venues: NLP4DH | WS
Publisher: Association for Computational Linguistics
Pages: 106–120
URL: https://aclanthology.org/2026.nlp4dh-1.11/
Cite (ACL): Sebastian Reichbauer, Shu Okabe, and Alexander Fraser. 2026. Evaluating Latin and Ancient Greek Sentence Alignment through Parallel Sentence Mining. In Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities, pages 106–120, San Diego, USA. Association for Computational Linguistics.
Cite (Informal): Evaluating Latin and Ancient Greek Sentence Alignment through Parallel Sentence Mining (Reichbauer et al., NLP4DH 2026)
PDF: https://aclanthology.org/2026.nlp4dh-1.11.pdf
