Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek

Maciej Rapacz, Aleksander Smywiński-Pohl


Abstract
Contemporary machine translation systems prioritize fluent, natural-sounding output with flexible word ordering. In contrast, interlinear translation maintains the source text’s syntactic structure by aligning target-language words directly beneath their source counterparts. Despite its importance in classical scholarship, automated approaches to interlinear translation remain understudied. We evaluated neural interlinear translation from Ancient Greek to English and Polish using four transformer-based models: two specialized for Ancient Greek (GreTa and PhilTa) and two general-purpose multilingual models (mT5-base and mT5-large). Our approach introduces novel morphological embedding layers and evaluates text preprocessing and tag set selection across 144 experimental configurations on a word-aligned parallel corpus of the Greek New Testament. The results show that injecting morphological features through dedicated embedding layers significantly enhances translation quality, improving BLEU scores by 35% (44.67 → 60.40) for English and 38% (42.92 → 59.33) for Polish over the baseline models. PhilTa achieves state-of-the-art performance for English, while mT5-large does so for Polish. Notably, PhilTa maintains stable performance using only 10% of the training data. Our findings challenge the assumption that modern neural architectures cannot benefit from explicit morphological annotations. While preprocessing strategies and tag set selection show minimal impact, the substantial gains from morphological embeddings demonstrate their value in low-resource scenarios.
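The abstract describes injecting morphological features through a dedicated embedding layer added alongside the token embeddings. The snippet below is a minimal, hypothetical PyTorch sketch of that general idea, not the authors' implementation: class and parameter names (MorphEmbeddingEncoder, num_morph_tags, the tag indices) are assumptions for illustration, and the paper's actual architecture builds on pretrained models such as PhilTa and mT5 rather than a plain encoder.

```python
# Minimal sketch (not the paper's code): one way to add a dedicated
# morphological embedding layer to a transformer encoder.
import torch
import torch.nn as nn

class MorphEmbeddingEncoder(nn.Module):
    """Sums a learned embedding for each token's morphological tag
    (e.g. case, number, tense) with the usual token embedding before
    the sequence enters the transformer encoder."""

    def __init__(self, vocab_size: int, num_morph_tags: int, d_model: int = 512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.morph_emb = nn.Embedding(num_morph_tags, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)

    def forward(self, token_ids: torch.Tensor, morph_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, morph_ids: (batch, seq_len); morph_ids holds the index
        # of the morphological tag aligned to each source token.
        x = self.token_emb(token_ids) + self.morph_emb(morph_ids)
        return self.encoder(x)

# Example: a 3-token source clause with per-token morphological tag indices.
model = MorphEmbeddingEncoder(vocab_size=32000, num_morph_tags=1000)
tokens = torch.tensor([[101, 2045, 78]])
morphs = torch.tensor([[12, 345, 7]])
print(model(tokens, morphs).shape)  # torch.Size([1, 3, 512])
```

Summing tag and token embeddings keeps the encoder interface unchanged, which is one plausible way such features could be grafted onto a pretrained model in a low-resource setting.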
Anthology ID:
2025.loreslm-1.11
Volume:
Proceedings of the First Workshop on Language Models for Low-Resource Languages
Month:
January
Year:
2025
Address:
Abu Dhabi, United Arab Emirates
Editors:
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Venues:
LoResLM | WS
Publisher:
Association for Computational Linguistics
Pages:
145–165
URL:
https://aclanthology.org/2025.loreslm-1.11/
Cite (ACL):
Maciej Rapacz and Aleksander Smywiński-Pohl. 2025. Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 145–165, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for Ancient Greek (Rapacz & Smywiński-Pohl, LoResLM 2025)
PDF:
https://aclanthology.org/2025.loreslm-1.11.pdf