A State-of-the-Art Morphosyntactic Parser and Lemmatizer for Ancient Greek

Giuseppe G. A. Celano

Abstract

This paper presents an experiment comparing six models to identify state-of-the-art models for Ancient Greek: a morphosyntactic parser and a lemmatizer that are capable of annotating in accordance with the Ancient Greek Dependency Treebank annotation scheme. A normalized version of the major collections of annotated texts was used to (i) train the baseline model Dithrax with randomly initialized character embeddings and (ii) fine-tune Trankit and four recent models pretrained on Ancient Greek texts, namely GreBERTa and PhilBERTa for morphosyntactic annotation and GreTa and PhilTa for lemmatization. A Bayesian analysis shows that Dithrax and Trankit are practically equivalent in morphological annotation, while syntax is best annotated by Trankit and lemmata by GreTa. The results of the experiment suggest that token embeddings are not sufficient to achieve high UAS and LAS scores unless they are coupled with a modeling strategy specifically designed to capture syntactic relationships. The dataset and best-performing models are made available online for reuse.

Anthology ID: 2025.lm4dh-1.5
Volume: Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Isuri Nanomi Arachchige, Francesca Frontini, Ruslan Mitkov, Paul Rayson
Venues: LM4DH | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 48–65
URL: https://aclanthology.org/2025.lm4dh-1.5/
Cite (ACL): Giuseppe G. A. Celano. 2025. A State-of-the-Art Morphosyntactic Parser and Lemmatizer for Ancient Greek. In Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities, pages 48–65, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): A State-of-the-Art Morphosyntactic Parser and Lemmatizer for Ancient Greek (Celano, LM4DH 2025)
PDF: https://aclanthology.org/2025.lm4dh-1.5.pdf
