Jan Kostkan


2023

OdyCy – A general-purpose NLP pipeline for Ancient Greek
Jan Kostkan | Márton Kardos | Jacob Palle Bliddal Mortensen | Kristoffer Laigaard Nielbo
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper presents a general-purpose NLP pipeline that achieves state-of-the-art performance on the Ancient Greek Perseus UD Treebank for several tasks (POS tagging, morphological analysis, and dependency parsing), and close to state-of-the-art performance on the PROIEL UD Treebank. Our aim is to provide a reproducible, open-source language processing pipeline for Ancient Greek, capable of handling input texts of varying quality. We measure the performance of our model against comparable tools and then evaluate lemmatization errors.