AGILe: The First Lemmatizer for Ancient Greek Inscriptions

Evelien de Graaf, Silvia Stopponi, Jasper K. Bos, Saskia Peels-Matthey, Malvina Nissim


Abstract
To facilitate corpus searches by classicists and to reduce data sparsity when training models, we focus on the automatic lemmatization of ancient Greek inscriptions, which have received less attention in this respect than literary texts. We show that existing lemmatizers for ancient Greek, trained on literary data, perform poorly on epigraphic data because of major linguistic differences between the two types of text. We therefore train the first inscription-specific lemmatizer, which achieves above 80% accuracy, and make both the models and the lemmatized data available to the community. We also provide a detailed error analysis of the peculiarities of inscriptions, which further underlines the need for a lemmatizer dedicated to inscriptions.
Anthology ID:
2022.lrec-1.571
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5334–5344
URL:
https://aclanthology.org/2022.lrec-1.571
Cite (ACL):
Evelien de Graaf, Silvia Stopponi, Jasper K. Bos, Saskia Peels-Matthey, and Malvina Nissim. 2022. AGILe: The First Lemmatizer for Ancient Greek Inscriptions. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5334–5344, Marseille, France. European Language Resources Association.
Cite (Informal):
AGILe: The First Lemmatizer for Ancient Greek Inscriptions (de Graaf et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.571.pdf