Development of Robust NER Models and Named Entity Tagsets for Ancient Greek

Chiara Palladino, Tariq Yousef


Abstract
This contribution presents a novel approach to the development and evaluation of transformer-based models for Named Entity Recognition and Classification in Ancient Greek texts. We trained two models, one monolingual and one multilingual, on annotated datasets, consolidating potentially ambiguous entity types under a harmonized set of classes. We then tested their performance on out-of-domain texts, reproducing a real-world use case. Both models performed very well under these conditions, with the multilingual model slightly outperforming the monolingual one. In the conclusion, we emphasize current limitations due to the scarcity of high-quality annotated corpora and to the lack of cohesive annotation strategies for ancient languages.
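The abstract does not spell out how entity types from different annotated datasets were consolidated. The sketch below is only an illustration of the general idea, assuming BIO-tagged token sequences and a hypothetical mapping (`HARMONIZED`, with made-up source labels such as `PERSON`, `PLACE`, `ETHNIC`) onto a shared set of classes. Neither the label names nor the `harmonize` helper are taken from the paper.

```python
from typing import Dict, List

# Hypothetical mapping from dataset-specific entity types to one harmonized set.
# These label names are illustrative assumptions, not the paper's actual tagset.
HARMONIZED: Dict[str, str] = {
    "PERSON": "PER", "PERS": "PER", "PER": "PER",
    "PLACE": "LOC", "GEO": "LOC", "LOC": "LOC",
    "ETHNIC": "GRP", "GROUP": "GRP",
}


def harmonize(tags: List[str], mapping: Dict[str, str] = HARMONIZED) -> List[str]:
    """Rewrite BIO tags (e.g. 'B-PLACE') into the harmonized classes.

    Entity types missing from the mapping are dropped to 'O'. An 'I-' tag that
    no longer continues an entity of the same class is promoted to 'B-', so the
    output remains a valid BIO sequence.
    """
    out: List[str] = []
    prev = "O"
    for tag in tags:
        if tag == "O":
            out.append("O")
            prev = "O"
            continue
        prefix, _, etype = tag.partition("-")
        new_type = mapping.get(etype)
        if new_type is None:
            # Unmapped type (e.g. a date class): treat the token as outside any entity.
            out.append("O")
            prev = "O"
            continue
        if prefix == "I" and prev[2:] != new_type:
            # The previous token is not part of the same harmonized entity.
            prefix = "B"
        new_tag = f"{prefix}-{new_type}"
        out.append(new_tag)
        prev = new_tag
    return out


if __name__ == "__main__":
    source = ["B-PERSON", "I-PERSON", "O", "B-PLACE", "B-DATE", "I-DATE", "I-ETHNIC"]
    print(harmonize(source))
    # ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'O', 'B-GRP']
```

Dropping unmapped types to `O` is one possible design choice. A real harmonization effort would decide case by case whether an ambiguous type should be merged, split, or discarded.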
Anthology ID:
2024.lt4hala-1.11
Volume:
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rachele Sprugnoli, Marco Passarotti
Venues:
LT4HALA | WS
Publisher:
ELRA and ICCL
Pages:
89–97
URL:
https://aclanthology.org/2024.lt4hala-1.11
Cite (ACL):
Chiara Palladino and Tariq Yousef. 2024. Development of Robust NER Models and Named Entity Tagsets for Ancient Greek. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 89–97, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Development of Robust NER Models and Named Entity Tagsets for Ancient Greek (Palladino & Yousef, LT4HALA-WS 2024)
PDF:
https://aclanthology.org/2024.lt4hala-1.11.pdf