Neural Lemmatization and POS-tagging models for Coptic, Demotic and Earlier Egyptian

Aleksi Sahala, Eliese-Sophia Lincke


Abstract
We present models for lemmatizing and POS-tagging Earlier Egyptian, Coptic and Demotic to test the performance of our pipeline for the ancient languages of Egypt. Of these languages, Demotic and Earlier Egyptian are known to be difficult to annotate due to their high degree of ambiguity. We report lemmatization accuracy of 86%, 91% and 99%, and XPOS-tagging accuracy of 89%, 95% and 98% for Earlier Egyptian, Demotic and Coptic, respectively.
Anthology ID:
2024.ml4al-1.10
Volume:
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
Month:
August
Year:
2024
Address:
Hybrid in Bangkok, Thailand and online
Editors:
John Pavlopoulos, Thea Sommerschield, Yannis Assael, Shai Gordin, Kyunghyun Cho, Marco Passarotti, Rachele Sprugnoli, Yudong Liu, Bin Li, Adam Anderson
Venues:
ML4AL | WS
Publisher:
Association for Computational Linguistics
Pages:
87–97
URL:
https://aclanthology.org/2024.ml4al-1.10
DOI:
10.18653/v1/2024.ml4al-1.10
Cite (ACL):
Aleksi Sahala and Eliese-Sophia Lincke. 2024. Neural Lemmatization and POS-tagging models for Coptic, Demotic and Earlier Egyptian. In Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pages 87–97, Hybrid in Bangkok, Thailand and online. Association for Computational Linguistics.
Cite (Informal):
Neural Lemmatization and POS-tagging models for Coptic, Demotic and Earlier Egyptian (Sahala & Lincke, ML4AL-WS 2024)
PDF:
https://aclanthology.org/2024.ml4al-1.10.pdf