Latin Morphology through the Centuries: Ensuring Consistency for Better Language Processing

Federica Gamba, Daniel Zeman


Abstract
This paper focuses on harmonising the morphological annotation of the five Latin treebanks available in Universal Dependencies. We propose a workflow that first spots inconsistencies and missing information, so as to establish to what extent the annotations differ, and then corrects the errors found, with the goal of aligning the annotation of morphological features across the treebanks and producing more consistent linguistic data. We then present experiments carried out with UDPipe and Stanza to assess the impact of this harmonisation on parsing accuracy.
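The first step of such a workflow, spotting where treebanks disagree in their use of morphological features, can be illustrated with a minimal sketch. It is not the authors' tooling: the function names (feature_inventory, compare_inventories), the treebank labels "A" and "B", and the two one-token CoNLL-U snippets are all hypothetical and chosen only for illustration. The sketch assumes standard ten-column CoNLL-U input and reports every (UPOS, feature) pair whose set of observed values differs between treebanks.

```python
from collections import defaultdict


def feature_inventory(conllu_text):
    """Map each (UPOS, feature name) pair to the set of values seen in one treebank."""
    inventory = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        if feats == "_":
            continue  # token carries no morphological features
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            # A feature may list several comma-separated values.
            inventory[(upos, name)].update(value.split(","))
    return inventory


def compare_inventories(inventories):
    """Return the (UPOS, feature) pairs whose value sets differ between treebanks."""
    all_keys = set().union(*(inv.keys() for inv in inventories.values()))
    report = {}
    for key in sorted(all_keys):
        per_treebank = {
            name: frozenset(inv.get(key, ())) for name, inv in inventories.items()
        }
        if len(set(per_treebank.values())) > 1:
            report[key] = {name: sorted(vals) for name, vals in per_treebank.items()}
    return report


# Toy data (invented): the same token, annotated with and without Voice.
TB_A = "1\tamat\tamo\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_\n"
TB_B = "1\tamat\tamo\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act\t0\troot\t_\t_\n"

report = compare_inventories(
    {"A": feature_inventory(TB_A), "B": feature_inventory(TB_B)}
)
for (upos, feat), values in report.items():
    print(upos, feat, values)
```

On the toy input the only reported difference is the Voice feature on VERB, which treebank "B" annotates and treebank "A" omits. A report of this kind only locates candidate inconsistencies; deciding which annotation is correct, and fixing it, remains a separate step.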
Anthology ID: 2023.alp-1.7
Volume: Proceedings of the Ancient Language Processing Workshop
Month: September
Year: 2023
Address: Varna, Bulgaria
Editors: Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti
Venues: ALP | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 59–67
URL: https://aclanthology.org/2023.alp-1.7
Cite (ACL): Federica Gamba and Daniel Zeman. 2023. Latin Morphology through the Centuries: Ensuring Consistency for Better Language Processing. In Proceedings of the Ancient Language Processing Workshop, pages 59–67, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Latin Morphology through the Centuries: Ensuring Consistency for Better Language Processing (Gamba & Zeman, ALP-WS 2023)
PDF: https://aclanthology.org/2023.alp-1.7.pdf