Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time

Marisa Hudspeth, Brendan O’Connor, Laure Thompson


Abstract
Existing Latin treebanks draw from Latin’s long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks’ annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits that draw from the existing treebanks, which we use to perform a broad cross-time analysis for POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts.
Anthology ID:
2024.ml4al-1.21
Volume:
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
Month:
August
Year:
2024
Address:
Hybrid in Bangkok, Thailand and online
Editors:
John Pavlopoulos, Thea Sommerschield, Yannis Assael, Shai Gordin, Kyunghyun Cho, Marco Passarotti, Rachele Sprugnoli, Yudong Liu, Bin Li, Adam Anderson
Venues:
ML4AL | WS
Publisher:
Association for Computational Linguistics
Pages:
203–218
URL:
https://aclanthology.org/2024.ml4al-1.21
Cite (ACL):
Marisa Hudspeth, Brendan O’Connor, and Laure Thompson. 2024. Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time. In Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pages 203–218, Hybrid in Bangkok, Thailand and online. Association for Computational Linguistics.
Cite (Informal):
Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time (Hudspeth et al., ML4AL-WS 2024)
PDF:
https://aclanthology.org/2024.ml4al-1.21.pdf