Sarah Li Chen
2024
Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature
Sarah Li Chen | Patrick J. Burns | Thomas J. Bolt | Pramit Chaudhuri | Joseph P. Dexter
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
In literary critical applications, stylometry can benefit from hand-curated feature sets capturing various syntactic and rhetorical functions. For premodern languages, calculation of such features is hampered by a lack of adequate computational resources for accurate part-of-speech (POS) tagging and semantic disambiguation. This paper reports an evaluation of POS taggers for Latin and their use in augmenting a hand-curated stylometric feature set. Our experiments show that POS-augmented features not only provide more accurate counts than POS-blind features but also perform better on tasks such as genre classification. In the course of this work, we introduce POS n-grams as a feature for Latin stylometry.
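As a rough illustration of what a POS n-gram feature could look like, the sketch below counts contiguous runs of n tags in a tagged text and normalizes them to relative frequencies. It is not the authors' implementation: the function name `pos_ngram_frequencies`, the Universal Dependencies-style tag labels, and the example tag sequence are all assumptions made for illustration, and the tags are taken as already produced by some Latin POS tagger.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def pos_ngram_frequencies(tags: Sequence[str], n: int = 2) -> Dict[Tuple[str, ...], float]:
    """Return the relative frequency of each POS n-gram in one tagged text.

    `tags` is the sequence of POS labels for a text, in reading order.
    This helper is hypothetical and only sketches the general idea.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the tag sequence.
    ngrams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    if not ngrams:
        return {}  # text shorter than n tokens yields no n-grams
    counts = Counter(ngrams)
    total = len(ngrams)
    return {gram: count / total for gram, count in counts.items()}


# Assumed tags for a four-token sequence (labels are illustrative only).
example_tags = ["NOUN", "NOUN", "CCONJ", "VERB"]
bigram_features = pos_ngram_frequencies(example_tags, n=2)
```

Each text then maps to a sparse vector of n-gram frequencies, which could be passed to a classifier alongside other stylometric features. Normalizing by the number of n-grams is one plausible design choice that keeps texts of different lengths comparable.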