Sarah Li Chen


2024

Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature
Sarah Li Chen | Patrick J. Burns | Thomas J. Bolt | Pramit Chaudhuri | Joseph P. Dexter
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)

In literary critical applications, stylometry can benefit from hand-curated feature sets capturing various syntactic and rhetorical functions. For premodern languages, calculation of such features is hampered by a lack of adequate computational resources for accurate part-of-speech tagging and semantic disambiguation. This paper reports an evaluation of POS taggers for Latin and their use in augmenting a hand-curated stylometric feature set. Our experiments show that POS-augmented features not only provide more accurate counts than POS-blind features but also perform better on tasks such as genre classification. In the course of this work, we introduce POS n-grams as a feature for Latin stylometry.
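
As a rough illustration of what a POS n-gram feature might look like, the sketch below counts contiguous runs of n tags in an already-tagged token sequence and converts the counts to relative frequencies. It is not the authors' implementation: the function name `pos_ngram_frequencies`, the Universal Dependencies-style tag labels, and the sample tag sequence are all assumptions made for the example, and the tags are invented by hand rather than produced by any tagger.

```python
from collections import Counter


def pos_ngram_frequencies(tags, n=2):
    """Return the relative frequency of each POS n-gram in a tag sequence.

    `tags` is a list of POS labels, one per token, in text order.
    This helper is hypothetical and only illustrates the general idea.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the tag sequence.
    ngrams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    total = len(ngrams)
    if total == 0:
        return {}
    counts = Counter(ngrams)
    # Normalize so that texts of different lengths are comparable.
    return {gram: count / total for gram, count in counts.items()}


# Invented tag sequence for illustration only (not tagger output).
example_tags = ["NOUN", "VERB", "NOUN", "VERB", "ADJ"]
bigram_features = pos_ngram_frequencies(example_tags, n=2)
```

Relative frequencies of this kind could then be placed alongside other stylometric features as input to a classifier, for example for a genre classification task.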