Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature

Sarah Chen, Patrick Burns, Thomas Bolt, Pramit Chaudhuri, Joseph Dexter


Abstract
In literary critical applications, stylometry can benefit from hand-curated feature sets capturing various syntactic and rhetorical functions. For premodern languages, calculation of such features is hampered by a lack of adequate computational resources for accurate part-of-speech tagging and semantic disambiguation. This paper reports an evaluation of POS-taggers for Latin and their use in augmenting a hand-curated stylometric feature set. Our experiments show that POS-augmented features not only provide more accurate counts than POS-blind features but also perform better on tasks such as genre classification. In the course of this work we introduce POS n-grams as a feature for Latin stylometry.
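As a rough illustration of the POS n-gram feature mentioned in the abstract, the sketch below counts contiguous n-grams over a sequence of part-of-speech tags and converts the counts to relative frequencies. It is a minimal sketch, not the authors' implementation: the function names `pos_ngrams` and `relative_frequencies` are hypothetical, and the tags on the example sentence are hand-assigned UD-style labels chosen for illustration rather than the output of any tagger evaluated in the paper.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def pos_ngrams(tags: Sequence[str], n: int = 2) -> Counter:
    """Count contiguous n-grams over a sequence of POS tags."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Slide a window of width n across the tag sequence.
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def relative_frequencies(counts: Counter) -> Dict[Tuple[str, ...], float]:
    """Normalize raw n-gram counts so texts of different lengths are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical input: (token, tag) pairs with hand-assigned, illustrative tags.
tagged = [
    ("Gallia", "PROPN"),
    ("est", "AUX"),
    ("omnis", "DET"),
    ("divisa", "VERB"),
    ("in", "ADP"),
    ("partes", "NOUN"),
    ("tres", "NUM"),
]

tags = [tag for _, tag in tagged]
bigrams = pos_ngrams(tags, n=2)
bigram_freqs = relative_frequencies(bigrams)
```

Each key in `bigram_freqs` is a tuple of tags such as `("ADP", "NOUN")`, and the resulting vector could in principle be supplied to a classifier alongside other stylometric features.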
Anthology ID:
2024.ml4al-1.24
Volume:
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)
Month:
August
Year:
2024
Address:
Hybrid in Bangkok, Thailand and online
Editors:
John Pavlopoulos, Thea Sommerschield, Yannis Assael, Shai Gordin, Kyunghyun Cho, Marco Passarotti, Rachele Sprugnoli, Yudong Liu, Bin Li, Adam Anderson
Venues:
ML4AL | WS
Publisher:
Association for Computational Linguistics
Pages:
251–259
URL:
https://aclanthology.org/2024.ml4al-1.24
Cite (ACL):
Sarah Chen, Patrick Burns, Thomas Bolt, Pramit Chaudhuri, and Joseph Dexter. 2024. Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature. In Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pages 251–259, Hybrid in Bangkok, Thailand and online. Association for Computational Linguistics.
Cite (Informal):
Leveraging Part-of-Speech Tagging for Enhanced Stylometry of Latin Literature (Chen et al., ML4AL-WS 2024)
PDF:
https://aclanthology.org/2024.ml4al-1.24.pdf