Stylometric Classification of Ancient Greek Literary Texts by Genre

Efthimios Gianitsos, Thomas Bolt, Pramit Chaudhuri, Joseph P. Dexter


Abstract
Classification of texts by genre is an important application of natural language processing to literary corpora but remains understudied for premodern and non-English traditions. We develop a stylometric feature set for ancient Greek that enables identification of texts as prose or verse. The set contains over 20 primarily syntactic features, which are calculated according to custom, language-specific heuristics. Using these features, we classify almost all surviving classical Greek literature as prose or verse with >97% accuracy and F1 score, and further classify a selection of the verse texts into the traditional genres of epic and drama.
Anthology ID:
W19-2507
Volume:
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
June
Year:
2019
Address:
Minneapolis, USA
Editors:
Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
Venue:
LaTeCH
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
52–60
URL:
https://aclanthology.org/W19-2507
DOI:
10.18653/v1/W19-2507
Cite (ACL):
Efthimios Gianitsos, Thomas Bolt, Pramit Chaudhuri, and Joseph P. Dexter. 2019. Stylometric Classification of Ancient Greek Literary Texts by Genre. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 52–60, Minneapolis, USA. Association for Computational Linguistics.
Cite (Informal):
Stylometric Classification of Ancient Greek Literary Texts by Genre (Gianitsos et al., LaTeCH 2019)
PDF:
https://aclanthology.org/W19-2507.pdf
Software:
 W19-2507.Software.zip