Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works

Yaru Wu, Yuri Bizzoni, Pascale Moreira, Kristoffer Nielbo


Abstract
This study extends previous research on literary quality by using information theory-based methods to assess the perplexity recorded by three large language models when processing 20th-century English novels of recognized high literary quality, i.e., works regarded by experts as canonical, compared to a broader control group. We find that canonical texts appear to elicit higher perplexity in the models, and we explore which textual features might combine to create this effect. We find that the use of a more heavily nominal style, together with a more diverse vocabulary, is one of the leading causes of the difference between the two groups. These traits could reflect “strategies” to achieve an informationally dense literary style.
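The core measure here is perplexity, i.e., how "surprised" a causal language model is by a text. The abstract does not specify the models or evaluation settings used in the paper, so the sketch below is only a rough illustration: it computes chunked perplexity of a text under GPT-2 via the Hugging Face transformers library, and the model choice, chunk length, and the `text_perplexity` helper are assumptions for illustration, not the authors' setup.

```python
# Illustrative sketch: chunked perplexity of a text under GPT-2.
# The paper's actual models, context size, and preprocessing may differ;
# this only demonstrates the general information-theoretic measure involved.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def text_perplexity(text: str, model, tokenizer, max_len: int = 1024) -> float:
    """Average next-token perplexity over non-overlapping chunks of `text`."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, input_ids.size(1) - 1, max_len):
        chunk = input_ids[:, start:start + max_len]
        with torch.no_grad():
            out = model(chunk, labels=chunk)  # loss = mean NLL of next-token predictions
        n_pred = chunk.size(1) - 1            # number of predicted positions in this chunk
        nll_sum += out.loss.item() * n_pred
        n_tokens += n_pred
    return math.exp(nll_sum / n_tokens)       # perplexity = exp(mean negative log-likelihood)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
print(text_perplexity("It was the best of times, it was the worst of times.", model, tokenizer))
```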
Anthology ID: 2024.latechclfl-1.16
Volume: Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Month: March
Year: 2024
Address: St. Julians, Malta
Editors: Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Stan Szpakowicz
Venues: LaTeCHCLfL | WS
Publisher: Association for Computational Linguistics
Pages: 172–184
URL: https://aclanthology.org/2024.latechclfl-1.16
Cite (ACL): Yaru Wu, Yuri Bizzoni, Pascale Moreira, and Kristoffer Nielbo. 2024. Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), pages 172–184, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal): Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works (Wu et al., LaTeCHCLfL-WS 2024)
PDF: https://aclanthology.org/2024.latechclfl-1.16.pdf
Supplementary material: 2024.latechclfl-1.16.SupplementaryMaterial.zip