2025
Why Novels (Don’t) Break Through: Dynamics of Canonicity in the Danish Modern Breakthrough (1870-1900)
Alie Lassche | Pascale Feldkamp | Yuri Bizzoni | Katrine Baunvig | Kristoffer Nielbo
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
Recent studies suggest that canonical works possess unique textual profiles, often tied to innovation and higher cognitive demands. However, recent work on Danish 19th-century literary novels has shown that some non-canonical works shared similar textual qualities with canonical works, underscoring the role of text-extrinsic factors in shaping canonicity. The present study examines the same corpus, more than 800 Danish novels from the Modern Breakthrough era (1870–1900), to explore the role of socio-economic and institutional factors, as well as demographic features (specifically book prices, publishers, and the author’s nationality), in determining canonical status. We combine expert-based and national definitions of canon to set up a classification experiment to test the predictive power of these external features, and to understand how they relate to that of text-intrinsic features. We show that the canonization process is influenced by external factors – such as publisher and nationality – but that text-intrinsic features nevertheless maintain predictive power in a dynamic interplay of text and context.
Effects of Publicity and Complexity in Reader Polarization
Yuri Bizzoni | Pascale Feldkamp | Kristoffer Nielbo
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
We investigate how Goodreads rating distributions reflect variations in audience reception across literary works. By examining a large-scale dataset of novels, we analyze whether metrics such as the entropy or standard deviation of rating distributions correlate with textual features – including perplexity, nominal ratio, and syntactic complexity. These metrics reveal a disagreement continuum: more complex texts – i.e., more cognitively demanding books, with a more canon-like textual profile – generate polarized reader responses, while mainstream works produce more uniform reactions. We compare evaluation patterns across canonical and non-canonical works, bestsellers, and prize-winners, finding that textual complexity drives rating polarization even when controlling for publicity effects. Our findings demonstrate that linguistically unpredictable texts, particularly those with higher nominal density and dependency distance, generate divergent reader evaluations. This challenges conventional literary success metrics and suggests that the shape of rating distributions offers valuable insights beyond average scores. We hope our approach establishes a productive framework for understanding how literary features influence reception and how disagreement metrics can enhance our understanding of public literary judgment.
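The two disagreement metrics named in this abstract can be illustrated with a minimal sketch. It assumes a rating distribution is given as a histogram of 1- to 5-star counts; the function names and the example histograms are hypothetical and are not taken from the paper, whose exact computation may differ.

```python
import math


def rating_entropy(counts):
    """Shannon entropy (in bits) of a star-rating histogram."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def rating_std(counts):
    """Population standard deviation of ratings; stars run 1..len(counts)."""
    total = sum(counts)
    mean = sum(star * c for star, c in enumerate(counts, start=1)) / total
    var = sum(c * (star - mean) ** 2
              for star, c in enumerate(counts, start=1)) / total
    return math.sqrt(var)


# Hypothetical histograms: counts of 1-, 2-, 3-, 4-, and 5-star ratings.
consensus = [2, 3, 10, 35, 50]    # most readers agree
polarized = [30, 10, 10, 10, 40]  # readers split between extremes

for name, hist in [("consensus", consensus), ("polarized", polarized)]:
    print(name, round(rating_entropy(hist), 3), round(rating_std(hist), 3))
```

Under these assumptions, the polarized histogram scores higher on both metrics than the consensus one, although the two averages alone would not reveal how differently the ratings are spread.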
Modeling Multilayered Complexity in Literary Texts
Pascale Feldkamp | Márton Kardos | Kristoffer Nielbo | Yuri Bizzoni
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
We explore the relationship between stylistic and sentimental complexity in literary texts, analyzing how they interact and affect overall complexity. Using a dataset of over 9,000 English novels (19th-20th century), we find that complexity at the stylistic/syntactic level and complexity at the sentiment level tend to show a linear association. Finally, using dedicated datasets, we show that both stylistic/syntactic features – particularly those relating to information density – and sentiment features are related to text difficulty rank and to average processing time.
2024
Towards a GoldenHymns Dataset for Studying Diachronic Trends in 19th Century Danish Religious Hymns
Ea Lindhardt Overgaard | Pascale Feldkamp | Yuri Bizzoni
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change
Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900)
Pascale Feldkamp | Alie Lassche | Jan Kostkan | Márton Kardos | Kenneth Enevoldsen | Katrine Baunvig | Kristoffer Nielbo
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
We examine the relationship between the canonization of Danish novels and their textual innovation and influence, taking the Danish Modern Breakthrough era (1870–1900) as a case study. We evaluate whether canonical novels introduced a significant textual novelty in their time, and explore their influence on the overall literary trend of the period. By analyzing the positions of canonical versus non-canonical novels in semantic space, we seek to better understand the link between a novel’s canonical status and its literary impact. Additionally, we examine the overall diversification of Modern Breakthrough novels during this significant period of rising literary readership. We find that canonical novels stand out from both the historical novel genre and non-canonical novels of the period. Our findings on diversification within and across groups indicate that the novels now regarded as canonical served as literary trendsetters of their time.
Below the Sea (with the Sharks): Probing Textual Features of Implicit Sentiment in a Literary Case-study
Yuri Bizzoni | Pascale Feldkamp
Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language
Literary language presents an ongoing challenge for Sentiment Analysis due to its complex, nuanced, and layered form of expression. It is often suggested that effective literary writing is evocative, operating beneath the surface and understating emotional expression. To explore features of implicitness in literary expression, this study takes Ernest Hemingway’s The Old Man and the Sea as a case for examining implicit sentiment expression. We examine sentences where automatic sentiment annotations show substantial divergences from human sentiment annotations, and probe these sentences for distinctive traits. We find that sentences where humans perceived a strong sentiment while models did not are significantly lower in arousal and higher in concreteness than sentences where humans and models were more aligned, suggesting the importance of simplicity and concreteness for implicit sentiment expression in literary prose.
Comparing Tools for Sentiment Analysis of Danish Literature from Hymns to Fairy Tales: Low-Resource Language and Domain Challenges
Pascale Feldkamp | Jan Kostkan | Ea Overgaard | Mia Jacobsen | Yuri Bizzoni
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
While Sentiment Analysis has become increasingly central in computational approaches to literary texts, the literary domain still poses important challenges for the detection of textual sentiment due to its highly complex use of language and devices, from subtle humor to poetic imagery. Furthermore, these challenges are amplified in low-resource language and domain settings. In this paper, we investigate the application and efficacy of different Sentiment Analysis tools on Danish literary texts, using historical fairy tales and religious hymns as our datasets. The scarcity of linguistic resources for Danish and the historical context of the data further compound the challenges for the tools. We compare human annotations to the continuous valence scores of both transformer- and dictionary-based Sentiment Analysis methods to assess their performance, seeking to understand how distinct methods handle the language of Danish prose and poetry.
2023
Comparing Transformer and Dictionary-based Sentiment Models for Literary Texts: Hemingway as a Case-study
Yuri Bizzoni | Pascale Feldkamp
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages
The literary domain continues to pose a challenge for Sentiment Analysis methods, due to its particularly nuanced and layered nature. This paper explores the adequacy of different Sentiment Analysis tools, from dictionary-based approaches to state-of-the-art Transformers, for capturing valence and modelling sentiment arcs. We take Ernest Hemingway’s novel The Old Man and the Sea as a case study to address challenges inherent to literary language, compare Transformer and rule-based systems’ scores with human annotations, and shed light on the complexities of analyzing sentiment in narrative texts. Finally, we emphasize the potential of model ensembles.
Readability and Complexity: Diachronic Evolution of Literary Language Across 9000 Novels
Pascale Feldkamp | Yuri Bizzoni | Ida Marie S. Lassen | Mads Rosendahl Thomsen | Kristoffer Nielbo
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages
Using a large corpus of English-language novels from 1880 to 2000, we compare several textual features associated with literary quality, seeking to examine developments in literary language and narrative complexity through time. We show that, while the features correlate with one another, readability metrics are the only ones that exhibit a steady evolution, indicating that novels become easier to read through the 20th century but not simpler. We discuss the possibility of cultural selection as a factor and compare our findings with a subset of canonical works.