Pascale Moreira


2024

pdf bib
Perplexing Canon: A study on GPT-based perplexity of canonical and non-canonical literary works
Yaru Wu | Yuri Bizzoni | Pascale Moreira | Kristoffer Nielbo
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

This study extends previous research on literary quality by using information theory-based methods to assess the level of perplexity recorded by three large language models when processing 20th-century English novels deemed to have high literary quality, recognized by experts as canonical, compared to a broader control group. We find that canonical texts appear to elicit a higher perplexity in the models, we explore which textual features might concur to create such an effect. We find that the usage of a more heavily nominal style, together with a more diverse vocabulary, is one of the leading causes of the difference between the two groups. These traits could reflect “strategies” to achieve an informationally dense literary style.

2023

pdf bib
Modeling Readers’ Appreciation of Literary Narratives Through Sentiment Arcs and Semantic Profiles
Pascale Moreira | Yuri Bizzoni | Kristoffer Nielbo | Ida Marie Lassen | Mads Thomsen
Proceedings of the 5th Workshop on Narrative Understanding

Predicting literary quality and reader appreciation of narrative texts are highly complex challenges in quantitative and computational literary studies due to the fluid definitions of quality and the vast feature space that can be considered when modeling a literary work. This paper investigates the potential of sentiment arcs combined with topical-semantic profiling of literary narratives as indicators for their literary quality. Our experiments focus on a large corpus of 19th and 20the century English language literary fiction, using GoodReads’ ratings as an imperfect approximation of the diverse range of reader evaluations and preferences. By leveraging a stacked ensemble of regression models, we achieve a promising performance in predicting average readers’ scores, indicating the potential of our approach in modeling literary quality.

pdf bib
Dimensions of Quality: Contrasting Stylistic vs. Semantic Features for Modelling Literary Quality in 9,000 Novels
Pascale Moreira | Yuri Bizzoni
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In computational literary studies, the challenging task of predicting quality or reader-appreciation of narrative texts is confounded by volatile definitions of quality and the vast feature space that may be considered in modeling. In this paper, we explore two different types of feature sets: stylistic features on one hand, and semantic features on the other. We conduct experiments on a corpus of 9,089 English language literary novels published in the 19th and 20th century, using GoodReads’ ratings as a proxy for reader-appreciation. Examining the potential of both approaches, we find that some types of books are more predictable in one model than in the other, which may indicate that texts have different prominent characteristics (stylistic complexity, a certain narrative progression at the sentiment-level).

pdf bib
Sentimental Matters - Predicting Literary Quality by Sentiment Analysis and Stylometric Features
Yuri Bizzoni | Pascale Moreira | Mads Rosendahl Thomsen | Kristoffer Nielbo
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Over the years, the task of predicting reader appreciation or literary quality has been the object of several studies, but it remains a challenging problem in quantitative literary studies and computational linguistics alike, as its definition can vary a lot depending on the genre, the adopted features and the annotation system. This paper attempts to evaluate the impact of sentiment arc modelling versus more classical stylometric features for user-ratings of novels. We run our experiments on a corpus of English language narrative literary fiction from the 19th and 20th century, showing that syntactic and surface-level features can be powerful for the study of literary quality, but can be outperformed by sentiment-characteristics of a text.

pdf bib
Good Reads and Easy Novels: Readability and Literary Quality in a Corpus of US-published Fiction
Yuri Bizzoni | Pascale Moreira | Nicole Dwenger | Ida Lassen | Mads Thomsen | Kristoffer Nielbo
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

In this paper, we explore the extent to which readability contributes to the perception of literary quality as defined by two categories of variables: expert-based (e.g., Pulitzer Prize, National Book Award) and crowd-based (e.g., GoodReads, WorldCat). Based on a large corpus of modern and contemporary fiction in English, we examine the correlation of a text’s readability with its perceived literary quality, also assessing readability measures against simpler stylometric features. Our results show that readability generally correlates with popularity as measured through open platforms such as GoodReads and WorldCat but has an inverse relation with three prestigious literary awards. This points to a distinction between crowd- and expert-based judgments of literary style, as well as to a discrimination between fame and appreciation in the reception of a book.