Alie Lassche

2025

Fact from Fiction: Finding Serialized Novels in Newspapers
Pascale Feldkamp | Alie Lassche | Katrine Frøkjær Baunvig | Kristoffer Nielbo | Yuri Bizzoni
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Digitized literary corpora of the 19th century favor canonical and novelistic forms, sidelining a broader and more diverse literary production. Serialized fiction – widely read but embedded in newspapers – remains especially underexplored, particularly in low-resource languages like Danish. This paper addresses this gap by developing methods to identify fiction in digitized Danish newspapers (1818–1848).We (1) introduce a manually annotated dataset of 1,394 articles and (2) evaluate classification pipelines using both selected linguistic features and embeddings, achieving F1-scores of up to 0.91. Finally, we (3) analyze feuilleton fiction via interpretable features to test its drift in discourse from neighboring nonfiction.Our results support the construction of alternative literary corpora and contribute to ongoing work on modeling the fiction–nonfiction boundary by operationalizing discourse-level distinctions at scale.

pdf bib abs

Why Novels (Don’t) Break Through: Dynamics of Canonicity in the Danish Modern Breakthrough (1870-1900)
Alie Lassche | Pascale Feldkamp | Yuri Bizzoni | Katrine Baunvig | Kristoffer Nielbo
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)

Recent studies suggest that canonical works possess unique textual profiles, often tied to innovation and higher cognitive demands. However, recent work on Danish 19th century literary novels has shown that some non-canonical works shared similar textual qualities with canonical works, underscoring the role of text-extrinsic factors in shaping canonicity. The present study examines the same corpus (more than 800 Danish novels from the Modern Breakthrough era (1870–1900)) to explore socio-economic and institutional factors, as well as demographic features, specifically, book prices, publishers, and the author’s nationality – in determining canonical status. We combine expert-based and national definitions of canon to set up a classification experiment to test the predictive power of these external features, and to understand how they relate to that of text-intrinsic features. We show that the canonization process is influenced by external factors – such as publisher and nationality – but that text-intrinsic features nevertheless maintain predictive power in a dynamic interplay of text and context.

2024

pdf bib abs

The Kronieken Corpus: an Annotated Collection of Dutch/Flemish Chronicles from 1500-1850
Theo Dekker | Erika Kuijpers | Alie Lassche | Carolina Lenarduzzi | Roser Morante | Judith Pollmann
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

In this paper we present the Kronieken Corpus, a new digital collection of 204 chronicles written in Dutch/Flemish between 1500 and 1850, which have been scanned, transcribed and annotated with named entities, dates, pages and a smaller part with sources and attributions. The texts belong to 308 physical volumes and contain between 23 and 24 million words. 107 chronicles, or 178 chronicle volumes, collected from 39 different archives and libraries in The Netherlands and Belgium and transcribed by volunteers had never been transcribed or published before. The result is a unique enriched historical text corpus of original hand-written, non-canonical and non-fiction text by lay people from the early modern period.

pdf bib abs

Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900)
Pascale Feldkamp | Alie Lassche | Jan Kostkan | Márton Kardos | Kenneth Enevoldsen | Katrine Baunvig | Kristoffer Nielbo
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

We examine the relationship between the canonization of Danish novels and their textual innovation and influence, taking the Danish Modern Breakthrough era (1870–1900) as a case study. We evaluate whether canonical novels introduced a significant textual novelty in their time, and explore their influence on the overall literary trend of the period. By analyzing the positions of canonical versus non-canonical novels in semantic space, we seek to better understand the link between a novel’s canonical status and its literary impact. Additionally, we examine the overall diversification of Modern Breakthrough novels during this significant period of rising literary readership. We find that canonical novels stand out from both the historical novel genre and non-canonical novels of the period. Our findings on diversification within and across groups indicate that the novels now regarded as canonical served as literary trendsetters of their time.

2022

pdf bib abs

Identifying Copied Fragments in a 18th Century Dutch Chronicle
Roser Morante | Eleanor L. T. Smith | Lianne Wilhelmus | Alie Lassche | Erika Kuijpers
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We apply computational stylometric techniques to an 18th century Dutch chronicle to determine which fragments of the manuscript represent the author’s own original work and which show signs of external source use through either direct copying or paraphrasing. Through stylometric methods the majority of text fragments in the chronicle can be correctly labelled as either the author’s own words, direct copies from sources or paraphrasing. Our results show that clustering text fragments based on stylometric measures is an effective methodology for authorship verification of this document; however, this approach is less effective when personal writing style is masked by author independent styles or when applied to paraphrased text.

2021

pdf bib abs

The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF
Alie Lassche | Roser Morante
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

While the production of information in the European early modern period is a well-researched topic, the question how people were engaging with the information explosion that occurred in early modern Europe, is still underexposed. This paper presents the annotations and experiments aimed at exploring whether we can automatically extract media related information (source, perception, and receiver) from a corpus of early modern Dutch chronicles in order to get insight in the mediascape of early modern middle class people from a historic perspective. In a number of classification experiments with Conditional Random Fields, three categories of features are tested: (i) raw and binary word embedding features, (ii) lexicon features, and (iii) character features. Overall, the classifier that uses raw embeddings performs slightly better. However, given that the best F-scores are around 0.60, we conclude that the machine learning approach needs to be combined with a close reading approach for the results to be useful to answer history research questions.