Alie Lassche


2024

pdf bib
The Kronieken Corpus: an Annotated Collection of Dutch/Flemish Chronicles from 1500-1850
Theo Dekker | Erika Kuijpers | Alie Lassche | Carolina Lenarduzzi | Roser Morante | Judith Pollmann
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

In this paper we present the Kronieken Corpus, a new digital collection of 204 chronicles written in Dutch/Flemish between 1500 and 1850, which have been scanned, transcribed and annotated with named entities, dates, pages and a smaller part with sources and attributions. The texts belong to 308 physical volumes and contain between 23 and 24 million words. 107 chronicles, or 178 chronicle volumes, collected from 39 different archives and libraries in The Netherlands and Belgium and transcribed by volunteers had never been transcribed or published before. The result is a unique enriched historical text corpus of original hand-written, non-canonical and non-fiction text by lay people from the early modern period.

2022

pdf bib
Identifying Copied Fragments in a 18th Century Dutch Chronicle
Roser Morante | Eleanor L. T. Smith | Lianne Wilhelmus | Alie Lassche | Erika Kuijpers
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We apply computational stylometric techniques to an 18th century Dutch chronicle to determine which fragments of the manuscript represent the author’s own original work and which show signs of external source use through either direct copying or paraphrasing. Through stylometric methods the majority of text fragments in the chronicle can be correctly labelled as either the author’s own words, direct copies from sources or paraphrasing. Our results show that clustering text fragments based on stylometric measures is an effective methodology for authorship verification of this document; however, this approach is less effective when personal writing style is masked by author independent styles or when applied to paraphrased text.

2021

pdf bib
The Early Modern Dutch Mediascape. Detecting Media Mentions in Chronicles Using Word Embeddings and CRF
Alie Lassche | Roser Morante
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

While the production of information in the European early modern period is a well-researched topic, the question how people were engaging with the information explosion that occurred in early modern Europe, is still underexposed. This paper presents the annotations and experiments aimed at exploring whether we can automatically extract media related information (source, perception, and receiver) from a corpus of early modern Dutch chronicles in order to get insight in the mediascape of early modern middle class people from a historic perspective. In a number of classification experiments with Conditional Random Fields, three categories of features are tested: (i) raw and binary word embedding features, (ii) lexicon features, and (iii) character features. Overall, the classifier that uses raw embeddings performs slightly better. However, given that the best F-scores are around 0.60, we conclude that the machine learning approach needs to be combined with a close reading approach for the results to be useful to answer history research questions.