Erika Kuijpers


2024

The Kronieken Corpus: an Annotated Collection of Dutch/Flemish Chronicles from 1500-1850
Theo Dekker | Erika Kuijpers | Alie Lassche | Carolina Lenarduzzi | Roser Morante | Judith Pollmann
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

In this paper we present the Kronieken Corpus, a new digital collection of 204 chronicles written in Dutch/Flemish between 1500 and 1850, which have been scanned, transcribed and annotated with named entities, dates and pages; a smaller part is also annotated with sources and attributions. The texts belong to 308 physical volumes and contain between 23 and 24 million words. Of these, 107 chronicles, or 178 chronicle volumes, collected from 39 different archives and libraries in the Netherlands and Belgium and transcribed by volunteers, had never been transcribed or published before. The result is a unique enriched historical text corpus of original handwritten, non-canonical, non-fiction texts by lay people from the early modern period.

2022

Identifying Copied Fragments in a 18th Century Dutch Chronicle
Roser Morante | Eleanor L. T. Smith | Lianne Wilhelmus | Alie Lassche | Erika Kuijpers
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We apply computational stylometric techniques to an 18th-century Dutch chronicle to determine which fragments of the manuscript represent the author’s own original work and which show signs of external source use through either direct copying or paraphrasing. Using stylometric methods, the majority of text fragments in the chronicle can be correctly labelled as the author’s own words, direct copies from sources, or paraphrases. Our results show that clustering text fragments based on stylometric measures is an effective methodology for authorship verification of this document; however, this approach is less effective when personal writing style is masked by author-independent styles or when applied to paraphrased text.