Identifying Copied Fragments in a 18th Century Dutch Chronicle

Roser Morante, Eleanor L. T. Smith, Lianne Wilhelmus, Alie Lassche, Erika Kuijpers


Abstract
We apply computational stylometric techniques to an 18th century Dutch chronicle to determine which fragments of the manuscript represent the author’s own original work and which show signs of drawing on external sources, through either direct copying or paraphrasing. Using stylometric methods, the majority of text fragments in the chronicle can be correctly labelled as the author’s own words, direct copies from sources, or paraphrases. Our results show that clustering text fragments based on stylometric measures is an effective methodology for authorship verification of this document; however, this approach is less effective when personal writing style is masked by author-independent styles or when applied to paraphrased text.
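The abstract describes clustering text fragments on stylometric measures. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' pipeline, whose code is in the repository listed under Code. It assumes that each fragment is represented by z-scored relative frequencies of the corpus-wide most frequent words, that the distance between fragments is Burrows's Delta (the mean absolute difference of z-scores), and that fragments are grouped by average-linkage agglomerative clustering. The function names `tokenize`, `feature_vectors`, `delta` and `cluster` are hypothetical, and the toy fragments are invented strings, not text from the chronicle.

```python
from collections import Counter
from itertools import combinations
import re
import statistics


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def feature_vectors(fragments: list[str], n_features: int = 50) -> list[list[float]]:
    """Assumed features: relative frequencies of the corpus-wide most
    frequent words, z-scored per word across all fragments."""
    tokens = [tokenize(f) for f in fragments]
    corpus_counts = Counter(t for toks in tokens for t in toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_features)]
    rel = []
    for toks in tokens:
        counts = Counter(toks)
        total = len(toks) or 1
        rel.append([counts[w] / total for w in vocab])
    # z-score each word column so frequent and rare words weigh equally
    zs = [[0.0] * len(vocab) for _ in fragments]
    for j in range(len(vocab)):
        col = [row[j] for row in rel]
        mu = statistics.mean(col)
        sd = statistics.pstdev(col)
        for i, v in enumerate(col):
            zs[i][j] = (v - mu) / sd if sd > 0 else 0.0
    return zs


def delta(a: list[float], b: list[float]) -> float:
    """Burrows's Delta: mean absolute difference of z-scores."""
    return sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)


def cluster(vectors: list[list[float]], k: int) -> list[int]:
    """Average-linkage agglomerative clustering down to k clusters;
    returns one cluster label per fragment."""
    k = max(1, k)
    clusters = [[i] for i in range(len(vectors))]
    dist = {(i, j): delta(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

    def link(c1: list[int], c2: list[int]) -> float:
        # average pairwise Delta between the members of two clusters
        return statistics.mean(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    while len(clusters) > k:
        # merge the closest pair; a < b, so deleting b leaves a's index valid
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    toy = [  # invented toy fragments in two clearly different "styles"
        "de stad en de kerk en het volk",
        "de kerk en de stad en het land",
        "the city and the church and the people",
        "the church and the city and the land",
    ]
    print(cluster(feature_vectors(toy), k=2))  # → [0, 0, 1, 1]
```

In a setting like the one the abstract describes, fragments that fall outside the cluster holding the author's known original passages could be flagged as candidate copies or paraphrases; the abstract also notes that this signal weakens for paraphrased text and for author-independent styles.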
Anthology ID: 2022.lrec-1.631
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 5865–5878
URL: https://aclanthology.org/2022.lrec-1.631
Cite (ACL): Roser Morante, Eleanor L. T. Smith, Lianne Wilhelmus, Alie Lassche, and Erika Kuijpers. 2022. Identifying Copied Fragments in a 18th Century Dutch Chronicle. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5865–5878, Marseille, France. European Language Resources Association.
Cite (Informal): Identifying Copied Fragments in a 18th Century Dutch Chronicle (Morante et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.631.pdf
Code: chroniclingnovelty/stylometry-lrec22