Janne Lorenzen


2021

The FairyNet Corpus - Character Networks for German Fairy Tales
David Schmidt | Albin Zehe | Janne Lorenzen | Lisa Sergel | Sebastian Düker | Markus Krug | Frank Puppe
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper presents a data set of German fairy tales, manually annotated with character networks that were obtained with high inter-rater agreement. The release of this corpus provides an opportunity to train and compare different algorithms for the extraction of character networks, which has so far been barely possible because of the heterogeneous interests of previous researchers. We demonstrate the usefulness of our data set by providing baseline experiments for the automatic extraction of character networks, applying a rule-based pipeline as well as a neural approach, and find that the neural approach outperforms the rule-based approach in most evaluation settings.