2024
Story Embeddings — Narrative-Focused Representations of Fictional Stories
Hans Ole Hatzel | Chris Biemann
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We present a novel approach to modeling fictional narratives. The proposed model creates embeddings that represent a story such that similar narratives, that is, reformulations of the same story, will result in similar embeddings. We showcase the prowess of our narrative-focused embeddings on various datasets, exhibiting state-of-the-art performance on multiple retrieval tasks. The embeddings also show promising results on a narrative understanding task. Additionally, we perform an annotation-based evaluation to validate that our introduced computational notion of narrative similarity aligns with human perception. The approach can help to explore vast datasets of stories, with potential applications in recommender systems and in the computational analysis of literature.
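As an illustration only, here is a minimal sketch of how such narrative-focused embeddings could serve retrieval, assuming a generic sentence-transformer encoder as a stand-in for the proposed model (the model name and the cosine-similarity setup are assumptions, not the paper's actual system).

# Hedged sketch: embedding-based retrieval of reformulations of the same story.
# The encoder below is a generic placeholder, not the paper's trained model.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

summaries = [
    "A young wizard discovers he is destined to defeat a dark sorcerer.",
    "An orphan learns of his magical heritage and confronts an evil mage.",
    "Two lovers from feuding families meet a tragic end.",
]
query = "A boy raised by relatives finds out he is a wizard and must face a dark lord."

# Encode query and candidate summaries, then rank candidates by cosine similarity.
query_emb = encoder.encode(query, convert_to_tensor=True)
summary_embs = encoder.encode(summaries, convert_to_tensor=True)
scores = util.cos_sim(query_emb, summary_embs)[0]
for idx in scores.argsort(descending=True):
    print(f"{scores[idx]:.3f}  {summaries[idx]}")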
Coreference in Long Documents using Hierarchical Entity Merging
Talika Gupta | Hans Ole Hatzel | Chris Biemann
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
Current top-performing coreference resolution approaches are limited with regard to the maximum length of texts they can accept. We explore a recursive merging technique of entities that allows us to apply coreference models to texts of arbitrary length, as found in many narrative genres. In experiments on established datasets, we quantify the drop in resolution quality caused by this approach. Finally, we use an under-explored resource in the form of a fully coreference-annotated novel to illustrate our model’s performance for long documents in practice. Here, we achieve state-of-the-art performance, outperforming previous systems capable of handling long documents.
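The recursive merging idea can be pictured with a small sketch: run a coreference model on bounded-length chunks, then merge the resulting entity clusters pairwise until a single set of document-level entities remains. The chunking, the resolver interface, and the mention-overlap merge heuristic below are illustrative assumptions, not the paper's implementation.

# Illustrative sketch of hierarchical entity merging: resolve coreference on
# chunks of bounded length, then recursively merge entity clusters from
# neighbouring chunks. The merge criterion (shared mention strings) is a toy
# heuristic, not the paper's learned merging.
from typing import Callable

Cluster = set[str]          # an entity as a set of mention strings
Resolver = Callable[[str], list[Cluster]]

def chunk(text: str, max_tokens: int = 2000) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

def merge_two(left: list[Cluster], right: list[Cluster]) -> list[Cluster]:
    merged = [set(c) for c in left]
    for rc in right:
        # Attach a right-hand cluster to a left-hand one if they share a mention.
        for mc in merged:
            if mc & rc:
                mc |= rc
                break
        else:
            merged.append(set(rc))
    return merged

def resolve_long_document(text: str, resolver: Resolver) -> list[Cluster]:
    partial = [resolver(c) for c in chunk(text)]
    # Recursively merge pairs of chunk-level results until one result remains.
    while len(partial) > 1:
        partial = [merge_two(partial[i], partial[i + 1]) if i + 1 < len(partial) else partial[i]
                   for i in range(0, len(partial), 2)]
    return partial[0] if partial else []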
Tell Me Again! A Large-Scale Dataset of Multiple Summaries for the Same Story
Hans Ole Hatzel | Chris Biemann
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
A large body of research is concerned with the semantics of narratives, both in terms of understanding narratives and of generating fictional narratives and stories. We provide a dataset of summaries to be used as a proxy for entire stories or for the analysis of the summaries themselves. Our dataset consists of a total of 96,831 individual summaries across 29,505 stories. We intend for the dataset to be used for training and evaluating embedding representations of stories, specifically of the stories’ narratives. The summary data is harvested from five different language versions of Wikipedia. Our dataset comes with rich metadata, which we extract from Wikidata, enabling a wide range of applications that operate on story summaries in conjunction with metadata. To set baseline results, we run retrieval experiments on the dataset, exploring the capability of similarity models to retrieve summaries of the same story. For this retrieval, a crucial element is to avoid placing too much emphasis on named entities, since matching on entities alone allows retrieving other summaries of the same work without taking the narrative into account.
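To make the last point concrete: one way to de-emphasize named entities is to mask them before embedding, so that similarity is driven by the narrative rather than by overlapping character or place names. The spaCy NER model and the placeholder scheme below are illustrative choices, not the dataset's baseline procedure.

# Sketch of de-emphasizing named entities before retrieval: replace entity
# mentions with generic placeholders so that similarity reflects the narrative
# rather than shared names. spaCy NER is an assumed stand-in here.
import spacy

nlp = spacy.load("en_core_web_sm")

def mask_entities(summary: str) -> str:
    doc = nlp(summary)
    masked, last = [], 0
    for ent in doc.ents:
        masked.append(summary[last:ent.start_char])
        masked.append(f"[{ent.label_}]")   # e.g. "[PERSON]", "[GPE]"
        last = ent.end_char
    masked.append(summary[last:])
    return "".join(masked)

print(mask_entities("Frodo leaves the Shire and carries the ring to Mordor."))
# Entity labels in the output depend on the NER model used.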
2023
Narrative Cloze as a Training Objective: Towards Modeling Stories Using Narrative Chain Embeddings
Hans Ole Hatzel | Chris Biemann
Proceedings of the 5th Workshop on Narrative Understanding
We present a novel approach to modeling narratives using narrative chain embeddings. A new dataset of narrative chains extracted from German news texts is presented. With neural methods, we produce models for both German and English that achieve state-of-the-art performance on the Multiple Choice Narrative Cloze task. Subsequently, we perform an extrinsic evaluation of the embeddings our models produce and show that they perform rather poorly in identifying narratively similar texts. We explore some of the reasons for this underperformance and discuss the upsides of our approach. We provide an outlook on alternative ways to model narratives, as well as techniques for evaluating such models.
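For readers unfamiliar with the task, here is a toy illustration of the Multiple Choice Narrative Cloze setup: given a chain of events sharing a protagonist, with one event held out, score a set of candidate events and pick the most plausible one. The co-occurrence scorer below is a trivial stand-in for a learned chain-embedding model.

# Toy illustration of Multiple Choice Narrative Cloze: choose the held-out
# event of a chain from a small candidate set. The scorer is a stand-in.
from dataclasses import dataclass

@dataclass
class Event:
    predicate: str
    role: str   # grammatical role of the shared protagonist, e.g. "subj" or "obj"

def score(chain: list[Event], candidate: Event) -> float:
    # Count how often the candidate's predicate co-occurs with chain predicates.
    # A real model would embed the chain and the candidate instead.
    cooccur = {("order", "eat"), ("eat", "pay"), ("pay", "leave")}
    return sum((e.predicate, candidate.predicate) in cooccur or
               (candidate.predicate, e.predicate) in cooccur for e in chain)

chain = [Event("order", "subj"), Event("eat", "subj"), Event("leave", "subj")]
candidates = [Event("pay", "subj"), Event("fly", "subj"), Event("swim", "subj")]
best = max(candidates, key=lambda c: score(chain, c))
print(best.predicate)  # expected: "pay"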
2021
Towards Layered Events and Schema Representations in Long Documents
Hans Ole Hatzel | Chris Biemann
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
In this thesis proposal, we explore the application of event extraction to literary texts. Given the length of literary documents, modeling events at different granularities may be more adequate for extracting meaningful information, as individual elements contribute little to the overall semantics. We adapt the concept of schemas as sequences of events that all describe a single process and are connected through shared participants, extending it to allow for multiple schemas in a document. Segmentation of event sequences into schemas is approached by modeling event sequences on tasks such as the narrative cloze task, the prediction of missing events in a sequence. We propose building on sequences of event embeddings to form schema embeddings, thereby summarizing sections of documents in a single representation. This approach will allow for comparisons of different sections of documents and of entire literary works. Literature is a challenging domain due to its variety of genres, yet the representation of literary content has received relatively little attention.
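A minimal sketch of the proposed direction of building schema embeddings from sequences of event embeddings, assuming each event already has a fixed-size vector; mean pooling and cosine comparison are illustrative choices, not the proposal's final method.

# Sketch: summarize a section's event vectors as one schema embedding and
# compare sections by cosine similarity. Mean pooling is an assumed choice.
import numpy as np

def schema_embedding(event_embeddings: np.ndarray) -> np.ndarray:
    """Summarize a sequence of event vectors (n_events, dim) as one vector (dim,)."""
    return event_embeddings.mean(axis=0)

def compare_schemas(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two schema embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
schema_a = schema_embedding(rng.normal(size=(5, 128)))   # e.g. one section's events
schema_b = schema_embedding(rng.normal(size=(7, 128)))   # e.g. another section's events
print(compare_schemas(schema_a, schema_b))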
Neural End-to-end Coreference Resolution for German in Different Domains
Fynn Schröder | Hans Ole Hatzel | Chris Biemann
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)