David E. Millard


2024

Rationale-based Learning Using Self-Supervised Narrative Events for Text Summarisation of Interactive Digital Narratives
Ashwathy T. Revi | Stuart E. Middleton | David E. Millard
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper explores using rationale-based learning with supervised attention to focus the training of text summarisation models on words and sentences surrounding choice points for Interactive Digital Narratives (IDNs). IDNs allow players to interact with the story via choice points, making choices central to these narratives. Exploiting such knowledge about narrative structure during model training can help ensure key narrative information appears in generated summaries of narrative-based text and thus improve the quality of these summaries. We experiment with using word-level and sentence-level rationales indicating the proximity of words and sentences to self-supervised choice points. Our results indicate that rationale-based learning can improve the ability of attention-based text summarisation models to create higher quality summaries that encode key narrative information better for different playthroughs of the same interactive narrative. These results suggest a promising new direction for narrative-based text summarisation models.

2022

IDN-Sum: A New Dataset for Interactive Digital Narrative Extractive Text Summarisation
Ashwathy T. Revi | Stuart E. Middleton | David E. Millard
Proceedings of The Workshop on Automatic Summarization for Creative Writing

Summarizing Interactive Digital Narratives (IDNs) presents unique challenges to existing text summarization models, especially in capturing interactive elements in addition to important plot points. In this paper, we describe the first IDN dataset (IDN-Sum) designed specifically for training and testing IDN text summarization algorithms. Our dataset is generated using random playthroughs of 8 IDN episodes, taken from 2 different IDN games, and consists of 10,000 documents. Playthrough documents are annotated through automatic alignment with fan-sourced summaries using a commonly used alignment algorithm. We also report and discuss results from experiments applying common baseline extractive text summarization algorithms to this dataset. Qualitative analysis of the results reveals shortcomings in common annotation approaches and evaluation methods when applied to narrative and interactive narrative datasets. The dataset is released as open source for future researchers to train and test their own approaches for IDN text.