IDN-Sum: A New Dataset for Interactive Digital Narrative Extractive Text Summarisation

Ashwathy T. Revi; Stuart E. Middleton; David E. Millard

Abstract

Summarizing Interactive Digital Narratives (IDN) presents some unique challenges to existing text summarization models, especially around capturing interactive elements in addition to important plot points. In this paper, we describe the first IDN dataset (IDN-Sum) designed specifically for training and testing IDN text summarization algorithms. Our dataset is generated using random playthroughs of 8 IDN episodes, taken from 2 different IDN games, and consists of 10,000 documents. Playthrough documents are annotated through automatic alignment with fan-sourced summaries using a widely used alignment algorithm. We also report and discuss results from experiments applying common baseline extractive text summarization algorithms to this dataset. Qualitative analysis of the results reveals shortcomings in common annotation approaches and evaluation methods when applied to narrative and interactive narrative datasets. The dataset is released as open source for future researchers to train and test their own approaches for IDN text summarization.
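
The abstract does not name the alignment algorithm used to label playthrough sentences. One common way to turn abstractive reference summaries (such as fan-written summaries) into extractive sentence labels is greedy ROUGE-based oracle labelling: repeatedly add the document sentence that most improves ROUGE against the reference, and stop when no sentence helps. The sketch below illustrates that general technique only, under our own assumptions (whitespace tokenisation, a simplified ROUGE-1 + ROUGE-2 F1 without stemming, a hypothetical `greedy_oracle_labels` helper and a cap of 3 sentences); it is not necessarily the procedure used to build IDN-Sum.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence (empty if the sequence is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap, no stemming or stopword removal."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per n-gram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def greedy_oracle_labels(sentences: List[List[str]], summary: List[str],
                         max_sentences: int = 3) -> List[int]:
    """Hypothetical helper: label each sentence 1 (in summary) or 0 by greedily adding
    the sentence that most improves ROUGE-1 + ROUGE-2 F1; stop when nothing improves."""
    selected: List[int] = []
    best_score = 0.0
    for _ in range(max_sentences):
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Keep the candidate extract in original document order.
            candidate = [tok for j in sorted(selected + [i]) for tok in sentences[j]]
            score = rouge_n_f1(candidate, summary, 1) + rouge_n_f1(candidate, summary, 2)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:  # no remaining sentence raises the score
            break
        selected.append(best_idx)
    return [1 if i in selected else 0 for i in range(len(sentences))]


if __name__ == "__main__":
    # Toy, made-up "playthrough" sentences and "fan summary" for illustration only.
    playthrough = [
        "the player walks into the diner".split(),
        "a character finds a key".split(),
        "the hero opens the locked door".split(),
    ]
    fan_summary = "the hero opens the door with a key".split()
    print(greedy_oracle_labels(playthrough, fan_summary))  # → [0, 0, 1]
```

In this toy case the second sentence is not selected, because adding it lowers the combined F1 slightly; greedy oracles trade some recall for precision in this way, which is one reason such labels can miss narratively important content.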

Anthology ID: 2022.creativesumm-1.1
Volume: Proceedings of the Workshop on Automatic Summarization for Creative Writing
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editor: Kathleen McKeown
Venue: CreativeSumm
Publisher: Association for Computational Linguistics
Pages: 1–12
URL: https://aclanthology.org/2022.creativesumm-1.1/
Cite (ACL): Ashwathy T. Revi, Stuart E. Middleton, and David E. Millard. 2022. IDN-Sum: A New Dataset for Interactive Digital Narrative Extractive Text Summarisation. In Proceedings of the Workshop on Automatic Summarization for Creative Writing, pages 1–12, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal): IDN-Sum: A New Dataset for Interactive Digital Narrative Extractive Text Summarisation (Revi et al., CreativeSumm 2022)
PDF: https://aclanthology.org/2022.creativesumm-1.1.pdf