Select and Summarize: Scene Saliency for Movie Script Summarization

Rohit Saxena, Frank Keller


Abstract
Abstractive summarization of long-form narrative texts such as movie scripts is challenging due to the computational and memory constraints of current language models. A movie script typically comprises a large number of scenes; however, only a fraction of these scenes are salient, i.e., important for understanding the overall narrative. Salience can be operationalized by treating a scene as salient if it is mentioned in the summary. Automatically identifying salient scenes is difficult due to the lack of suitable datasets. In this work, we introduce a scene saliency dataset that consists of human-annotated salient scenes for 100 movies. We propose a two-stage abstractive summarization approach which first identifies the salient scenes in the script and then generates a summary using only those scenes. Using QA-based evaluation, we show that our model outperforms previous state-of-the-art summarization methods and reflects the information content of a movie more accurately than a model that takes the whole movie script as input.
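The two-stage idea in the abstract (select salient scenes, then summarize only those) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the models used, so `score_scene` and `summarize` are hypothetical callables standing in for a trained saliency classifier and an abstractive summarizer, and the `threshold` value of 0.5 is an assumption chosen for illustration.

```python
from typing import Callable, List, Sequence


def select_salient_scenes(
    scenes: Sequence[str],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Stage 1: keep scenes whose saliency score reaches the threshold.

    Scene order is preserved so the summarizer sees the narrative in sequence.
    """
    if len(scenes) != len(scores):
        raise ValueError("scenes and scores must have the same length")
    return [scene for scene, score in zip(scenes, scores) if score >= threshold]


def select_and_summarize(
    scenes: Sequence[str],
    score_scene: Callable[[str], float],  # hypothetical saliency model
    summarize: Callable[[str], str],      # hypothetical abstractive summarizer
    threshold: float = 0.5,
) -> str:
    """Stage 2: summarize only the scenes selected in stage 1."""
    scores = [score_scene(scene) for scene in scenes]
    salient = select_salient_scenes(scenes, scores, threshold)
    # The summarizer receives a shorter input than the full script.
    return summarize("\n\n".join(salient))
```

Passing the two models in as callables keeps the sketch independent of any particular library; in practice each would wrap whatever classifier and summarization model is available.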
Anthology ID:
2024.findings-naacl.218
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3439–3455
URL:
https://aclanthology.org/2024.findings-naacl.218
DOI:
10.18653/v1/2024.findings-naacl.218
Cite (ACL):
Rohit Saxena and Frank Keller. 2024. Select and Summarize: Scene Saliency for Movie Script Summarization. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3439–3455, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Select and Summarize: Scene Saliency for Movie Script Summarization (Saxena & Keller, Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.218.pdf