Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories

David Wilmot, Frank Keller


Abstract
Measuring event salience is essential for understanding stories. This paper takes a recent unsupervised method for salience detection, derived from Barthes' cardinal functions and theories of surprise, and applies it to longer narrative forms. We improve the standard transformer language model by incorporating an external knowledgebase (derived from Retrieval Augmented Generation) and adding a memory mechanism to enhance performance on longer works. We use a novel approach to derive salience annotations from chapter-aligned summaries in the Shmoop corpus of classic literary works. Our evaluation against this data demonstrates that our salience detection model outperforms a language model without the knowledgebase and memory augmentations, and that both components are crucial to this improvement.
Anthology ID:
2021.emnlp-main.65
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
851–865
URL:
https://aclanthology.org/2021.emnlp-main.65
DOI:
10.18653/v1/2021.emnlp-main.65
Bibkey:
Cite (ACL):
David Wilmot and Frank Keller. 2021. Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 851–865, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories (Wilmot & Keller, EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.65.pdf
Video:
https://aclanthology.org/2021.emnlp-main.65.mp4
Code:
dwlmt/story-fragments
Data:
Shmoop Corpus