AIStorySimilarity: Quantifying Story Similarity Using Narrative for Search, IP Infringement, and Guided Creativity

Jon Chun


Abstract
Stories are central for interpreting experiences, communicating, and influencing each other via films, medical, media, and other narratives. Quantifying the similarity between stories has numerous applications including detecting IP infringement, detecting hallucinations, search/recommendation engines, and guiding human-AI collaborations. Despite this, traditional NLP text similarity metrics are limited to short text distance metrics like n-gram overlaps and embeddings. Larger texts require preprocessing with significant information loss through paraphrasing or multi-step decomposition. This paper introduces AIStorySimilarity, a novel benchmark to measure the semantic distance between long-text stories based on core structural elements drawn from narrative theory and script writing. Based on four narrative elements (characters, plot, setting, and themes) as well 31 sub-features within these, we use a SOTA LLM (gpt-3.5-turbo) to extract and evaluate the semantic similarity of a diverse set of major Hollywood movies. In addition, we compare human evaluation with story similarity scores computed three ways: extracting elements from film scripts before evaluation (Elements), directly evaluating entire scripts (Scripts), and extracting narrative elements from the parametric memory of SOTA LLMs without any provided scripts (GenAI). To the best of our knowledge, AIStorySimilarity is the first benchmark to measure long-text story similarity using a comprehensive approach to narrative theory. Code and data are available at https://github.com/jon-chun/AIStorySimiliarity.
Anthology ID:
2024.conll-1.13
Volume:
Proceedings of the 28th Conference on Computational Natural Language Learning
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Libby Barak, Malihe Alikhani
Venue:
CoNLL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
161–177
Language:
URL:
https://aclanthology.org/2024.conll-1.13
DOI:
Bibkey:
Cite (ACL):
Jon Chun. 2024. AIStorySimilarity: Quantifying Story Similarity Using Narrative for Search, IP Infringement, and Guided Creativity. In Proceedings of the 28th Conference on Computational Natural Language Learning, pages 161–177, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
AIStorySimilarity: Quantifying Story Similarity Using Narrative for Search, IP Infringement, and Guided Creativity (Chun, CoNLL 2024)
Copy Citation:
PDF:
https://aclanthology.org/2024.conll-1.13.pdf