Recall Them All: Long List Generation from Long Novels

Sneha Singhania, Simon Razniewski, Gerhard Weikum


Abstract
Language models can generate lists of salient literary characters for specific relations but struggle with long, complete lists spanning entire novels. This paper studies the non-standard setting of extracting complete entity lists from full-length books, such as identifying all 50+ friends of Harry Potter across the 7-volume book series. We construct a benchmark dataset with meticulously compiled ground truth, posing it as a challenge for the research community. We present a first-cut method to tackle this task, based on RAG with LLMs. Our method introduces the novel contribution of harnessing IR-style pseudo-relevance feedback for effective passage retrieval from literary texts. Experimental results show that our approach clearly outperforms both LLM-only and standard RAG baselines, achieving higher recall while maintaining acceptable precision.
Anthology ID:
2025.lm4dh-1.13
Volume:
Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Isuri Nanomi Arachchige, Francesca Frontini, Ruslan Mitkov, Paul Rayson
Venues:
LM4DH | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
133–142
URL:
https://aclanthology.org/2025.lm4dh-1.13/
Cite (ACL):
Sneha Singhania, Simon Razniewski, and Gerhard Weikum. 2025. Recall Them All: Long List Generation from Long Novels. In Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities, pages 133–142, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Recall Them All: Long List Generation from Long Novels (Singhania et al., LM4DH 2025)
PDF:
https://aclanthology.org/2025.lm4dh-1.13.pdf