AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature

Melissa Roemmele, Kyle Shaffer, Katrina Olsen, Yiyi Wang, Steve DeNeefe


Abstract
Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.
Anthology ID:
2023.eacl-main.269
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
3717–3733
URL:
https://aclanthology.org/2023.eacl-main.269
DOI:
10.18653/v1/2023.eacl-main.269
Cite (ACL):
Melissa Roemmele, Kyle Shaffer, Katrina Olsen, Yiyi Wang, and Steve DeNeefe. 2023. AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3717–3733, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature (Roemmele et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.269.pdf
Video:
https://aclanthology.org/2023.eacl-main.269.mp4