SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents

Nishant Yadav, Matteo Brucato, Anna Fariha, Oscar Youngquist, Julian Killingback, Alexandra Meliou, Peter Haas


Abstract
Many applications require generation of summaries tailored to the user’s information needs, i.e., their intent. Methods that express intent via explicit user queries fall short when query interpretation is subjective. Several datasets exist for summarization with objective intents where, for each document and intent (e.g., “weather”), a single summary suffices for all users. No datasets exist, however, for subjective intents (e.g., “interesting places”) where different users will provide different summaries. We present SUBSUME, the first dataset for evaluation of SUBjective SUMmary Extraction systems. SUBSUME contains 2,200 (document, intent, summary) triplets over 48 Wikipedia pages, with ten intents of varying subjectivity, provided by 103 individuals over Mechanical Turk. We demonstrate statistically that the intents in SUBSUME vary systematically in subjectivity. To indicate SUBSUME’s usefulness, we explore a collection of baseline algorithms for subjective extractive summarization and show that (i) as expected, example-based approaches better capture subjective intents than query-based ones, and (ii) there is ample scope for improving upon the baseline algorithms, thereby motivating further research on this challenging problem.
Anthology ID: 2021.newsum-1.14
Volume: Proceedings of the Third Workshop on New Frontiers in Summarization
Month: November
Year: 2021
Address: Online and in Dominican Republic
Venues: EMNLP | newsum
Publisher: Association for Computational Linguistics
Pages: 131–141
URL: https://aclanthology.org/2021.newsum-1.14
DOI: 10.18653/v1/2021.newsum-1.14
PDF: https://aclanthology.org/2021.newsum-1.14.pdf
Data: SubSumE, CNN/Daily Mail