SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents

Nishant Yadav, Matteo Brucato, Anna Fariha, Oscar Youngquist, Julian Killingback, Alexandra Meliou, Peter Haas


Abstract
Many applications require generation of summaries tailored to the user’s information needs, i.e., their intent. Methods that express intent via explicit user queries fall short when query interpretation is subjective. Several datasets exist for summarization with objective intents where, for each document and intent (e.g., “weather”), a single summary suffices for all users. No datasets exist, however, for subjective intents (e.g., “interesting places”) where different users will provide different summaries. We present SUBSUME, the first dataset for evaluation of SUBjective SUMmary Extraction systems. SUBSUME contains 2,200 (document, intent, summary) triplets over 48 Wikipedia pages, with ten intents of varying subjectivity, provided by 103 individuals over Mechanical Turk. We demonstrate statistically that the intents in SUBSUME vary systematically in subjectivity. To indicate SUBSUME’s usefulness, we explore a collection of baseline algorithms for subjective extractive summarization and show that (i) as expected, example-based approaches better capture subjective intents than query-based ones, and (ii) there is ample scope for improving upon the baseline algorithms, thereby motivating further research on this challenging problem.
Anthology ID:
2021.newsum-1.14
Volume:
Proceedings of the Third Workshop on New Frontiers in Summarization
Month:
November
Year:
2021
Address:
Online and in Dominican Republic
Editors:
Giuseppe Carenini, Jackie Chi Kit Cheung, Yue Dong, Fei Liu, Lu Wang
Venue:
NewSum
Publisher:
Association for Computational Linguistics
Pages:
131–141
URL:
https://aclanthology.org/2021.newsum-1.14
DOI:
10.18653/v1/2021.newsum-1.14
Cite (ACL):
Nishant Yadav, Matteo Brucato, Anna Fariha, Oscar Youngquist, Julian Killingback, Alexandra Meliou, and Peter Haas. 2021. SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents. In Proceedings of the Third Workshop on New Frontiers in Summarization, pages 131–141, Online and in Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents (Yadav et al., NewSum 2021)
PDF:
https://aclanthology.org/2021.newsum-1.14.pdf
Video:
https://aclanthology.org/2021.newsum-1.14.mp4
Data:
SubSumE, CNN/Daily Mail