Content Selection in Deep Learning Models of Summarization

Chris Kedzie, Kathleen McKeown, Hal Daumé III


Abstract
We carry out experiments with deep learning models of summarization across the domains of news, personal stories, meetings, and medical articles in order to understand how content selection is performed. We find that many sophisticated features of state-of-the-art extractive summarizers do not improve performance over simpler models. These results suggest that it is easier to create a summarizer for a new domain than previous work suggests and bring into question the benefit of deep learning models for summarization in those domains that do have massive datasets (i.e., news). At the same time, they raise important questions for new research in summarization; namely, new forms of sentence representations or external knowledge sources are needed that are better suited to the summarization task.
Anthology ID:
D18-1208
Original:
D18-1208v1
Version 2:
D18-1208v2
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1818–1828
URL:
https://aclanthology.org/D18-1208
DOI:
10.18653/v1/D18-1208
Cite (ACL):
Chris Kedzie, Kathleen McKeown, and Hal Daumé III. 2018. Content Selection in Deep Learning Models of Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1818–1828, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Content Selection in Deep Learning Models of Summarization (Kedzie et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1208.pdf
Attachment:
https://aclanthology.org/D18-1208.Attachment.pdf
Video:
https://aclanthology.org/D18-1208.mp4
Code:
kedz/nnsum (+ additional community code)
Data:
New York Times Annotated Corpus