Inducing Document Structure for Aspect-based Summarization

Lea Frermann, Alexandre Klementiev


Abstract
Automatic summarization is typically treated as a 1-to-1 mapping from document to summary. Documents such as news articles, however, are structured and often cover multiple topics or aspects, and readers may be interested in only some of them. We tackle the task of aspect-based summarization, where, given a document and a target aspect, our models generate a summary centered around the aspect. We induce latent document structure jointly with an abstractive summarization objective, and train our models in a scalable synthetic setup. In addition to improvements in summarization over topic-agnostic baselines, we demonstrate the benefit of the learnt document structure: we show (a) that our models learn to accurately segment documents by aspect; (b) that they can leverage the structure to produce both abstractive and extractive aspect-based summaries; and (c) that structure is particularly advantageous for summarizing long documents. All results transfer from synthetic training documents to natural news articles from CNN/Daily Mail and RCV1.
Anthology ID:
P19-1630
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6263–6273
URL:
https://aclanthology.org/P19-1630
DOI:
10.18653/v1/P19-1630
Cite (ACL):
Lea Frermann and Alexandre Klementiev. 2019. Inducing Document Structure for Aspect-based Summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6263–6273, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Inducing Document Structure for Aspect-based Summarization (Frermann & Klementiev, ACL 2019)
PDF:
https://aclanthology.org/P19-1630.pdf
Code
 ColiLea/aspect_based_summarization
Data
CNN/Daily Mail, RCV1