Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction

Raphael Schumann, Lili Mou, Yao Lu, Olga Vechtomova, Katja Markert


Abstract
Automatic sentence summarization produces a shorter version of a sentence while preserving its most important information. A good summary is characterized by language fluency and high information overlap with the source sentence. We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics. We search for a high-scoring summary by discrete optimization. Our proposed method achieves a new state of the art for unsupervised sentence summarization according to ROUGE scores. Additionally, we demonstrate that the commonly reported ROUGE F1 metric is sensitive to summary length. Since this is unwittingly exploited in recent work, we emphasize that future evaluation should explicitly group summarization systems by output length brackets.
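The search described in the abstract can be illustrated with a minimal sketch. It assumes a simple hill-climbing search over fixed-length word selections with a single swap move; the abstract itself only says "discrete optimization". The two scoring functions (`toy_fluency`, `toy_similarity`) are hypothetical stand-ins for the language-modeling and semantic-similarity terms, not the authors' metrics, and all names here are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def toy_fluency(idx: Sequence[int]) -> float:
    """Hypothetical fluency proxy (NOT a language model): the share of
    neighbouring kept words that were also adjacent in the source."""
    if len(idx) < 2:
        return 1.0
    adjacent = sum(1 for a, b in zip(idx, idx[1:]) if b == a + 1)
    return adjacent / (len(idx) - 1)


def toy_similarity(words: Sequence[str], idx: Sequence[int]) -> float:
    """Hypothetical similarity proxy: share of total character mass kept."""
    total = sum(len(w) for w in words)
    return sum(len(words[i]) for i in idx) / total


def score(words: Sequence[str], idx: Sequence[int]) -> float:
    # Assumed combination: a plain sum of the two toy terms.
    return toy_fluency(idx) + toy_similarity(words, idx)


def hill_climb(words: Sequence[str], k: int, steps: int = 500,
               seed: int = 0) -> Tuple[List[str], float]:
    """Pick k source words (kept in source order) by greedy local search."""
    n = len(words)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(words)")
    rng = random.Random(seed)
    selected = sorted(rng.sample(range(n), k))  # random initial extraction
    best = score(words, selected)
    for _ in range(steps):
        unselected = [i for i in range(n) if i not in selected]
        if not unselected:  # k == n: nothing to swap
            break
        out_pos = rng.randrange(k)       # position of the word to drop
        new_idx = rng.choice(unselected)  # source index of the word to add
        candidate = sorted(
            selected[:out_pos] + selected[out_pos + 1:] + [new_idx]
        )
        cand_score = score(words, candidate)
        if cand_score > best:  # accept only strict improvements
            selected, best = candidate, cand_score
    return [words[i] for i in selected], best


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river bank".split()
    summary, value = hill_climb(source, k=5, steps=300, seed=1)
    print(" ".join(summary), round(value, 3))
```

Because the summary length `k` is fixed and words are only selected, never reordered or rewritten, every candidate is a word-level extraction of the source. Swapping in real fluency and similarity models would only require replacing `score`.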
Anthology ID:
2020.acl-main.452
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5032–5042
URL:
https://aclanthology.org/2020.acl-main.452
DOI:
10.18653/v1/2020.acl-main.452
Cite (ACL):
Raphael Schumann, Lili Mou, Yao Lu, Olga Vechtomova, and Katja Markert. 2020. Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5032–5042, Online. Association for Computational Linguistics.
Cite (Informal):
Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction (Schumann et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.452.pdf
Software:
 2020.acl-main.452.Software.zip
Video:
 http://slideslive.com/38929184
Code
 raphael-sch/HC_Sentence_Summarization
Data
SNLI