Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm

Alicia Tsai, Laurent El Ghaoui


Abstract
We address the problem of unsupervised extractive document summarization, especially for long documents. We model the unsupervised problem as a sparse auto-regression problem and approximate the resulting combinatorial problem with a convex, norm-constrained one, which we solve using a dedicated Frank-Wolfe algorithm. To generate a summary with k sentences, the algorithm needs to execute only approximately k iterations, making it very efficient for long documents. We evaluate our approach against two other unsupervised methods using both lexical (standard) ROUGE scores and semantic (embedding-based) ones. Our method achieves better results on both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
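The link between k iterations and k sentences can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which norm constrains the problem, so the sketch assumes a group (l1/l2) norm ball over the rows of a coefficient matrix W, a least-squares reconstruction objective, and exact line search. The function name frank_wolfe_select and the radius parameter tau are hypothetical. Under these assumptions each Frank-Wolfe step activates at most one row of W, that is, one sentence, so k steps yield at most k selected sentences.

```python
import numpy as np

def frank_wolfe_select(X, k, tau=1.0):
    """Pick at most k sentence indices from X (n_sentences x n_features).

    Assumed objective (not taken from the paper):
        minimize 0.5 * ||A - A W||_F^2  with A = X.T,
        subject to sum_i ||W[i, :]||_2 <= tau.
    Row i of W is nonzero only if sentence i helps reconstruct the others.
    """
    n = X.shape[0]
    G = X @ X.T                      # Gram matrix A^T A, shape (n, n)
    W = np.zeros((n, n))
    for _ in range(k):
        grad = G @ W - G             # gradient of the objective at W
        row_norms = np.linalg.norm(grad, axis=1)
        i = int(np.argmax(row_norms))
        if row_norms[i] == 0.0:
            break                    # already stationary
        # Linear minimization oracle for the group-norm ball:
        # a vertex that is nonzero in a single row.
        S = np.zeros_like(W)
        S[i] = -tau * grad[i] / row_norms[i]
        D = S - W                    # Frank-Wolfe direction
        denom = np.sum(D * (G @ D))  # equals ||A D||_F^2
        if denom <= 0.0:
            break
        # Exact line search for a quadratic, clipped to [0, 1].
        gamma = float(np.clip(-np.sum(grad * D) / denom, 0.0, 1.0))
        W += gamma * D
    active = np.linalg.norm(W, axis=1) > 1e-12
    return [int(j) for j in np.flatnonzero(active)]
```

In use, X would hold one row per sentence (for example TF-IDF vectors or sentence embeddings), and the returned indices would be read back in document order to form the summary. Each iteration costs a few n-by-n matrix products here; a dedicated solver would presumably exploit the single-row structure of each update to do less work.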
Anthology ID:
2020.sustainlp-1.8
Volume:
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing
Month:
November
Year:
2020
Address:
Online
Editors:
Nafise Sadat Moosavi, Angela Fan, Vered Shwartz, Goran Glavaš, Shafiq Joty, Alex Wang, Thomas Wolf
Venue:
sustainlp
Publisher:
Association for Computational Linguistics
Pages:
54–62
URL:
https://aclanthology.org/2020.sustainlp-1.8
DOI:
10.18653/v1/2020.sustainlp-1.8
Cite (ACL):
Alicia Tsai and Laurent El Ghaoui. 2020. Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm. In Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, pages 54–62, Online. Association for Computational Linguistics.
Cite (Informal):
Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm (Tsai & El Ghaoui, sustainlp 2020)
PDF:
https://aclanthology.org/2020.sustainlp-1.8.pdf
Optional supplementary material:
 2020.sustainlp-1.8.OptionalSupplementaryMaterial.pdf
Video:
 https://slideslive.com/38939430