At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization

Qingyu Zhou, Furu Wei, Ming Zhou


Abstract
Abstract
Extractive methods have proven effective for automatic document summarization. Previous work performs this task by identifying informative content at the sentence level. However, it is unclear whether sentence-level extraction is the best solution. In this work, we show that extracting full sentences introduces unnecessary and redundant content, and that extracting sub-sentential units is a promising alternative. Specifically, we propose extracting sub-sentential units based on the constituency parse tree, and we present a neural extractive model that leverages and extracts these units. Extensive experiments and analyses show that extracting sub-sentential units performs competitively with full-sentence extraction under both automatic and human evaluation. We hope this work offers some guidance on the choice of basic extraction unit for future research on extractive summarization.
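
To make the extraction granularity concrete, below is a minimal sketch of pulling sub-sentential units out of a constituency parse tree with NLTK. The bracketed parse string, the clause labels (S, SBAR), and the selection heuristic are illustrative assumptions for this sketch; the abstract does not specify the paper's actual splitting rule.

```python
# Minimal sketch: enumerate clause-level constituents of a parse tree
# as candidate extraction units, instead of the full sentence.
from nltk import Tree

# A toy bracketed parse (in practice produced by a constituency parser).
parse = Tree.fromstring(
    "(S (NP (PRP He)) (VP (VBD stayed) (SBAR (IN because) "
    "(S (NP (PRP it)) (VP (VBD rained))))) (. .))"
)

# Hypothetical heuristic: clause-level labels define the candidate units.
UNIT_LABELS = {"S", "SBAR"}

def sub_sentential_units(tree):
    """Yield the token span of every clause-level subtree."""
    for subtree in tree.subtrees(lambda t: t.label() in UNIT_LABELS):
        yield " ".join(subtree.leaves())

for unit in sub_sentential_units(parse):
    print(unit)
```

Running this prints the full sentence plus the embedded clauses ("because it rained", "it rained"), which are the kinds of smaller candidate units a sub-sentential extractor could score and select.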
Anthology ID: 2020.coling-main.492
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 5617–5628
URL: https://aclanthology.org/2020.coling-main.492
DOI: 10.18653/v1/2020.coling-main.492
Cite (ACL): Qingyu Zhou, Furu Wei, and Ming Zhou. 2020. At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5617–5628, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization (Zhou et al., COLING 2020)
PDF: https://aclanthology.org/2020.coling-main.492.pdf