Leveraging Information Bottleneck for Scientific Document Summarization

Jiaxin Ju, Ming Liu, Huan Yee Koh, Yuan Jin, Lan Du, Shirui Pan


Abstract
This paper presents an unsupervised extractive approach to summarizing long scientific documents based on the Information Bottleneck principle. Inspired by previous work that uses the Information Bottleneck principle for sentence compression, we extend it to document-level summarization in two separate steps. In the first step, we use one or more signals as queries to retrieve the key content from the source document. In the second step, a pre-trained language model performs further sentence search and editing to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by using different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. Further human evaluation suggests that the extracted summaries cover more content aspects than previous systems.
Anthology ID:
2021.findings-emnlp.345
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4091–4098
URL:
https://aclanthology.org/2021.findings-emnlp.345
DOI:
10.18653/v1/2021.findings-emnlp.345
Cite (ACL):
Jiaxin Ju, Ming Liu, Huan Yee Koh, Yuan Jin, Lan Du, and Shirui Pan. 2021. Leveraging Information Bottleneck for Scientific Document Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4091–4098, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Leveraging Information Bottleneck for Scientific Document Summarization (Ju et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.345.pdf
Video:
https://aclanthology.org/2021.findings-emnlp.345.mp4