Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline

Jiaxin Ju, Ming Liu, Longxiang Gao, Shirui Pan


Abstract
The Scholarly Document Processing (SDP) workshop aims to encourage more work on natural language understanding of scientific tasks. It comprises three shared tasks, and we participate in the LongSumm shared task. In this paper, we describe our text summarization system, SciSummPip, inspired by SummPip (Zhao et al., 2020), an unsupervised multi-document summarization system for the news domain. SciSummPip includes a transformer-based language model, SciBERT (Beltagy et al., 2019), for contextual sentence representation; content selection with PageRank (Page et al., 1999); sentence graph construction with both deep and linguistic information; sentence graph clustering; and within-graph summary generation. Our work differs from the previous method in that content selection and a summary length constraint are applied to adapt to the scientific domain. Experimental results on both the training dataset and the blind test dataset show the effectiveness of our method, and we empirically verify the robustness of the modules used in SciSummPip with BERTScore (Zhang et al., 2019a).
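The content selection step with a length constraint can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes bag-of-words cosine similarity as a stand-in for SciBERT sentence embeddings, a word budget as the length constraint, and hypothetical helper names (bow_vector, cosine, pagerank, select_sentences) chosen only for this example.

```python
import math
import re
from collections import Counter


def bow_vector(sentence):
    """Bag-of-words counts. ASSUMPTION: stands in for a SciBERT embedding."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted adjacency matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sums[j] > 0:
                    rank += weights[j][i] / out_sums[j] * scores[j]
                else:
                    # A node with no edges spreads its score evenly.
                    rank += scores[j] / n
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def select_sentences(sentences, max_words):
    """Keep the highest-ranked sentences that fit within a word budget."""
    if not sentences:
        return []
    vectors = [bow_vector(s) for s in sentences]
    n = len(sentences)
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Restore document order so the selection reads coherently.
    return [sentences[i] for i in sorted(chosen)]
```

Sentences that share vocabulary with many others receive higher PageRank scores, so off-topic sentences tend to be dropped first once the budget is exhausted. The graph construction, clustering, and generation stages named in the abstract are not covered by this sketch.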
Anthology ID:
2020.sdp-1.37
Volume:
Proceedings of the First Workshop on Scholarly Document Processing
Month:
November
Year:
2020
Address:
Online
Editors:
Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
318–327
URL:
https://aclanthology.org/2020.sdp-1.37
DOI:
10.18653/v1/2020.sdp-1.37
Cite (ACL):
Jiaxin Ju, Ming Liu, Longxiang Gao, and Shirui Pan. 2020. Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline. In Proceedings of the First Workshop on Scholarly Document Processing, pages 318–327, Online. Association for Computational Linguistics.
Cite (Informal):
Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline (Ju et al., sdp 2020)
PDF:
https://aclanthology.org/2020.sdp-1.37.pdf
Data
TalkSumm