Considering Nested Tree Structure in Sentence Extractive Summarization with Pre-trained Transformer

Jingun Kwon, Naoki Kobayashi, Hidetaka Kamigaito, Manabu Okumura


Abstract
Sentence extractive summarization shortens a document by selecting sentences for a summary while preserving its important contents. However, constructing a coherent and informative summary is difficult with a pre-trained BERT-based encoder because it is not explicitly trained to represent the information of sentences in a document. We propose a nested tree-based extractive summarization model on RoBERTa (NeRoBERTa), where the nested tree structure consists of the syntactic and discourse trees of a given document. Experimental results on the CNN/DailyMail dataset show that NeRoBERTa outperforms baseline models in ROUGE. Human evaluation results also show that NeRoBERTa achieves significantly better scores than the baselines in terms of coherence and yields scores comparable to those of the state-of-the-art models.
Anthology ID:
2021.emnlp-main.330
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4039–4044
URL:
https://aclanthology.org/2021.emnlp-main.330
DOI:
10.18653/v1/2021.emnlp-main.330
PDF:
https://aclanthology.org/2021.emnlp-main.330.pdf
Data
CNN/Daily Mail