Unsupervised Text Summarization of Long Documents using Dependency-based Noun Phrases and Contextual Order Arrangement

Yen-Hao Huang, Hsiao-Yen Lan, Yi-Shin Chen


Abstract
Unsupervised extractive summarization has recently gained importance since it does not require labeled data. Among unsupervised methods, graph-based approaches have achieved outstanding results. These methods represent each document as a graph, with sentences as nodes and edges weighted by word-level similarity between sentences. Common words can easily create a strong connection between sentence nodes, so sentences that share many common words can be misinterpreted as salient and selected for the summary. This work addresses the common-word issue with a phrase-level graph that (1) focuses on the noun phrases of a document, extracted from grammar dependencies, and (2) initializes edge weights by term frequency within the target document and inverse document frequency over the entire corpus. The importance scores of the noun phrases derived from the graph are then used to select the most salient sentences. To preserve summary coherence, the order of the selected sentences is rearranged by a flow-aware orderBERT. The results show that our unsupervised framework outperforms other extractive methods on ROUGE as well as in two human evaluations of semantic similarity and summary coherence.
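The abstract describes the pipeline only at a high level. The sketch below is a rough illustration of the phrase-level idea, not the authors' implementation: it extracts dependency-based noun phrases with spaCy's noun_chunks, approximates the TF-IDF edge initialization with a corpus-fitted TfidfVectorizer, scores phrases with networkx PageRank as a stand-in for the paper's importance scoring, and ranks sentences by the phrases they contain. The flow-aware orderBERT re-ordering step is omitted, and all function names and similarity choices here are assumptions.

```python
# A minimal sketch of phrase-level graph scoring, assuming spaCy for
# dependency-based noun-phrase extraction, scikit-learn for TF-IDF, and
# networkx PageRank as a stand-in for the paper's importance scoring.
# The flow-aware orderBERT re-ordering step is not reproduced here.
import spacy
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nlp = spacy.load("en_core_web_sm")


def summarize(document: str, corpus: list[str], num_sentences: int = 3) -> str:
    doc = nlp(document)
    sentences = list(doc.sents)

    # 1. Noun phrases from the dependency parse (spaCy's noun_chunks).
    phrases = list(dict.fromkeys(chunk.text.lower() for chunk in doc.noun_chunks))
    if len(phrases) < 2:
        return document

    # 2. TF over the target document, IDF over the whole corpus, approximated
    #    with a corpus-fitted TfidfVectorizer; pairwise cosine similarity
    #    between phrase vectors initializes the edge weights.
    vectorizer = TfidfVectorizer().fit(corpus + [document])
    vectors = vectorizer.transform(phrases)
    sims = cosine_similarity(vectors)

    # 3. Phrase-level graph: nodes are noun phrases, edges carry similarity.
    graph = nx.Graph()
    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            if sims[i, j] > 0:
                graph.add_edge(phrases[i], phrases[j], weight=float(sims[i, j]))
    if graph.number_of_edges() == 0:
        return document

    # 4. Importance scores for the noun phrases.
    scores = nx.pagerank(graph, weight="weight")

    # 5. Rank sentences by the importance of the phrases they contain and
    #    keep the top ones in their original document order.
    def sentence_score(sent):
        return sum(scores.get(c.text.lower(), 0.0) for c in sent.noun_chunks)

    selected = sorted(sentences, key=sentence_score, reverse=True)[:num_sentences]
    selected.sort(key=lambda s: s.start)
    return " ".join(s.text for s in selected)
```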
Anthology ID: 2022.rocling-1.3
Volume: Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Month: November
Year: 2022
Address: Taipei, Taiwan
Editors: Yung-Chun Chang, Yi-Chin Huang
Venue: ROCLING
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 15–24
URL: https://aclanthology.org/2022.rocling-1.3
Cite (ACL): Yen-Hao Huang, Hsiao-Yen Lan, and Yi-Shin Chen. 2022. Unsupervised Text Summarization of Long Documents using Dependency-based Noun Phrases and Contextual Order Arrangement. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), pages 15–24, Taipei, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal): Unsupervised Text Summarization of Long Documents using Dependency-based Noun Phrases and Contextual Order Arrangement (Huang et al., ROCLING 2022)
PDF: https://aclanthology.org/2022.rocling-1.3.pdf