Unsupervised Text Summarization of Long Documents using Dependency-based Noun Phrases and Contextual Order Arrangement
Yen-Hao Huang | Hsiao-Yen Lan | Yi-Shin Chen
Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)
Unsupervised extractive summarization has recently gained importance because it does not require labeled data. Among unsupervised methods, graph-based approaches have achieved outstanding results. These methods represent each document as a graph, with sentences as nodes and word-level similarity between sentences as edges. Common words can therefore easily create strong connections between sentence nodes, so sentences that contain many common words can be mistaken for salient sentences. This work addresses the common-word issue with a phrase-level graph that (1) focuses on the noun phrases of a document, identified through grammatical dependencies, and (2) initializes edge weights using term frequency within the target document and inverse document frequency over the entire corpus. The importance scores of the noun phrases extracted from the graph are then used to select the most salient sentences. To preserve summary coherence, the selected sentences are rearranged by a flow-aware orderBERT. The results show that our unsupervised framework outperforms other extractive methods on ROUGE as well as in two human evaluations of semantic similarity and summary coherence.
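As a rough illustration of the pipeline outlined in the abstract, the following Python sketch builds a phrase-level graph, ranks the phrases, and selects sentences by phrase importance. It is not the authors' implementation and rests on several assumptions: noun phrases are taken as already extracted (the paper derives them from dependency parses), each edge weight is set to the product of the TF-IDF weights of its two phrases, a weighted PageRank-style iteration stands in for the paper's scoring, and the selected sentences are returned in document order rather than reordered by orderBERT. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_weights(doc_phrases, corpus_phrases):
    """TF within the target document times IDF over the whole corpus."""
    tf = Counter(p for sent in doc_phrases for p in sent)
    n_docs = len(corpus_phrases)
    weights = {}
    for phrase, count in tf.items():
        df = sum(1 for doc in corpus_phrases if phrase in doc)
        # Smoothed IDF (an assumption) keeps every weight positive and finite.
        weights[phrase] = count * (math.log((1 + n_docs) / (1 + df)) + 1.0)
    return weights


def build_graph(doc_phrases, weights):
    """Link phrases that co-occur in a sentence (assumed edge definition)."""
    graph = {p: {} for p in weights}
    for sent in doc_phrases:
        for a, b in combinations(sorted(set(sent)), 2):
            w = weights[a] * weights[b]  # assumed edge-weight initialization
            graph[a][b] = w
            graph[b][a] = w
    return graph


def rank_phrases(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style power iteration over the phrase graph."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_sum = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            incoming = sum(score[m] * graph[m][n] / out_sum[m] for m in graph[n])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new
    return score


def select_sentences(doc_phrases, score, k=2):
    """Return indices of the k sentences with the highest total phrase score."""
    totals = [sum(score[p] for p in set(sent)) for sent in doc_phrases]
    top = sorted(range(len(doc_phrases)), key=lambda i: totals[i], reverse=True)[:k]
    return sorted(top)  # document order; the paper reorders with orderBERT instead


# Toy input: each sentence is a list of pre-extracted noun phrases.
doc = [
    ["phrase graph", "noun phrases"],
    ["noun phrases", "edge weights"],
    ["common words"],
]
# Toy corpus: one set of phrases per document, the target document first.
corpus = [
    {p for sent in doc for p in sent},
    {"common words", "noun phrases"},
    {"common words"},
]

weights = tfidf_weights(doc, corpus)
scores = rank_phrases(build_graph(doc, weights))
print(select_sentences(doc, scores, k=2))
```

In this toy input, "common words" appears in every corpus document and shares no sentence with another phrase, so its sentence receives only the baseline score and is not selected. This mirrors, in a very simplified form, how corpus-level IDF and a phrase-level graph can keep common terms from dominating the ranking.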