Unsupervised Single Document Abstractive Summarization using Semantic Units

Jhen-Yi Wu, Ying-Jia Lin, Hung-Yu Kao


Abstract
In this work, we study the importance of content frequency for abstractive summarization, where we define the content as “semantic units.” We propose a two-stage training framework that lets the model automatically learn the frequency of each semantic unit in the source text. Our model is trained in an unsupervised manner, since the frequency information can be inferred from the source text alone. During inference, our model identifies sentences with high-frequency semantic units and uses the frequency information to generate summaries from the filtered sentences. On the CNN/Daily Mail summarization task, our model outperforms the other unsupervised methods under the same settings. Furthermore, we achieve competitive ROUGE scores with far fewer model parameters than several large-scale pre-trained models. Our model can be trained under low-resource language settings and can thus serve as a potential solution for real-world applications where pre-trained models are not applicable.
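The sentence-filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the paper learns semantic-unit frequencies through a two-stage training framework, whereas the sketch below swaps in plain word counts as a crude stand-in for semantic units. The stopword list, the function names `content_words` and `filter_sentences`, and the example document are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "it", "at"}


def content_words(sentence):
    """Lowercased alphabetic tokens minus stopwords.

    A crude stand-in for the paper's learned semantic units.
    """
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def filter_sentences(sentences, k=2):
    """Keep the k sentences whose content words are most frequent in the document."""
    # Document-level frequency of every content word.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(s):
        # Mean document frequency of the sentence's content words.
        words = content_words(s)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top k, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical three-sentence document.
doc = [
    "The storm hit the coast on Monday.",
    "The storm damaged homes along the coast.",
    "Officials held a press briefing.",
]
selected = filter_sentences(doc, k=2)
```

In this toy document, the two sentences that share the repeated words "storm" and "coast" score higher than the third and are the ones kept. In the paper's setting, the filtered sentences would then be passed to the abstractive generator together with the frequency information; that generation step is not sketched here.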
Anthology ID:
2022.aacl-main.69
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2022
Address:
Online only
Editors:
Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
954–966
URL:
https://aclanthology.org/2022.aacl-main.69
DOI:
10.18653/v1/2022.aacl-main.69
Cite (ACL):
Jhen-Yi Wu, Ying-Jia Lin, and Hung-Yu Kao. 2022. Unsupervised Single Document Abstractive Summarization using Semantic Units. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 954–966, Online only. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Single Document Abstractive Summarization using Semantic Units (Wu et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-main.69.pdf
Dataset:
 2022.aacl-main.69.Dataset.txt