Subtopic-driven Multi-Document Summarization

Xin Zheng, Aixin Sun, Jing Li, Karthik Muthuswamy


Abstract
In multi-document summarization, a set of documents to be summarized is assumed to be on the same topic, known as the underlying topic in this paper. That is, the underlying topic can be collectively represented by all the documents in the set. Meanwhile, different documents may cover various different subtopics and the same subtopic can be across several documents. Inspired by topic model, the underlying topic of a document set can also be viewed as a collection of different subtopics of different importance. In this paper, we propose a summarization model called STDS. The model generates the underlying topic representation from both document view and subtopic view in parallel. The learning objective is to minimize the distance between the representations learned from the two views. The contextual information is encoded through a hierarchical RNN architecture. Sentence salience is estimated in a hierarchical way with subtopic salience and relative sentence salience, by considering the contextual information. Top ranked sentences are then extracted as a summary. Note that the notion of subtopic enables us to bring in additional information (e.g. comments to news articles) that is helpful for document summarization. Experimental results show that the proposed solution outperforms state-of-the-art methods on benchmark datasets.
Anthology ID:
D19-1311
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
3153–3162
Language:
URL:
https://aclanthology.org/D19-1311/
DOI:
10.18653/v1/D19-1311
Bibkey:
Cite (ACL):
Xin Zheng, Aixin Sun, Jing Li, and Karthik Muthuswamy. 2019. Subtopic-driven Multi-Document Summarization. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3153–3162, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Subtopic-driven Multi-Document Summarization (Zheng et al., EMNLP-IJCNLP 2019)
Copy Citation:
PDF:
https://aclanthology.org/D19-1311.pdf