Unsupervised Multi-document Summarization for News Corpus with Key Synonyms and Contextual Embeddings

Yen-Hao Huang, Ratana Pornvattanavichai, Fernando Henrique Calderon Alvarado, Yi-Shin Chen


Abstract
Information overload is one of the main challenges of consuming information from the Internet. The problem is no longer access to information; instead, the focus has shifted towards the quality of the retrieved data. In the news domain in particular, multiple outlets report on the same events but may differ in the details. This work considers that different news outlets are more likely to differ in writing style and word choice, and proposes a method that extracts sentences based on their key information by focusing on the shared synonyms in each sentence. The method also reduces redundancy through hierarchical clustering and orders the selected sentences with the proposed orderBERT. The results show that the proposed unsupervised framework improves the coverage and coherence of the generated summary while reducing its redundancy. Because the dataset was obtained through automatic scraping, we also propose a data refinement method to alleviate the undesirable text that scraping introduces.
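The redundancy-reduction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes whitespace tokenization, Jaccard distance over token sets as a stand-in for the paper's synonym and contextual-embedding features, average-linkage agglomerative clustering, and a hypothetical cut-off `threshold`. The function names are illustrative only.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_sentences(sentences, threshold=0.5):
    """Average-linkage agglomerative clustering over token sets.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical cut-off, not a value from the paper).
    Returns a list of clusters, each a list of sentence indices.
    """
    tokens = [set(s.lower().split()) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard_distance(tokens[i], tokens[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters (a < b by construction).
        (a, b), dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def pick_representatives(sentences, clusters):
    """Keep one sentence per cluster: here, simply the longest one."""
    return [max((sentences[i] for i in c), key=len) for c in clusters]


sentences = [
    "The storm hit the coast on Monday",
    "The storm struck the coast on Monday",
    "Officials opened shelters for residents",
]
clusters = cluster_sentences(sentences)
summary = pick_representatives(sentences, clusters)
```

In this toy example the two near-duplicate storm sentences fall into one cluster and only one of them survives, which is the sense in which clustering reduces redundancy. Choosing the longest sentence as the representative is an arbitrary placeholder for whatever selection criterion a real system would use.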
Anthology ID:
2021.rocling-1.25
Volume:
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)
Month:
October
Year:
2021
Address:
Taoyuan, Taiwan
Editors:
Lung-Hao Lee, Chia-Hui Chang, Kuan-Yu Chen
Venue:
ROCLING
Publisher:
The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages:
192–201
URL:
https://aclanthology.org/2021.rocling-1.25
Cite (ACL):
Yen-Hao Huang, Ratana Pornvattanavichai, Fernando Henrique Calderon Alvarado, and Yi-Shin Chen. 2021. Unsupervised Multi-document Summarization for News Corpus with Key Synonyms and Contextual Embeddings. In Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021), pages 192–201, Taoyuan, Taiwan. The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
Cite (Informal):
Unsupervised Multi-document Summarization for News Corpus with Key Synonyms and Contextual Embeddings (Huang et al., ROCLING 2021)
PDF:
https://aclanthology.org/2021.rocling-1.25.pdf